Converting a dictionary of columns from different DataFrames into a DataFrame: pyspark

newDFDict = {
    'schoolName': school.INSTNM,
    'type': school.CONTROL,
    'avgCostAcademicYear': costs.COSTT4_A,
    'avgCostProgramYear': costs.COSTT4_P,
    'averageNetPricePublic': costs.NPT4_PUB,
}

{
    'schoolName': Column<b'INSTNM'>,
    'type': Column<b'CONTROL'>,
    'avgCostAcademicYear': Column<b'COSTT4_A'>,
    'avgCostProgramYear': Column<b'COSTT4_P'>,
    'averageNetPricePublic': Column<b'NPT4_PUB'>
}

newDFDict = {
    'schoolName': school.select("INSTNM").collect(),
    'type': school.select("CONTROL").collect(),
    'avgCostAcademicYear': costs.select("COSTT4_A").collect(),
    'avgCostProgramYear': costs.select("COSTT4_P").collect(),
    'averageNetPricePublic': costs.select("NPT4_PUB").collect(),
}
newDF = sc.parallelize([newDFDict]).toDF()
newDF.show()

+---------------------+--------------------+--------------------+--------------------+--------------------+
|averageNetPricePublic| avgCostAcademicYear|  avgCostProgramYear|          schoolName|                type|
+---------------------+--------------------+--------------------+--------------------+--------------------+
| [[NULL], [NULL], ...|[[NULL], [NULL], ...|[[NULL], [NULL], ...|[[Community Colle...|[[1], [1], [1], [...|
+---------------------+--------------------+--------------------+--------------------+--------------------+
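Why the result is a single row of nested lists: collect() returns a Python list with one Row per DataFrame row, so each dictionary value above is an entire column, and sc.parallelize([newDFDict]) receives exactly one record. The plain-Python model below reproduces that shape without Spark; the FakeRow namedtuple (a stand-in for pyspark's Row) and the school names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row with a single selected column.
FakeRow = namedtuple("FakeRow", ["value"])

# Roughly what school.select("INSTNM").collect() and
# school.select("CONTROL").collect() return: one row object per school.
school_names = [FakeRow("Community College A"), FakeRow("State University B")]
school_types = [FakeRow(1), FakeRow(1)]

# The question's approach: one dict whose values are whole collected columns.
new_df_dict = {"schoolName": school_names, "type": school_types}
records = [new_df_dict]  # what sc.parallelize([newDFDict]) is given

# One record in means one row out, and every cell holds a whole list of rows,
# which show() renders as nested arrays such as [[1], [1], ...].
num_rows = len(records)

# A DataFrame needs one record per school instead. Pairing by position only
# works in this toy model: Spark does not guarantee that rows collected from
# different DataFrames (here school and costs) line up by position.
per_school = [
    {"schoolName": n.value, "type": t.value}
    for n, t in zip(school_names, school_types)
]

print(num_rows)          # 1
print(len(per_school))   # 2
print(per_school[0])     # {'schoolName': 'Community College A', 'type': 1}
```

Because positional pairing across separate DataFrames is unreliable, rows have to be matched inside Spark, either by stacking tables with the same columns or by joining on a shared key.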

1 answer

User

Reply #1 · posted 2024-07-04 16:49:43

I'd suggest two options.

Option 1 (the union case for building the dictionary):

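You mentioned that you have 10 or more tables (the ones you are building the dictionary from) that share common columns, e.g. 'schoolName', 'type', 'avgCostAcademicYear', 'avgCostProgramYear' and 'averageNetPricePublic'. In that case you can use union or unionByName to combine them into a single merged table and look at the data there. Note that union matches columns by position and unionByName matches them by name, so unionByName is the safer choice when the tables do not list their columns in the same order.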
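The plain-Python model below (not Spark code; a table is represented as a column list plus row tuples, and all values are made up) shows what goes wrong with a position-based union when two tables order the same columns differently, and how matching by name avoids it.

```python
def union_by_position(left, right):
    """Model of DataFrame.union: right's values go under left's columns by position."""
    return {"columns": left["columns"], "rows": left["rows"] + right["rows"]}


def union_by_name(left, right):
    """Model of DataFrame.unionByName: right's values go under left's columns by name."""
    # For each of left's columns, find where that column sits in right.
    order = [right["columns"].index(col) for col in left["columns"]]
    reordered = [tuple(row[i] for i in order) for row in right["rows"]]
    return {"columns": left["columns"], "rows": left["rows"] + reordered}


# Two hypothetical tables with the same columns in a different order.
t1 = {"columns": ["schoolName", "type"], "rows": [("A College", 1)]}
t2 = {"columns": ["type", "schoolName"], "rows": [(2, "B University")]}

by_position = union_by_position(t1, t2)
by_name = union_by_name(t1, t2)

print(by_position["rows"])  # [('A College', 1), (2, 'B University')]  values under the wrong columns
print(by_name["rows"])      # [('A College', 1), ('B University', 2)]
```

In PySpark the corresponding calls are t1.union(t2) and t1.unionByName(t2); since Spark 3.1, unionByName also accepts allowMissingColumns=True for tables that do not all have every column.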

For example:

-- df1 ... dfN must be registered as temporary views first, e.g. df1.createOrReplaceTempView("df1")
select schoolName, type, avgCostAcademicYear, avgCostProgramYear, averageNetPricePublic from df1

union

select schoolName, type, avgCostAcademicYear, avgCostProgramYear, averageNetPricePublic from df2

...

union

select schoolName, type, avgCostAcademicYear, avgCostProgramYear, averageNetPricePublic from dfN

This gives you a single consolidated view of the dictionary columns across all the tables.
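With N tables, the pairwise union is usually written as a fold. Also note that SQL union, as used in the query, removes duplicate rows, whereas DataFrame.union and unionByName keep them (like SQL UNION ALL). The plain-Python model below folds three hypothetical tables, already projected to the same five columns in the same order, both ways with functools.reduce; the data is made up.

```python
from functools import reduce

# Hypothetical df1, df2, df3 as lists of rows in the column order
# (schoolName, type, avgCostAcademicYear, avgCostProgramYear, averageNetPricePublic).
df1 = [("A College", 1, 20000, None, 9000)]
df2 = [("B University", 2, 35000, None, None), ("A College", 1, 20000, None, 9000)]
df3 = [("C Institute", 3, None, 15000, None)]
tables = [df1, df2, df3]


def union_all(left, right):
    """Like DataFrame.union / SQL UNION ALL: keep every row, duplicates included."""
    return left + right


def union_distinct(left, right):
    """Like SQL UNION: keep each distinct row once, in first-seen order."""
    seen = set()
    result = []
    for row in left + right:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


combined_all = reduce(union_all, tables)
combined_distinct = reduce(union_distinct, tables)

print(len(combined_all))       # 4
print(len(combined_distinct))  # 3
```

The PySpark analogue of the fold is reduce(lambda a, b: a.unionByName(b), dfs) for a list dfs of DataFrames, with .distinct() appended if you want the deduplicating behavior of SQL union.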

Option 2 (if the tables only share common join columns):

If the tables share some common join columns, you can use a standard join instead, no matter how many tables there are.

A pseudo-SQL example:

select <dictionary columns> from table1, table2, table3, ..., tableN where <the common join columns are equal across all tables table1 ... tableN>
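The join can likewise be folded over any number of tables. The plain-Python model below is an inner equi-join on a shared key; the key name UNITID, the third table with its pellGrantRate column, and all values are assumptions for illustration, since the question does not say which column links school and costs.

```python
from functools import reduce


def inner_join(left, right, key):
    """Inner equi-join of two lists of dicts on a shared key column."""
    # Index the right-hand table by key so each lookup is a dict access.
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    # Emit one merged row for every matching (left, right) pair;
    # left rows with no match are dropped, as in an inner join.
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


# Hypothetical tables sharing the (assumed) key column UNITID.
school = [
    {"UNITID": 1, "schoolName": "A College", "type": 1},
    {"UNITID": 2, "schoolName": "B University", "type": 2},
]
costs = [
    {"UNITID": 1, "avgCostAcademicYear": 20000, "averageNetPricePublic": 9000},
    {"UNITID": 2, "avgCostAcademicYear": 35000, "averageNetPricePublic": None},
]
aid = [{"UNITID": 1, "pellGrantRate": 0.4}]  # hypothetical third table

joined = reduce(lambda acc, tbl: inner_join(acc, tbl, "UNITID"), [school, costs, aid])

print(len(joined))               # 1  (UNITID 2 has no row in aid)
print(joined[0]["schoolName"])   # A College
```

In PySpark the same fold would be reduce(lambda a, b: a.join(b, on="UNITID", how="inner"), dfs); passing the key by name through on keeps a single copy of the key column, and how="left" keeps schools that are missing from a later table.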

Note: leaving out the condition on any of the join columns will produce a Cartesian product.
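To make the note concrete: with no join condition, every row of one table is paired with every row of the other, so the row count is the product of the table sizes. The plain-Python model below (made-up tables linked by an assumed id column) compares that with a keyed join.

```python
from itertools import product

# Hypothetical tables: 3 schools and 3 cost rows, linked by an assumed "id" key.
school = [{"id": i, "schoolName": f"School {i}"} for i in range(3)]
costs = [{"id": i, "avgCostAcademicYear": 10000 * (i + 1)} for i in range(3)]

# No join condition: every school is paired with every cost row.
cartesian = list(product(school, costs))

# With the condition on the shared key: only matching pairs survive.
keyed = [(s, c) for s, c in product(school, costs) if s["id"] == c["id"]]

print(len(cartesian))  # 9  (3 x 3)
print(len(keyed))      # 3
```

With more tables, each missing condition multiplies the row count again, which is why a forgotten join column can blow up the result size so quickly.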
