Pivoting a DataFrame in PySpark

1 answer

User

#1 · Posted on 2024-09-29 02:23:35

If I understand your requirement correctly, you need to pass the other columns to sum() as well. Consider the following example:

tst=sqlContext.createDataFrame([('2020-04-23',1,2,"india"),('2020-04-24',1,3,"india"),('2020-04-23',1,4,"china"),('2020-04-24',1,5,"china"),('2020-04-23',1,7,"germany"),('2020-04-24',1,9,"germany")],schema=('date','quantity','value','country'))
tst.show()
+----------+--------+-----+-------+
|      date|quantity|value|country|
+----------+--------+-----+-------+
|2020-04-23|       1|    2|  india|
|2020-04-24|       1|    3|  india|
|2020-04-23|       1|    4|  china|
|2020-04-24|       1|    5|  china|
|2020-04-23|       1|    7|germany|
|2020-04-24|       1|    9|germany|
+----------+--------+-----+-------+
# Do not chain .show() here: show() returns None, so df_pivot would not be a DataFrame
df_pivot=tst.groupby('country').pivot('date').sum('quantity','value')
df_pivot.show()
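+-------+------------------------+---------------------+------------------------+---------------------+
|country|2020-04-23_sum(quantity)|2020-04-23_sum(value)|2020-04-24_sum(quantity)|2020-04-24_sum(value)|
+-------+------------------------+---------------------+------------------------+---------------------+
|germany|                       1|                    7|                       1|                    9|
|  china|                       1|                    4|                       1|                    5|
|  india|                       1|                    2|                       1|                    3|
+-------+------------------------+---------------------+------------------------+---------------------+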

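If you don't like the awkward column names, you can use the agg function with alias to define your own suffixes for the pivoted column names: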

from pyspark.sql import functions as F  # F is used for the aggregate expressions below
tst_res=tst.groupby('country').pivot('date').agg(F.sum('quantity').alias('sum_quantity'),F.sum('value').alias('sum_value'))
tst_res.show()
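+-------+-----------------------+--------------------+-----------------------+--------------------+
|country|2020-04-23_sum_quantity|2020-04-23_sum_value|2020-04-24_sum_quantity|2020-04-24_sum_value|
+-------+-----------------------+--------------------+-----------------------+--------------------+
|germany|                      1|                   7|                      1|                   9|
|  china|                      1|                   4|                      1|                   5|
|  india|                      1|                   2|                      1|                   3|
+-------+-----------------------+--------------------+-----------------------+--------------------+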
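
To make the naming rule easier to see, here is a minimal plain-Python sketch of what the pivot with two aggregations computes. It does not use Spark: the helper `pivot_sum` and its `aliases` parameter are hypothetical names introduced for illustration only, and the rows are the same six rows as in the example above. Each output column is one (date, aggregation) pair, named as the date followed by an underscore and the alias.

```python
from collections import defaultdict

# Same rows as the example above: (date, quantity, value, country)
rows = [
    ("2020-04-23", 1, 2, "india"),
    ("2020-04-24", 1, 3, "india"),
    ("2020-04-23", 1, 4, "china"),
    ("2020-04-24", 1, 5, "china"),
    ("2020-04-23", 1, 7, "germany"),
    ("2020-04-24", 1, 9, "germany"),
]


def pivot_sum(rows, aliases=("sum_quantity", "sum_value")):
    """Plain-Python imitation of groupby('country').pivot('date').agg(...).

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    result = defaultdict(lambda: defaultdict(int))
    for date, quantity, value, country in rows:
        # One output column per (pivot value, aggregation) pair,
        # named "<pivot value>_<alias>".
        result[country][f"{date}_{aliases[0]}"] += quantity
        result[country][f"{date}_{aliases[1]}"] += value
    return {country: dict(cols) for country, cols in result.items()}


table = pivot_sum(rows)
for country, cols in table.items():
    print(country, cols)
```

Passing different aliases, for example `pivot_sum(rows, aliases=("qty", "val"))`, changes only the suffixes, which mirrors what `alias` does in the agg call above.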
