Detecting duplicates and creating a summary row

Posted 2024-10-04 11:25:03


I have a CSV that arrives regularly and looks like this (simplified):

Published   Station         TypeFuel    Price
1/09/2015   BP Seaford      ULP         129.9
1/09/2015   BP Seaford      Diesel      133.9
1/09/2015   BP Seaford      Gas         156.9
1/09/2015   Shell Newhaven  ULP         139.9
1/09/2015   Shell Newhaven  Diesel      150.9
1/09/2015   7-Eleven Malaga ULP         135.9
1/09/2015   7-Eleven Malaga Diesel      155.9
2/10/2015   BP Seaford      ULP         138.9
2/10/2015   BP Seaford      Diesel      133.6
2/10/2015   BP Seaford      Gas         157.9

…many more rows omitted. There are about 200 stations, each reporting daily for 20-30 days.

I need to summarize it like this:

Published   Station         ULP     Diesel  Gas
1/09/2015   BP Seaford      129.9   133.9   156.9
1/09/2015   Shell Newhaven  139.9   150.9   
1/09/2015   7-Eleven Malaga 135.9   155.9   
2/10/2015   BP Seaford      138.9   133.6   157.9

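I have just started working through a pandas tutorial and am fairly new to Python, but I believe the two can help me with this task.

I think I need to iterate over the CSV and, wherever Published and Station match, create a new row that moves the ULP/Diesel/Gas prices into new columns.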


1 answer
User
#1 · posted 2024-10-04 11:25:03

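You are looking for DataFrame.pivot_table(): pivot on the columns 'Published' and 'Station', take the new column names of the pivot table from the TypeFuel column, and use the values in Price as the cell values. No explicit loop over the rows is needed. Example: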

In [5]: df
Out[5]:
   Published          Station TypeFuel  Price
0  1/09/2015       BP Seaford      ULP  129.9
1  1/09/2015       BP Seaford   Diesel  133.9
2  1/09/2015       BP Seaford      Gas  156.9
3  1/09/2015   Shell Newhaven      ULP  139.9
4  1/09/2015   Shell Newhaven   Diesel  150.9
5  1/09/2015  7-Eleven Malaga      ULP  135.9
6  1/09/2015  7-Eleven Malaga   Diesel  155.9
7  2/10/2015       BP Seaford      ULP  138.9
8  2/10/2015       BP Seaford   Diesel  133.6
9  2/10/2015       BP Seaford      Gas  157.9

In [7]: df.pivot_table(index=['Published','Station'],columns=['TypeFuel'],values='Price')
Out[7]:
TypeFuel                   Diesel    Gas    ULP
Published Station
1/09/2015 7-Eleven Malaga   155.9    NaN  135.9
          BP Seaford        133.9  156.9  129.9
          Shell Newhaven    150.9    NaN  139.9
2/10/2015 BP Seaford        133.6  157.9  138.9
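
For readers who want to reproduce the pivot outside an interactive session, here is a minimal, self-contained sketch. It assumes the incoming file is comma-separated (the question only shows an aligned preview) and inlines the sample rows through a hypothetical CSV_TEXT string rather than reading a real file. It also checks for repeated (Published, Station, TypeFuel) rows first, because pivot_table() averages such rows by default instead of reporting them.

```python
import io

import pandas as pd

# Sample rows copied from the question. The comma delimiter is an assumption;
# a real run would pass the path of the incoming file to pd.read_csv instead.
CSV_TEXT = """Published,Station,TypeFuel,Price
1/09/2015,BP Seaford,ULP,129.9
1/09/2015,BP Seaford,Diesel,133.9
1/09/2015,BP Seaford,Gas,156.9
1/09/2015,Shell Newhaven,ULP,139.9
1/09/2015,Shell Newhaven,Diesel,150.9
1/09/2015,7-Eleven Malaga,ULP,135.9
1/09/2015,7-Eleven Malaga,Diesel,155.9
2/10/2015,BP Seaford,ULP,138.9
2/10/2015,BP Seaford,Diesel,133.6
2/10/2015,BP Seaford,Gas,157.9
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Rows that share date, station and fuel type would be silently averaged
# by pivot_table (its default aggfunc is the mean), so list them up front.
key = ["Published", "Station", "TypeFuel"]
dupes = df[df.duplicated(subset=key, keep=False)]
if not dupes.empty:
    print("Duplicate price reports found:")
    print(dupes)

# One row per (Published, Station), one column per fuel type.
summary = df.pivot_table(
    index=["Published", "Station"],
    columns="TypeFuel",
    values="Price",
)
print(summary)
```

Stations that report no price for a fuel type end up with NaN in that column, as with Gas for Shell Newhaven above.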

If you do not want Published and Station to be the index, you can call .reset_index() on the result of pivot_table() to reset the index. Example:

In [8]: df.pivot_table(index=['Published','Station'],columns=['TypeFuel'],values='Price').reset_index()
Out[8]:
TypeFuel  Published          Station  Diesel    Gas    ULP
0         1/09/2015  7-Eleven Malaga   155.9    NaN  135.9
1         1/09/2015       BP Seaford   133.9  156.9  129.9
2         1/09/2015   Shell Newhaven   150.9    NaN  139.9
3         2/10/2015       BP Seaford   133.6  157.9  138.9
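
To get closer to the layout requested in the question (columns in the order ULP, Diesel, Gas and no leftover TypeFuel label), the result can be reshaped a little further. The following is only a sketch under the assumption that these three fuel types are the ones wanted; the DataFrame is rebuilt by hand from the sample rows so the block runs on its own.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand so the block is self-contained.
df = pd.DataFrame({
    "Published": ["1/09/2015"] * 7 + ["2/10/2015"] * 3,
    "Station": ["BP Seaford"] * 3 + ["Shell Newhaven"] * 2
               + ["7-Eleven Malaga"] * 2 + ["BP Seaford"] * 3,
    "TypeFuel": ["ULP", "Diesel", "Gas", "ULP", "Diesel",
                 "ULP", "Diesel", "ULP", "Diesel", "Gas"],
    "Price": [129.9, 133.9, 156.9, 139.9, 150.9,
              135.9, 155.9, 138.9, 133.6, 157.9],
})

wide = (
    df.pivot_table(index=["Published", "Station"],
                   columns="TypeFuel", values="Price")
      .reindex(columns=["ULP", "Diesel", "Gas"])  # column order used in the question
      .reset_index()                              # Published/Station back to columns
      .rename_axis(None, axis=1)                  # drop the "TypeFuel" columns label
)

# With no path argument, to_csv returns the CSV text; NaN is written as an empty field.
print(wide.to_csv(index=False))
```

Note that Published is still a plain string here, so the rows are sorted lexicographically rather than chronologically. If the dates are day-first (an assumption, since the sample is ambiguous), converting the column with pd.to_datetime(df["Published"], dayfirst=True) before pivoting would give a chronological order.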
