How to get a groupby total and then calculate percentages for a DataFrame column

Posted 2024-06-26 01:31:19


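I'd appreciate your help. I've been searching online and playing around with groupby and agg to solve this problem. I have a DataFrame of election results. I grouped it by Municipality and PartyName to get the total votes for each party in each municipality, and after resetting the index it looks like the following snippet: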

         Municipality                           PartyName  TotalValidVotes
0  BUF - Buffalo City  AFRICAN CHRISTIAN DEMOCRATIC PARTY             2519
1  BUF - Buffalo City        AFRICAN INDEPENDENT CONGRESS            15600
2  BUF - Buffalo City           AFRICAN NATIONAL CONGRESS           268052
3  BUF - Buffalo City              CONGRESS OF THE PEOPLE             3913
4  BUF - Buffalo City                 DEMOCRATIC ALLIANCE           106790
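The raw data isn't shown in the question, so here is a minimal sketch of the grouping step described above. It assumes a hypothetical raw frame with one row per voting district; the VotingDistrict column and all vote counts are invented for illustration.

```python
import pandas as pd

# Hypothetical raw results: one row per (Municipality, VotingDistrict, PartyName).
# The VotingDistrict column and the vote counts are invented for illustration.
raw = pd.DataFrame(
    {
        "Municipality": ["BUF - Buffalo City"] * 4,
        "VotingDistrict": ["VD1", "VD2", "VD1", "VD2"],
        "PartyName": [
            "AFRICAN NATIONAL CONGRESS",
            "AFRICAN NATIONAL CONGRESS",
            "DEMOCRATIC ALLIANCE",
            "DEMOCRATIC ALLIANCE",
        ],
        "TotalValidVotes": [100, 150, 40, 60],
    }
)

# Sum the votes for each party within each municipality, then turn the
# group keys back into ordinary columns with reset_index().
by_party = (
    raw.groupby(["Municipality", "PartyName"])
    .agg({"TotalValidVotes": "sum"})
    .reset_index()
)
print(by_party)
```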

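Now I want to calculate each party's percentage of the total votes in its municipality, but I can't work out how to get the total number of votes per municipality so that I can compute the percentage. I feel this should be easy to do in pandas, but I'm at a loss. Thanks in advance.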
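The per-municipality total the question asks about can be computed with a plain groupby sum and then looked up for each row. This is a minimal sketch of one alternative, not either answerer's method; it uses the rows from the question, and the Pct column name is hypothetical.

```python
import pandas as pd

# The already-grouped frame from the question (one row per party per municipality).
df = pd.DataFrame(
    {
        "Municipality": ["BUF - Buffalo City"] * 5,
        "PartyName": [
            "AFRICAN CHRISTIAN DEMOCRATIC PARTY",
            "AFRICAN INDEPENDENT CONGRESS",
            "AFRICAN NATIONAL CONGRESS",
            "CONGRESS OF THE PEOPLE",
            "DEMOCRATIC ALLIANCE",
        ],
        "TotalValidVotes": [2519, 15600, 268052, 3913, 106790],
    }
)

# One total per municipality: a Series indexed by Municipality.
totals = df.groupby("Municipality")["TotalValidVotes"].sum()

# Look up each row's municipality total and express the party's votes as a percentage.
# "Pct" is a hypothetical column name.
df["Pct"] = 100 * df["TotalValidVotes"] / df["Municipality"].map(totals)
print(totals)
print(df)
```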


2 Answers

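First group by both variables (Municipality and PartyName), then group the resulting aggregated DataFrame by its first index level (level=0), and then compute the percentages within each group (.apply(...)):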

from io import StringIO
import pandas as pd

s = """Municipality    PartyName   TotalValidVotes
BUF - Buffalo City  AFRICAN CHRISTIAN DEMOCRATIC PARTY  2519
BUF - Buffalo City  AFRICAN INDEPENDENT CONGRESS    15600
BUF - Buffalo City  AFRICAN NATIONAL CONGRESS   268052
BUF - Buffalo City  CONGRESS OF THE PEOPLE  3913
BUF - Buffalo City  DEMOCRATIC ALLIANCE  106790
"""

# Split on runs of two or more spaces; the raw string avoids an invalid-escape warning for "\s"
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python")

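df = (
    df.groupby(["Municipality", "PartyName"])
    .agg({"TotalValidVotes": "sum"})  # total votes per (Municipality, PartyName)
    # Regroup by the first index level (Municipality). group_keys=False stops
    # pandas 2.x from prepending Municipality a second time, which would make
    # reset_index() fail with "cannot insert Municipality, already exists".
    .groupby(level=0, group_keys=False)
    .apply(lambda g: 100 * g / g.sum())  # each party's share of its municipality, in %
    .reset_index()
)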

Output:

         Municipality                           PartyName  TotalValidVotes
0  BUF - Buffalo City  AFRICAN CHRISTIAN DEMOCRATIC PARTY         0.634710
1  BUF - Buffalo City        AFRICAN INDEPENDENT CONGRESS         3.930719
2  BUF - Buffalo City           AFRICAN NATIONAL CONGRESS        67.540832
3  BUF - Buffalo City              CONGRESS OF THE PEOPLE         0.985955
4  BUF - Buffalo City                 DEMOCRATIC ALLIANCE        26.907784

This snippet should work without needing to create an intermediate DataFrame.

A simpler, more efficient version:

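You can use groupby() + transform() with 'sum' to get each group's total. Then divide the TotalValidVotes column by that total and multiply by 100 to get the percentage: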

df['TotalValidVotes_Pct'] = (df['TotalValidVotes'] / df.groupby('Municipality')['TotalValidVotes'].transform('sum')) * 100

Note that this version uses only vectorized operations, so it should run faster than the groupby/apply approach.
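The transform version works because transform('sum') returns a value for every row, aligned to the original index, whereas a plain groupby sum returns one value per group. A minimal sketch comparing the two; the second municipality "XYZ - Example" and its vote count are hypothetical.

```python
import pandas as pd

# Two municipalities; "XYZ - Example" and its vote count are hypothetical.
df = pd.DataFrame(
    {
        "Municipality": ["BUF - Buffalo City", "BUF - Buffalo City", "XYZ - Example"],
        "TotalValidVotes": [268052, 106790, 100],
    }
)

# Plain groupby sum: one value per municipality (index = Municipality).
per_group = df.groupby("Municipality")["TotalValidVotes"].sum()

# transform('sum'): the same totals repeated on every row and aligned to df's index,
# so it can be used directly in element-wise arithmetic with df's columns.
per_row = df.groupby("Municipality")["TotalValidVotes"].transform("sum")
print(per_group)
print(per_row)
```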

Result:

print(df)

         Municipality                           PartyName  TotalValidVotes  TotalValidVotes_Pct
0  BUF - Buffalo City  AFRICAN CHRISTIAN DEMOCRATIC PARTY             2519             0.634710
1  BUF - Buffalo City        AFRICAN INDEPENDENT CONGRESS            15600             3.930719
2  BUF - Buffalo City           AFRICAN NATIONAL CONGRESS           268052            67.540832
3  BUF - Buffalo City              CONGRESS OF THE PEOPLE             3913             0.985955
4  BUF - Buffalo City                 DEMOCRATIC ALLIANCE           106790            26.907784
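The sample above contains only one municipality, so it cannot show that percentages are computed within each municipality rather than across the whole frame. A minimal sketch that runs both answers on two municipalities and checks that they agree and that each municipality sums to 100%. The second municipality "XYZ - Example" and its counts are hypothetical, and group_keys=False is added to the first answer's second groupby on the assumption that you are on pandas 2.x.

```python
import numpy as np
import pandas as pd

# Rows sorted by Municipality, PartyName; "XYZ - Example" and its counts are hypothetical.
df = pd.DataFrame(
    {
        "Municipality": ["BUF - Buffalo City", "BUF - Buffalo City", "XYZ - Example", "XYZ - Example"],
        "PartyName": ["AFRICAN NATIONAL CONGRESS", "DEMOCRATIC ALLIANCE"] * 2,
        "TotalValidVotes": [268052, 106790, 30, 70],
    }
)

# Answer 1: groupby/agg, then groupby(level=0)/apply.
a1 = (
    df.groupby(["Municipality", "PartyName"])
    .agg({"TotalValidVotes": "sum"})
    .groupby(level=0, group_keys=False)  # avoid a duplicate Municipality level in pandas 2.x
    .apply(lambda g: 100 * g / g.sum())
    .reset_index()
)

# Answer 2: groupby/transform on the already-aggregated frame.
a2 = df.copy()
a2["TotalValidVotes_Pct"] = (
    100 * a2["TotalValidVotes"] / a2.groupby("Municipality")["TotalValidVotes"].transform("sum")
)

# Both approaches should give the same percentages, row for row.
same = np.allclose(a1["TotalValidVotes"].to_numpy(), a2["TotalValidVotes_Pct"].to_numpy())

# Within each municipality the percentages should add up to 100.
sums = a1.groupby("Municipality")["TotalValidVotes"].sum()
print(same)
print(sums)
```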
