Pandas: How do I find the percentage of a group?

Published 2024-09-22 10:23:00


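***Disclaimer: I am a complete noob. I'm trying to learn pandas by solving a problem at work. This is a subset of my overall problem, but I'm trying to work these pieces out before tackling the project. Thanks for your patience!***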

I'm trying to find the percentage each fund makes up of its state's total.

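Concept: our funds (departments) are located in different US states. The funds have different levels of compensation for different projects. First I need to total (group) the funds so I know each fund's total compensation.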

I also need to total (group) compensation by state so that I can later work out each fund's percentage of its state.

I've turned the data into sample code here:

import pandas as pd

# Sample data

data = {'Fund':['1000','1000','2000','2000','3000','3000','4000','4000'], 
    'State':['AL','AL','FL','FL','AL','AL','NC','NC'],
    'Compensation':[2000,2500,1500,1750,4000,3200,1450,3000]}

# Create DataFrame (employees)
employees = pd.DataFrame(data)

In case the screenshot doesn't show up, this is what I did:

print(employees)
employees.groupby('Fund').Compensation.sum()
employees.groupby('State').Compensation.sum()
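For reference, here is what those two `groupby` calls return on the sample data, as a runnable sketch (the names `fund_totals` and `state_totals` are just illustrative):

```python
import pandas as pd

# Same sample data as in the question
data = {'Fund': ['1000', '1000', '2000', '2000', '3000', '3000', '4000', '4000'],
        'State': ['AL', 'AL', 'FL', 'FL', 'AL', 'AL', 'NC', 'NC'],
        'Compensation': [2000, 2500, 1500, 1750, 4000, 3200, 1450, 3000]}
employees = pd.DataFrame(data)

# Total compensation per fund: a Series indexed by Fund
# (1000 -> 4500, 2000 -> 3250, 3000 -> 7200, 4000 -> 4450)
fund_totals = employees.groupby('Fund')['Compensation'].sum()

# Total compensation per state: a Series indexed by State
# (AL -> 11700, FL -> 3250, NC -> 4450)
state_totals = employees.groupby('State')['Compensation'].sum()
```

Because `Fund` holds strings, the index keys are `'1000'`, `'2000'` and so on, not integers.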

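I've spent most of the day on the real data trying to figure out how to get:

Fund X's compensation is __% of the state's total compensation, or

Fund_1000 is 38% of all compensation in AL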
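The percentage being asked for is a fund's total within a state divided by that state's total, times 100. Working it by hand for Fund 1000 in AL from the sample data (the variable names are illustrative):

```python
# Fund 1000 rows (both in AL): 2000 + 2500
fund_1000_in_al = 2000 + 2500

# All AL rows (funds 1000 and 3000): 2000 + 2500 + 4000 + 3200
al_total = 2000 + 2500 + 4000 + 3200

# Share of AL's total compensation that belongs to Fund 1000
pct = fund_1000_in_al / al_total * 100
print(round(pct, 2))  # 38.46
```

Dividing by the grand total of all rows (19400) instead would give only about 23.2%, so the 38% figure is a share of the AL total.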

Thanks for your patience and help.

约翰


3 answers

You can also compute the totals in separate DataFrames and merge them:

import pandas as pd

data = {
    "Fund": ["1000", "1000", "2000", "2000", "3000", "3000", "4000", "4000"],
    "State": ["AL", "AL", "FL", "FL", "AL", "AL", "NC", "NC"],
    "Compensation": [2000, 2500, 1500, 1750, 4000, 3200, 1450, 3000],
}
# Create dataframe from dictionary provided
df = pd.DataFrame.from_dict(data)

# First, total the compensation for each (Fund, State) pair
df_fund = df.groupby(["Fund", "State"]).Compensation.sum().reset_index()

# Calculate the total per state in a new DataFrame
df_total = df_fund.groupby("State").Compensation.sum().reset_index()

# Merge on State; the two Compensation columns get the default
# suffixes _x (fund total) and _y (state total)
merged = df_fund.merge(df_total, how="outer", on="State")

# Add a percentage column to the merged DataFrame
merged["percentage"] = merged["Compensation_x"] / merged["Compensation_y"] * 100
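A self-contained sketch of the same merge approach with one small change: renaming the state total before the merge (the column name `state_total` is illustrative) avoids the `_x`/`_y` suffixes, and `how="left"` keeps the fund rows in their original order.

```python
import pandas as pd

data = {
    "Fund": ["1000", "1000", "2000", "2000", "3000", "3000", "4000", "4000"],
    "State": ["AL", "AL", "FL", "FL", "AL", "AL", "NC", "NC"],
    "Compensation": [2000, 2500, 1500, 1750, 4000, 3200, 1450, 3000],
}
df = pd.DataFrame(data)

# One row per (Fund, State) pair with that pair's total compensation
df_fund = df.groupby(["Fund", "State"], as_index=False)["Compensation"].sum()

# One row per State with the state's total, under its own column name
df_total = (
    df_fund.groupby("State", as_index=False)["Compensation"].sum()
    .rename(columns={"Compensation": "state_total"})
)

# Attach each state's total to every fund row in that state
merged = df_fund.merge(df_total, on="State", how="left")
merged["percentage"] = merged["Compensation"] / merged["state_total"] * 100
```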

Here is one solution. First do a groupby to get the lowest level of aggregation, then use a groupby transform to divide those values by the state totals:

# df is the DataFrame built from the question's sample data
agg = df.groupby(['Fund', 'State'], as_index=False)['Compensation'].sum()
agg['percentage'] = (agg['Compensation'] / agg.groupby('State')['Compensation'].transform('sum')) * 100

agg.to_dict()
{'Fund': {0: '1000', 1: '2000', 2: '3000', 3: '4000'},
 'State': {0: 'AL', 1: 'FL', 2: 'AL', 3: 'NC'},
 'Compensation': {0: 4500, 1: 3250, 2: 7200, 3: 4450},
 'percentage': {0: 38.46153846153847,
  1: 100.0,
  2: 61.53846153846154,
  3: 100.0}}
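What makes this answer work is that `transform` returns a result with the same index as its input, repeating each group's total on every row of that group, whereas a plain `sum` collapses to one row per group. A small sketch contrasting the two on the aggregated frame (rebuilt here from the values above so the block runs on its own; `per_state` and `per_row` are illustrative names):

```python
import pandas as pd

agg = pd.DataFrame({'Fund': ['1000', '2000', '3000', '4000'],
                    'State': ['AL', 'FL', 'AL', 'NC'],
                    'Compensation': [4500, 3250, 7200, 4450]})

# sum(): one value per state, indexed by State (3 rows)
per_state = agg.groupby('State')['Compensation'].sum()

# transform('sum'): the state total repeated on each row,
# aligned with agg's own index (4 rows)
per_row = agg.groupby('State')['Compensation'].transform('sum')

# Because per_row lines up with agg row by row, plain division works
agg['percentage'] = agg['Compensation'] / per_row * 100
```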

This should do the job:

df['total_state_compensation'] = df.groupby('State')['Compensation'].transform('sum')
df['total_state_fund_compensation'] = df.groupby(['State', 'Fund'])['Compensation'].transform('sum')
df['ratio'] = df['total_state_fund_compensation'].div(df['total_state_compensation'])
# ratio is identical on every row of a (State, Fund) group, so mean() just collapses it
>>> df.groupby(['State', 'Fund'])['ratio'].mean().to_dict()
{('AL', '1000'): 0.38461538461538464,
 ('AL', '3000'): 0.6153846153846154,
 ('FL', '2000'): 1.0,
 ('NC', '4000'): 1.0}
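A variant of this answer, sketched under the assumption that one value per (State, Fund) pair is all you need: summing first leaves no duplicate rows, so the `.mean()` step goes away. The last lines phrase each ratio like the sentence in the question; that wording and the names `pair_totals`, `ratio`, and `sentences` are illustrative.

```python
import pandas as pd

data = {'Fund': ['1000', '1000', '2000', '2000', '3000', '3000', '4000', '4000'],
        'State': ['AL', 'AL', 'FL', 'FL', 'AL', 'AL', 'NC', 'NC'],
        'Compensation': [2000, 2500, 1500, 1750, 4000, 3200, 1450, 3000]}
df = pd.DataFrame(data)

# Total per (State, Fund) pair: a Series with a two-level index
pair_totals = df.groupby(['State', 'Fund'])['Compensation'].sum()

# Each pair's share of its state's total; grouping on the 'State' index
# level with transform keeps the result aligned with pair_totals
ratio = pair_totals / pair_totals.groupby(level='State').transform('sum')

# Phrase each ratio as a sentence, formatted as a whole-number percentage
sentences = [f"Fund {fund} is {r:.0%} of total compensation in {state}"
             for (state, fund), r in ratio.items()]
print("\n".join(sentences))
```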
