Itertool组合Python Datafram

2024-10-03 06:32:02 发布

您现在位置:Python中文网/ 问答频道 /正文

不知怎的,我找不到解决问题的适当办法。我想计算单位和Scen列之间的和,对于同一个事实日期,使用相等的值。输出应如下所示:

输出:

Combination Unit_Com    Scen    Value_Sum   Town    Country
11-Apr      a,b         1       28          Town A  USA
11-Apr      a,b         2       31          Town A  USA
11-Apr      a,c         1       30          Town A  USA
11-Apr      a,c         2       30          Town A  USA
11-Apr      a,d         1       31          Town A  USA
11-Apr      a,d         2       29          Town A  USA
11-Apr      b,c         1       32          Town A  USA
11-Apr      b,c         2       39          Town A  USA
11-Apr      b,d         1       33          Town A  USA
11-Apr      b,d         2       38          Town A  USA
11-Apr      c,d         1       35          Town A  USA
11-Apr      c,d         2       37          Town A  USA
10-Apr      a,b         1       28          Town A  USA
10-Apr      a,b         2       25          Town A  USA
10-Apr      a,c         1       32          Town A  USA
10-Apr      a,c         2       26          Town A  USA
10-Apr      a,d         1       38          Town A  USA
10-Apr      a,d         2       22          Town A  USA
10-Apr      b,c         1       24          Town A  USA
10-Apr      b,c         2       27          Town A  USA
10-Apr      b,d         1       30          Town A  USA
10-Apr      b,d         2       23          Town A  USA
10-Apr      c,d         1       34          Town A  USA
10-Apr      c,d         2       24          Town A  USA

其计算方法如下:

^{pr2}$

然后对每个事实日期执行此操作。在

最后是Town A and Town B和a,e等的组合

不幸的是,我没有收到任何密码,我被困在这里:

更新:

我更新了代码,但不知怎么的还是收到了错误的输出

calculating date: 11-Apr
11-Apr 1,1 a,b Town A,Town A USA,USA 28
11-Apr 1,2 a,b Town A,Town A USA,USA 33
11-Apr 1,1 a,c Town A,Town A USA,USA 30
11-Apr 1,2 a,c Town A,Town A USA,USA 32
11-Apr 1,1 a,d Town A,Town A USA,USA 31
11-Apr 1,2 a,d Town A,Town A USA,USA 31
11-Apr 1,1 a,b Town A,Town A USA,USA 23
11-Apr 1,2 a,b Town A,Town A USA,USA 26
11-Apr 1,1 a,c Town A,Town A USA,USA 27
11-Apr 1,2 a,c Town A,Town A USA,USA 27
11-Apr 1,1 a,d Town A,Town A USA,USA 33
11-Apr 1,2 a,d Town A,Town A USA,USA 23
calculating date: 10-Apr
10-Apr 2,1 a,b Town A,Town A USA,USA 26
10-Apr 2,2 a,b Town A,Town A USA,USA 31
10-Apr 2,1 a,c Town A,Town A USA,USA 28
10-Apr 2,2 a,c Town A,Town A USA,USA 30
10-Apr 2,1 a,d Town A,Town A USA,USA 29
10-Apr 2,2 a,d Town A,Town A USA,USA 29
10-Apr 2,1 a,b Town A,Town A USA,USA 21
10-Apr 2,2 a,b Town A,Town A USA,USA 24
10-Apr 2,1 a,c Town A,Town A USA,USA 25
10-Apr 2,2 a,c Town A,Town A USA,USA 25
10-Apr 2,1 a,d Town A,Town A USA,USA 31
10-Apr 2,2 a,d Town A,Town A USA,USA 21

代码如下所示:

import pandas as pd

df = pd.DataFrame({'fact_date': ['11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr'],
                   'Unit': ['a','a','b','b','c','c','d','d','a','a','b','b','c','c','d','d','e','e','f','f','g','g','h','h','i','i','j','j','k','k','l','l'],
                   'Town': ['Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B'],
                   'Scen': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2],
                   'Value': [13,11,15,20,17,19,18,18,18,12,10,13,14,14,20,10,18,17,15,19,11,14,14,17,19,10,16,10,16,19,12,11],
                   'Country': ['USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA']})


test_df = pd.DataFrame([])

cluster_names = df['fact_date'].unique()
disjoint_clusters = []
for idx,item in enumerate(cluster_names):
    df[df['fact_date'] == item]

    print('calculating date: ' +str(item))

    for j in range(idx+1, len(df)):
        if df.iloc[idx]['Unit'] != df.iloc[j]['Unit'] and df.iloc[idx]['Town'] == 'Town A' and df.iloc[j]['Town'] == 'Town A':

            print(item,
                  str(df.iloc[idx]['Scen'])+str(',')+str(df.iloc[j]['Scen']), 
                  df.iloc[idx]['Unit']+str(',')+df.iloc[j]['Unit'],
                  df.iloc[idx]['Town']+str(',')+df.iloc[j]['Town'],
                  df.iloc[idx]['Country']+str(',')+df.iloc[j]['Country'],
                  df.iloc[idx]['Value']+df.iloc[j]['Value'])

Tags: anddfdatevalueunititemcountryapr
1条回答
网友
1楼 · 发布于 2024-10-03 06:32:02

这样你就能得到问题的预期结果。其思想是在列'fact_date','Country','Town','Scen'上使用groupby,然后在分组数据帧上使用combinationsfromitertools作为列'Value','Unit'中的值。您可以使用列表理解和pd.DataFrame直接创建结果数据帧:

from itertools import combinations
df_res = pd.DataFrame([list(name_g) + [val1+val2,'{},{}'.format(unit1,unit2)] 
                       for name_g, df_g in df.groupby(['fact_date','Country','Town','Scen']) 
                       for ((val1, unit1), (val2, unit2)) in combinations(df_g[['Value','Unit']].values,2)],
                      columns=['Combination','Country','Town','Scen','Value_Sum','Unit_Com'])

您可能需要重新排序列并获得相同的输出范围,然后可以执行以下操作:

^{pr2}$

编辑:对于使用Town执行相同的操作,您可以这样做:

df_res = pd.DataFrame([list(name_g) + [val1+val2,'{},{}'.format(unit1,unit2), '{},{}'.format(town1,town2)] 
                       for name_g, df_g in df.groupby(['fact_date','Country','Scen']) 
                       for ((val1, unit1, town1), (val2, unit2, town2)) in combinations(df_g[['Value','Unit','Town']].values,2)],
                      columns=['Combination','Country','Scen','Value_Sum','Unit_Com','Town'])

不同之处在于,列Town不再在groupby中,而是在combinations中选择的列中,以及使其正常工作的小更改。在

对于随机选择这些组合,我建议您看看函数^{},例如,如果您想要其中的10个,您可以:

print (df_res.sample(n=10))
    Combination Country  Scen  Value_Sum Unit_Com           Town
7        10-Apr     USA     1         24      b,c  Town A,Town A
66       11-Apr     USA     1         30      b,f  Town A,Town B
31       10-Apr     USA     2         22      a,i  Town A,Town B
18       10-Apr     USA     1         39      d,i  Town A,Town B
72       11-Apr     USA     1         28      c,g  Town A,Town B
109      11-Apr     USA     2         33      f,g  Town B,Town B
41       10-Apr     USA     2         24      c,d  Town A,Town A
99       11-Apr     USA     2         38      c,f  Town A,Town B
84       11-Apr     USA     2         31      a,b  Town A,Town A
88       11-Apr     USA     2         30      a,f  Town A,Town B

相关问题 更多 >