不知怎的,我找不到解决问题的适当办法。我想计算单位和Scen列之间的和,对于同一个事实日期,使用相等的值。输出应如下所示:
输出:
Combination Unit_Com Scen Value_Sum Town Country
11-Apr a,b 1 28 Town A USA
11-Apr a,b 2 31 Town A USA
11-Apr a,c 1 30 Town A USA
11-Apr a,c 2 30 Town A USA
11-Apr a,d 1 31 Town A USA
11-Apr a,d 2 29 Town A USA
11-Apr b,c 1 32 Town A USA
11-Apr b,c 2 39 Town A USA
11-Apr b,d 1 33 Town A USA
11-Apr b,d 2 38 Town A USA
11-Apr c,d 1 35 Town A USA
11-Apr c,d 2 37 Town A USA
10-Apr a,b 1 28 Town A USA
10-Apr a,b 2 25 Town A USA
10-Apr a,c 1 32 Town A USA
10-Apr a,c 2 26 Town A USA
10-Apr a,d 1 38 Town A USA
10-Apr a,d 2 22 Town A USA
10-Apr b,c 1 24 Town A USA
10-Apr b,c 2 27 Town A USA
10-Apr b,d 1 30 Town A USA
10-Apr b,d 2 23 Town A USA
10-Apr c,d 1 34 Town A USA
10-Apr c,d 2 24 Town A USA
其计算方法如下:
^{pr2}$然后对每个事实日期执行此操作。在
最后是Town A and Town B
和a,e等的组合
不幸的是,我没有收到任何密码,我被困在这里:
更新:
我更新了代码,但不知怎么的还是收到了错误的输出
calculating date: 11-Apr
11-Apr 1,1 a,b Town A,Town A USA,USA 28
11-Apr 1,2 a,b Town A,Town A USA,USA 33
11-Apr 1,1 a,c Town A,Town A USA,USA 30
11-Apr 1,2 a,c Town A,Town A USA,USA 32
11-Apr 1,1 a,d Town A,Town A USA,USA 31
11-Apr 1,2 a,d Town A,Town A USA,USA 31
11-Apr 1,1 a,b Town A,Town A USA,USA 23
11-Apr 1,2 a,b Town A,Town A USA,USA 26
11-Apr 1,1 a,c Town A,Town A USA,USA 27
11-Apr 1,2 a,c Town A,Town A USA,USA 27
11-Apr 1,1 a,d Town A,Town A USA,USA 33
11-Apr 1,2 a,d Town A,Town A USA,USA 23
calculating date: 10-Apr
10-Apr 2,1 a,b Town A,Town A USA,USA 26
10-Apr 2,2 a,b Town A,Town A USA,USA 31
10-Apr 2,1 a,c Town A,Town A USA,USA 28
10-Apr 2,2 a,c Town A,Town A USA,USA 30
10-Apr 2,1 a,d Town A,Town A USA,USA 29
10-Apr 2,2 a,d Town A,Town A USA,USA 29
10-Apr 2,1 a,b Town A,Town A USA,USA 21
10-Apr 2,2 a,b Town A,Town A USA,USA 24
10-Apr 2,1 a,c Town A,Town A USA,USA 25
10-Apr 2,2 a,c Town A,Town A USA,USA 25
10-Apr 2,1 a,d Town A,Town A USA,USA 31
10-Apr 2,2 a,d Town A,Town A USA,USA 21
代码如下所示:
import pandas as pd
df = pd.DataFrame({'fact_date': ['11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','11-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr','10-Apr'],
'Unit': ['a','a','b','b','c','c','d','d','a','a','b','b','c','c','d','d','e','e','f','f','g','g','h','h','i','i','j','j','k','k','l','l'],
'Town': ['Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town A','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B','Town B'],
'Scen': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2],
'Value': [13,11,15,20,17,19,18,18,18,12,10,13,14,14,20,10,18,17,15,19,11,14,14,17,19,10,16,10,16,19,12,11],
'Country': ['USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA','USA']})
test_df = pd.DataFrame([])
cluster_names = df['fact_date'].unique()
disjoint_clusters = []
for idx,item in enumerate(cluster_names):
df[df['fact_date'] == item]
print('calculating date: ' +str(item))
for j in range(idx+1, len(df)):
if df.iloc[idx]['Unit'] != df.iloc[j]['Unit'] and df.iloc[idx]['Town'] == 'Town A' and df.iloc[j]['Town'] == 'Town A':
print(item,
str(df.iloc[idx]['Scen'])+str(',')+str(df.iloc[j]['Scen']),
df.iloc[idx]['Unit']+str(',')+df.iloc[j]['Unit'],
df.iloc[idx]['Town']+str(',')+df.iloc[j]['Town'],
df.iloc[idx]['Country']+str(',')+df.iloc[j]['Country'],
df.iloc[idx]['Value']+df.iloc[j]['Value'])
这样你就能得到问题的预期结果。其思想是在列
'fact_date','Country','Town','Scen'
上使用groupby
,然后在分组数据帧上使用combinations
fromitertools
作为列'Value','Unit'
中的值。您可以使用列表理解和pd.DataFrame
直接创建结果数据帧:您可能需要重新排序列并获得相同的输出范围,然后可以执行以下操作:
^{pr2}$编辑:对于使用
Town
执行相同的操作,您可以这样做:不同之处在于,列
Town
不再在groupby
中,而是在combinations
中选择的列中,以及使其正常工作的小更改。在对于随机选择这些组合,我建议您看看函数^{} ,例如,如果您想要其中的10个,您可以:
相关问题 更多 >
编程相关推荐