Aggregating data across DataFrames based on the string in the first column

import numpy as np
import pandas as pd

### three datasets
d1 = {'part_id': ['PartID_1234', 'PartID_5678'], 'col2': [1, 2]}
df1 = pd.DataFrame(data=d1)
d2 = {'part_id': ['PartID_1234', 'PartID_5678'], 'col2': [3, 4]}
df2 = pd.DataFrame(data=d2)
d3 = {'part_id': ['PartID_5678', 'PartID_1234'], 'col2': [5, 6]}
df3 = pd.DataFrame(data=d3)

### aggregated dataset based on ID
# note: np.array coerces the mixed values in each row to strings
result = pd.DataFrame(np.array([['PartID_1234', 1, 3, 6], ['PartID_5678', 2, 4, 5]]))

2 Answers

User

#1 · Edited 2024-05-08 17:21:50

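I believe you need concat together with set_index, applied to each DataFrame in a list comprehension, so that the rows are aligned on the part_id column: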

dfs = [df1, df2, df3]
dfs = [x.set_index('part_id')['col2'] for x in dfs]
df = pd.concat(dfs, axis=1).reset_index()
df.columns = range(len(df.columns))
print(df)

             0  1  2  3
0  PartID_1234  1  3  6
1  PartID_5678  2  4  5

If you need the first column as the index instead:

dfs = [df1, df2, df3]
dfs = [x.set_index('part_id')['col2'] for x in dfs]
df = pd.concat(dfs, axis=1, ignore_index=True)
print(df)

             0  1  2
PartID_1234  1  3  6
PartID_5678  2  4  5
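
As a self-contained sketch of the same concat approach: the helper name combine_on_key and the labels 'df1', 'df2', 'df3' below are hypothetical, introduced only for illustration. Passing keys= to pd.concat is one possible way to record which source frame each column came from, instead of numbering the columns.

```python
import pandas as pd

df1 = pd.DataFrame({'part_id': ['PartID_1234', 'PartID_5678'], 'col2': [1, 2]})
df2 = pd.DataFrame({'part_id': ['PartID_1234', 'PartID_5678'], 'col2': [3, 4]})
df3 = pd.DataFrame({'part_id': ['PartID_5678', 'PartID_1234'], 'col2': [5, 6]})


def combine_on_key(frames, labels, key='part_id', value='col2'):
    """Hypothetical helper: align one value column from several frames on a key."""
    # Index each frame by the key and keep only the value column as a Series.
    series = [frame.set_index(key)[value] for frame in frames]
    # concat aligns the Series on their shared index; keys= names the columns.
    combined = pd.concat(series, axis=1, keys=labels)
    # Sort the index so the row order does not depend on the input order.
    return combined.sort_index()


combined = combine_on_key([df1, df2, df3], labels=['df1', 'df2', 'df3'])
print(combined)
```

Because the alignment happens on the index, the reversed row order in df3 does not matter. A part_id missing from one of the frames would show up as NaN in that frame's column.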

User

#2 · Edited 2024-05-08 17:21:50

You can use merge with how='outer' to get the desired outer-join result, for example:

df1.merge(df2, on='part_id', how='outer').merge(df3, on='part_id', how='outer')

    part_id     col2_x  col2_y  col2
0   PartID_1234   1        3    6
1   PartID_5678   2        4    5
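
If there are more than three frames, chaining merge by hand gets unwieldy, and the automatic _x/_y suffixes can start to clash. One possible sketch, assuming every frame shares the part_id and col2 columns, is to rename the value column per frame first and then fold the list with functools.reduce. The col2_1, col2_2, ... names are an illustrative choice, not part of the original answer.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'part_id': ['PartID_1234', 'PartID_5678'], 'col2': [1, 2]})
df2 = pd.DataFrame({'part_id': ['PartID_1234', 'PartID_5678'], 'col2': [3, 4]})
df3 = pd.DataFrame({'part_id': ['PartID_5678', 'PartID_1234'], 'col2': [5, 6]})

frames = [df1, df2, df3]

# Give each frame's value column a unique name so merge needs no suffixes.
renamed = [
    frame.rename(columns={'col2': f'col2_{i}'})
    for i, frame in enumerate(frames, start=1)
]

# Outer-merge the frames pairwise, left to right, on the shared key column.
merged = reduce(
    lambda left, right: left.merge(right, on='part_id', how='outer'),
    renamed,
)
print(merged)
```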
