Merge DataFrames on a column label and overwrite the other values in matching rows

import pandas as pd

rec = pd.DataFrame({'batch': ["001", "002", "003"], 'A': [1, 2, 3], 'B': [4, 5, 6]})
ing1 = pd.DataFrame({'batch': ["002", "003", "004"], 'C': [12, 13, 14], 'D': [15, 16, 17], 'E': [18, 19, 10]})
ing2 = pd.DataFrame({'batch': ["001", "011", "012"], 'C': [20, 21, 22], 'D': [23, 24, 25], 'F': [26, 27, 28]})

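Merging them on batch (a left merge of rec with ing1, then with ing2) produces duplicated C_x/C_y and D_x/D_y columns instead of single C and D columns:

  batch  A  B   C_x   D_x     E   C_y   D_y     F
0   001  1  4   NaN   NaN   NaN  20.0  23.0  26.0
1   002  2  5  12.0  15.0  18.0   NaN   NaN   NaN
2   003  3  6  13.0  16.0  19.0   NaN   NaN   NaN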
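For reference, a minimal sketch that reproduces the table above, assuming it came from two chained left merges on batch (the same merge the second answer uses). Because C and D exist in both ing1 and ing2, pandas keeps both copies and appends its default suffixes _x (left) and _y (right).

```python
import pandas as pd

rec = pd.DataFrame({'batch': ["001", "002", "003"], 'A': [1, 2, 3], 'B': [4, 5, 6]})
ing1 = pd.DataFrame({'batch': ["002", "003", "004"], 'C': [12, 13, 14],
                     'D': [15, 16, 17], 'E': [18, 19, 10]})
ing2 = pd.DataFrame({'batch': ["001", "011", "012"], 'C': [20, 21, 22],
                     'D': [23, 24, 25], 'F': [26, 27, 28]})

# Left merges keep every batch from rec; batches found only in ing1/ing2 are dropped.
merged = (rec.merge(ing1, how='left', on='batch')
             .merge(ing2, how='left', on='batch'))

# C and D collide in the second merge, so pandas suffixes them with _x and _y.
print(merged.columns.tolist())
# ['batch', 'A', 'B', 'C_x', 'D_x', 'E', 'C_y', 'D_y', 'F']
```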

2 Answers

User
Answer #1 · edited 2024-09-29 00:20:39

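Actually, df.update() is probably the function conceptually closest to what you're asking for. However, you have to set the index first and preallocate the output DataFrame, which may or may not be more hassle than using .merge().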
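Code:

import pandas as pd

# use batch as the index so update() aligns rows on it
rec.set_index("batch", inplace=True)
ing1.set_index("batch", inplace=True)
ing2.set_index("batch", inplace=True)

# preallocate the output with every target column and rec's batches as rows
final = pd.DataFrame(columns=["A", "B", "C", "D", "E", "F"], index=rec.index)

# update in order: later frames overwrite earlier ones wherever they have non-NA values
final.update(rec)
final.update(ing1)
final.update(ing2)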
Result:

print(final)
       A  B   C   D    E    F
batch
001    1  4  20  23  NaN   26
002    2  5  12  15   18  NaN
003    3  6  13  16   19  NaN
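The preallocation step is needed because DataFrame.update() modifies the caller in place, aligns on index and columns, and never adds new rows or columns; a value is overwritten only where the other frame has a non-NA value at a matching label. A small sketch illustrating those rules on made-up values (the frames target and other are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Target with a fixed index; update() will never add rows or columns to it.
target = pd.DataFrame({'C': [np.nan, 1.0, 2.0]},
                      index=pd.Index(["001", "002", "003"], name="batch"))

# other has an extra batch ("004"), an extra column ("E"), and a NaN for "003".
other = pd.DataFrame({'C': [12.0, np.nan, 14.0], 'E': [18.0, 19.0, 10.0]},
                     index=pd.Index(["002", "003", "004"], name="batch"))

target.update(other)  # modifies target in place and returns None
print(target)
# "001" stays NaN (no match in other), "002" becomes 12.0,
# "003" keeps 2.0 (other's NaN does not overwrite); "004" and "E" are not added.
```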

User
Answer #2 · edited 2024-09-29 00:20:39

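How about applying np.where() directly after the merge? If the right-hand column (suffix "_y") is not NA, take it; otherwise take the left-hand column (suffix "_x").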
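import numpy as np

final = (rec.merge(ing1, how='left', on='batch')
            .merge(ing2, how='left', on='batch'))
# take the _y (ing2) value where it exists, otherwise fall back to the _x (ing1) value
final[["C", "D"]] = np.where(final[["C_y", "D_y"]].notna(),
                             final[["C_y", "D_y"]],
                             final[["C_x", "D_x"]])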
Output:

print(final[["A", "B", "C", "D", "E", "F"]])
   A  B     C     D     E     F
0  1  4  20.0  23.0   NaN  26.0
1  2  5  12.0  15.0  18.0   NaN
2  3  6  13.0  16.0  19.0   NaN
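If more columns overlap than just C and D, the same np.where idea can be applied to every _y column instead of listing them by hand. A minimal sketch, assuming the default _x/_y suffixes and that the right-hand (_y) value should win whenever it is present; the helper name coalesce_suffixes is hypothetical:

```python
import numpy as np
import pandas as pd

def coalesce_suffixes(df, left="_x", right="_y"):
    """Hypothetical helper: fold every <col>_x/<col>_y pair into <col>,
    preferring the right-hand value when it is not NA."""
    out = df.copy()
    for col_y in [c for c in df.columns if c.endswith(right)]:
        base = col_y[: -len(right)]
        col_x = base + left
        if col_x not in df.columns:
            continue  # no matching left column; leave this one untouched
        out[base] = np.where(out[col_y].notna(), out[col_y], out[col_x])
        out = out.drop(columns=[col_x, col_y])
    return out

rec = pd.DataFrame({'batch': ["001", "002", "003"], 'A': [1, 2, 3], 'B': [4, 5, 6]})
ing1 = pd.DataFrame({'batch': ["002", "003", "004"], 'C': [12, 13, 14],
                     'D': [15, 16, 17], 'E': [18, 19, 10]})
ing2 = pd.DataFrame({'batch': ["001", "011", "012"], 'C': [20, 21, 22],
                     'D': [23, 24, 25], 'F': [26, 27, 28]})

merged = (rec.merge(ing1, how='left', on='batch')
             .merge(ing2, how='left', on='batch'))
final = coalesce_suffixes(merged)
print(final[["A", "B", "C", "D", "E", "F"]])
```

Rows where both the _x and _y values are missing stay NaN, and the combined columns are appended at the end, which is why the result is selected in an explicit column order.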
