Preventing duplicate rows when performing a merge

import pandas as pd

# Import CSVs
first = pd.read_csv("../PATH_TO_CSV/A.csv")
print("Imported first CSV: " + str(first.shape))
second = pd.read_csv("../PATH_TO_CSV/B.csv")
print("Imported second CSV: " + str(second.shape))

# Merge the two frames on their shared columns.
# (DataFrame.append was removed in pandas 2.0, so the merge result is assigned directly.)
result = pd.merge(first, second)
print("Merged CSVs... resulting DataFrame is: " + str(result.shape))

# Let's do a "dedupe" to deal with an issue in how pandas handles datetime merges.
# I read about an issue where, if datetime is involved, duplicate entries will be created.
result = result.drop_duplicates()
print("Deduping... resulting DataFrame is: " + str(result.shape))

# Save to another CSV
result.to_csv("EXPORT.csv", index=False)
print("Saved to file.")

3 answers

User

#1 · edited 2024-09-30 08:20:05

I think you need concat:

result = pd.concat([df1.set_index('id'), df2.set_index('id')],axis = 1).reset_index()

This gives you:

    id      item_no     description
0   A123    1           Mary had a...
1   A123    2           ...little lamb
2   B456    1           ...Its fleece...
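For anyone who wants to try this, here is a minimal sketch of the concat approach. The sample frames are hypothetical, reconstructed from the output above; the original CSV contents are not shown. As far as I can tell, it only lines up correctly when both frames list the same ids in the same order, because concat aligns the two frames on the index, and the index contains repeated ids.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the output shown above.
df1 = pd.DataFrame({"id": ["A123", "A123", "B456"], "item_no": [1, 2, 1]})
df2 = pd.DataFrame({
    "id": ["A123", "A123", "B456"],
    "description": ["Mary had a...", "...little lamb", "...Its fleece..."],
})

# Put the frames side by side, aligned on the "id" index.
# Assumption: both frames have identical "id" columns (same values, same order).
result = pd.concat(
    [df1.set_index("id"), df2.set_index("id")], axis=1
).reset_index()

print(result)
```

If the two files can list ids in different orders, a key-based merge is probably the safer choice.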

User

#2 · edited 2024-09-30 08:20:05

I would do it this way:

In [135]: result = A.merge(B.assign(item_num=B.groupby('id').cumcount()+1))

In [136]: result
Out[136]:
     id  item_num       description
0  A123         1     Mary had a...
1  A123         2   ...little lamb.
2  B456         1  ...Its fleece...

Explanation: we can create a "virtual" item_num column in the B DataFrame to join on:

In [137]: B.assign(item_num=B.groupby('id').cumcount()+1)
Out[137]:
     id       description  item_num
0  A123     Mary had a...         1
1  A123   ...little lamb.         2
2  B456  ...Its fleece...         1
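To see why the extra key matters, here is a small self-contained sketch comparing a merge on id alone with the cumcount version. The contents of A and B are assumed from the output above, and the names naive and B_keyed are only illustrative.

```python
import pandas as pd

# Hypothetical sample frames, reconstructed from the output shown above.
A = pd.DataFrame({"id": ["A123", "A123", "B456"], "item_num": [1, 2, 1]})
B = pd.DataFrame({
    "id": ["A123", "A123", "B456"],
    "description": ["Mary had a...", "...little lamb.", "...Its fleece..."],
})

# Merging on "id" alone pairs every A123 row in A with every A123 row in B.
naive = A.merge(B, on="id")
print(len(naive))

# Number the rows within each id (1, 2, ...) so B gets a second key column.
B_keyed = B.assign(item_num=B.groupby("id").cumcount() + 1)

# With no "on" argument, merge joins on all shared columns: id and item_num.
result = A.merge(B_keyed)
print(len(result))
```

This relies on the rows of B appearing in the same order as the item numbers in A within each id, since cumcount simply numbers rows in the order they occur.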

User

#3 · edited 2024-09-30 08:20:05

Try indexing your df, then drop the duplicates:

df = df.set_index(['id', 'item_num']).drop_duplicates()
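One thing worth checking before relying on this: drop_duplicates compares column values only and ignores the index. The sketch below uses hypothetical data shaped like the example in this thread (the names merged and deduped are illustrative) to show what survives.

```python
import pandas as pd

# Hypothetical frames shaped like the example in this thread.
A = pd.DataFrame({"id": ["A123", "A123", "B456"], "item_num": [1, 2, 1]})
B = pd.DataFrame({
    "id": ["A123", "A123", "B456"],
    "description": ["Mary had a...", "...little lamb.", "...Its fleece..."],
})

# A plain merge on "id" produces the duplicated rows.
merged = A.merge(B, on="id")

# After set_index, only "description" remains as a column, so
# drop_duplicates keeps the first row seen for each description.
deduped = merged.set_index(["id", "item_num"]).drop_duplicates()
print(deduped)
```

With this data the row count drops back to three, but the pairing of item_num and description is not guaranteed to be the intended one, so it may be worth inspecting the result.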
