Answer to "Preventing duplicate rows when performing a merge" - Python中文网

Preventing duplicate rows when performing a merge


<p>I'm working on a data analysis project and have run into trouble.</p>
<p>Basically, suppose I have a sample CSV "A":</p>
<pre><code>id   | item_num
A123 | 1
A123 | 2
B456 | 1
</code></pre>
<p>and a sample "B":</p>
<pre><code>id   | description
A123 | Mary had a...
A123 | ...little lamb.
B456 | ...Its fleece...
</code></pre>
<p>If I run a <code>merge</code> with <code>Pandas</code>, I get this:</p>
<pre><code>id   | item_num | description
A123 | 1        | Mary had a...
A123 | 2        | Mary had a...
A123 | 1        | ...little lamb.
A123 | 2        | ...little lamb.
B456 | 1        | ...Its fleece...
</code></pre>
<p>How can I get this instead?</p>
<pre><code>id   | item_num | description
A123 | 1        | Mary had a...
A123 | 2        | ...little lamb.
B456 | 1        | ...Its fleece...
</code></pre>
<p>Here is my code:</p>
<pre><code>import pandas as pd

# Import CSVs
first = pd.read_csv("../PATH_TO_CSV/A.csv")
print("Imported first CSV: " + str(first.shape))
second = pd.read_csv("../PATH_TO_CSV/B.csv")
print("Imported second CSV: " + str(second.shape))

# Merge on the shared column(s). DataFrame.append was removed in pandas 2.0,
# so merge directly instead of appending the merge to an empty DataFrame.
result = pd.merge(first, second)
print("Merged CSVs... resulting DataFrame is: " + str(result.shape))

# Let's do a "dedupe" to deal with an issue in how Pandas handles datetime merges.
# I read that if datetime columns are involved, duplicate entries will be created.
result = result.drop_duplicates()
print("Deduping... resulting DataFrame is: " + str(result.shape))

# Save to another CSV
result.to_csv("EXPORT.csv", index=False)
print("Saved to file.")
</code></pre>
<p>I'd really appreciate any help; I'm stuck! I have more than 20,000 rows to process.</p>
<p>Thanks.</p>
<p>Edit: My post was flagged as a possible duplicate. It isn't, because I don't necessarily want to add a column; I just want to stop each <code>description</code> from being repeated once for every <code>item_num</code> that belongs to the same <code>id</code>.</p>
<hr/>
<p><strong>Update, June 21:</strong></p>
<p>How can I merge the two DataFrames if they look like this?</p>
<pre><code>id   | item_num | other_col
A123 | 1        | lorem ipsum
A123 | 2        | dolor sit
A123 | 3        | amet, consectetur
B456 | 1        | lorem ipsum
</code></pre>
<p>And sample "B":</p>
<pre><code>id   | item_num | description
A123 | 1        | Mary had a...
A123 | 2        | ...little lamb.
B456 | 1        | ...Its fleece...
</code></pre>
<p>so that the result is:</p>
<pre><code>id   | item_num | other_col   | description
A123 | 1        | lorem ipsum | Mary had a...
A123 | 2        | dolor sit   | ...little lamb.
B456 | 1        | lorem ipsum | ...Its fleece...
</code></pre>
<p>That is, the row with <code>item_num</code> 3 (whose <code>other_col</code> is "amet, consectetur") is left out.</p>
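For the original question: merging on `id` alone is a many-to-many join, so each of the two A123 rows in "A" is paired with each of the two A123 rows in "B" (2 × 2 = 4 rows). `drop_duplicates()` cannot undo this, because rows such as (A123, 1, Mary had a...) and (A123, 2, Mary had a...) are not identical. A minimal sketch, assuming the n-th row for an `id` in "A" belongs with the n-th row for the same `id` in "B" (row order is the only link shown in the question), numbers the rows within each `id` with `groupby(...).cumcount()` and merges on `id` plus that counter. The helper column `_pos` is hypothetical.

```python
import pandas as pd

# Sample data from the question (in practice these come from pd.read_csv).
first = pd.DataFrame({"id": ["A123", "A123", "B456"], "item_num": [1, 2, 1]})
second = pd.DataFrame({
    "id": ["A123", "A123", "B456"],
    "description": ["Mary had a...", "...little lamb.", "...Its fleece..."],
})

# Assumption: the n-th row of an id in `first` matches the n-th row of the
# same id in `second`. `_pos` is a hypothetical helper column (0, 1, ... per id).
first["_pos"] = first.groupby("id").cumcount()
second["_pos"] = second.groupby("id").cumcount()

# Merging on (id, _pos) pairs rows one-to-one instead of building every
# combination; validate="one_to_one" raises MergeError if a key repeats.
result = (
    pd.merge(first, second, on=["id", "_pos"], how="inner", validate="one_to_one")
    .drop(columns="_pos")
)
print(result)
```

This relies entirely on both files listing each `id`'s rows in the same order; if they can be sorted differently, the pairing will be wrong and a real shared key (as in the June 21 update) is needed instead.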
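For the June 21 update, both tables share the composite key (`id`, `item_num`), so no helper column should be needed. A minimal sketch, using the sample rows from the update: merge on both columns with an inner join, which keeps only key pairs present in both frames and therefore drops A123/3.

```python
import pandas as pd

# Sample data from the June 21 update.
first = pd.DataFrame({
    "id": ["A123", "A123", "A123", "B456"],
    "item_num": [1, 2, 3, 1],
    "other_col": ["lorem ipsum", "dolor sit", "amet, consectetur", "lorem ipsum"],
})
second = pd.DataFrame({
    "id": ["A123", "A123", "B456"],
    "item_num": [1, 2, 1],
    "description": ["Mary had a...", "...little lamb.", "...Its fleece..."],
})

# Join on the composite key (id, item_num). how="inner" keeps only key pairs
# found in both frames, so the A123 / item_num 3 row has no match and is dropped.
result = pd.merge(first, second, on=["id", "item_num"], how="inner")
print(result)
```

If the unmatched row should be kept with an empty description instead, `how="left"` keeps every row of `first` and fills `description` with NaN where `second` has no match.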


1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java