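<p>This should be straightforward. The solution assumes that file2 has the same content as file1 or is longer, so that items are only ever appended to file2.</p>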
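<pre><code>import pandas as pd

# Read the two single-column files (column A in file1, column B in file2)
df1 = pd.read_csv(r"C:\path\to\file1.csv")
df2 = pd.read_csv(r"C:\path\to\file2.csv")

# Place the columns side by side; the shorter column is padded with NaN
df = pd.concat([df1, df2], axis=1)

# X is True where A and B hold the same value in the same row
df['X'] = df['A'] == df['B']
print(df[~df['X']])

# Keep only the values of B whose row did not match
df3 = df.loc[~df['X'], 'B']
print(df3)
df3.to_csv(r"C:\path\to\file3.csv")
</code></pre>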
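<p>As a minimal sketch of the position-by-position comparison, the snippet below replaces the two CSV files with in-memory buffers. The item names are hypothetical and only illustrate the assumption that file2 starts with the same rows as file1 and then continues.</p>

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for file1.csv and file2.csv
file1 = io.StringIO("A\nitem1\nitem2\nitem3\n")
file2 = io.StringIO("B\nitem1\nitem2\nitem3\nitem4\nitem5\n")

df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)

# Side by side: column A is padded with NaN for the two extra rows of B
df = pd.concat([df1, df2], axis=1)

# Row-wise comparison; a NaN in A never equals a value in B
df["X"] = df["A"] == df["B"]

# Values of B in the rows that did not match
appended = df.loc[~df["X"], "B"]
print(appended.tolist())  # ['item4', 'item5']
```

<p>Because the comparison is row by row, this only gives a useful result when the shared items appear in the same order in both files.</p>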
<p>If the order of the items is arbitrary, you can use <code>Series.isin()</code> instead, as follows:</p>
<pre><code>import pandas as pd

df1 = pd.read_csv(r"C:\path\to\file1.csv")
df2 = pd.read_csv(r"C:\path\to\file2.csv")
df = pd.concat([df1, df2], axis=1)

# X is True where the value in B occurs anywhere in A
df['X'] = df['B'].isin(df['A'])

# Keep only the values of B that are missing from A
df3 = df.loc[~df['X'], 'B']
df3.to_csv(r"C:\path\to\file3.csv")
</code></pre>
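<p>The membership test does not need the two columns to be aligned, so a possible variant skips the <code>concat</code> step and filters the second column directly. This is a sketch rather than part of the original solution: the helper name <code>items_only_in_second</code> and the sample values are hypothetical.</p>

```python
import pandas as pd


def items_only_in_second(first: pd.Series, second: pd.Series) -> pd.Series:
    """Return the values of `second` that do not occur anywhere in `first`."""
    return second[~second.isin(first)]


# Hypothetical data in which the first column is longer than the second
a = pd.Series(["x", "y", "z", "w"], name="A")
b = pd.Series(["y", "q"], name="B")

result = items_only_in_second(a, b)
print(result.tolist())  # ['q']
```

<p>When the first file is the longer one, concatenating first would pad column B with NaN, and those padded rows would also be flagged as missing from A. Filtering the column directly avoids that.</p>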
<p>I created the following two test files. file1.csv:</p>
<pre><code>A
1_in_A
2_in_A
3_in_A
4_in_A
</code></pre>
<p>and file2.csv:</p>
<pre><code>B
2_in_A
1_in_A
3_in_A
4_in_B
5_in_B
</code></pre>
<p>With these inputs, the DataFrame <code>df</code> looks like this:</p>
<pre><code>|    | A      | B      | X     |
|---:|:-------|:-------|:------|
|  0 | 1_in_A | 2_in_A | True  |
|  1 | 2_in_A | 1_in_A | True  |
|  2 | 3_in_A | 3_in_A | True  |
|  3 | 4_in_A | 4_in_B | False |
|  4 | nan    | 5_in_B | False |
</code></pre>
<p>We then select only the items flagged <code>False</code>, that is, the values in B that do not appear anywhere in A.</p>
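<p>To reproduce this test without touching the file system, the two files can be fed in as in-memory buffers. This is a sketch assuming the same column names <code>A</code> and <code>B</code> as in the test files; the use of <code>io.StringIO</code> is only a stand-in for the real paths.</p>

```python
import io
import pandas as pd

# In-memory copies of the two test files
file1 = io.StringIO("A\n1_in_A\n2_in_A\n3_in_A\n4_in_A\n")
file2 = io.StringIO("B\n2_in_A\n1_in_A\n3_in_A\n4_in_B\n5_in_B\n")

df = pd.concat([pd.read_csv(file1), pd.read_csv(file2)], axis=1)

# True where the value in B occurs anywhere in A, regardless of position
df["X"] = df["B"].isin(df["A"])

# Select only the rows flagged False
df3 = df.loc[~df["X"], "B"]
print(df3.tolist())  # ['4_in_B', '5_in_B']
```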