<p>I'd suggest using pandas for a task like this.</p>
<p>First, you need to read the CSV contents into DataFrame objects. This can be done as follows:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd
# make a dataframe from each csv file
df1 = pd.read_csv('planets1.csv')
df2 = pd.read_csv('planets2.csv')
</code></pre>
<p>If the CSV files don't have a header row naming each column, you may need to declare the column names yourself:</p>
<pre class="lang-py prettyprint-override"><code>colnames = ['col1', 'col2', ..., 'coln']
df1 = pd.read_csv('planets1.csv', names=colnames, index_col=0)
df2 = pd.read_csv('planets2.csv', names=colnames, index_col=0)
# use index_col=0 if csv already has an index column
</code></pre>
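<p>To see what <code>names</code> and <code>index_col</code> do without touching real files, here is a minimal sketch that reads a headerless CSV from an in-memory string. The CSV text and the column names (<code>name</code>, <code>distance_au</code>, <code>moons</code>) are made up for illustration and are not taken from the actual planets files.</p>

```python
import io
import pandas as pd

# Hypothetical headerless CSV content (illustrative values only)
csv_text = "Mercury,0.39,0\nVenus,0.72,0\nEarth,1.00,1\n"

# Hypothetical column names; replace them with the real ones for your files
colnames = ['name', 'distance_au', 'moons']

# Passing names= makes pandas treat the first line as data, not as a header
df = pd.read_csv(io.StringIO(csv_text), names=colnames)

# index_col=0 additionally uses the first column ('name') as the row index
df_indexed = pd.read_csv(io.StringIO(csv_text), names=colnames, index_col=0)

print(df)
print(df_indexed)
```

<p>Without <code>names</code>, pandas would take the first data row (Mercury) as the header, so that row would be missing from the comparison.</p>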
<hr/>
<p>To keep the code reproducible, I'll define the DataFrame objects below directly instead of reading them from CSV files:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd
# example column names
colnames = ['A','B','C']
# example dataframes
df1 = pd.DataFrame([[0,3,6], [4,5,6], [3,2,5]], columns=colnames)
df2 = pd.DataFrame([[1,3,1], [4,3,6], [3,6,5]], columns=colnames)
</code></pre>
<p>Note that df1 looks like this:</p>
<pre class="lang-py prettyprint-override"><code>   A  B  C
0  0  3  6
1  4  5  6
2  3  2  5
</code></pre>
<p>And df2 looks like this:</p>
<pre class="lang-py prettyprint-override"><code>   A  B  C
0  1  3  1
1  4  3  6
2  3  6  5
</code></pre>
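<p>The following code compares the dataframes, concatenates the mismatches into a new dataframe, and then saves the result to a CSV file:</p>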
<pre><code># define the condition you want to check for (i.e., mismatches)
mask = (df1 != df2)
# df1[mask], df2[mask] will replace matched values with NaN (Not a Number), and leave mismatches
# dropna(how='all') will remove rows filled entirely with NaNs
errors_1 = df1[mask].dropna(how='all')
errors_2 = df2[mask].dropna(how='all')
# add labels to column names
errors_1 = errors_1.add_suffix('_1') # for planets 1
errors_2 = errors_2.add_suffix('_2') # for planets 2
# you can now combine horizontally into one big dataframe
errors = pd.concat([errors_1,errors_2],axis=1)
# if you want, reorder the columns of `errors` so compared columns are next to each other
errors = errors.reindex(sorted(errors.columns), axis=1)
# if you don't like the clutter of NaN values, you can replace them with fillna()
errors = errors.fillna('_')
# save to a csv
errors.to_csv('mismatches.csv')
</code></pre>
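<p>One caveat with a plain <code>!=</code> mask: NaN never compares equal to NaN, so a cell that is empty in both files gets flagged as a mismatch. Here is a minimal sketch of one way to ignore such cells. The frames <code>left</code> and <code>right</code> are hypothetical examples, not the planets data.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical frames with a missing value at the same position in both
left = pd.DataFrame({'A': [1.0, np.nan], 'B': [2.0, 5.0]})
right = pd.DataFrame({'A': [1.0, np.nan], 'B': [3.0, 5.0]})

# A plain != flags the shared NaN as a mismatch, because NaN != NaN
naive_mask = (left != right)

# Cells that are missing in both frames should not count as mismatches
both_missing = left.isna() & right.isna()
mask = naive_mask & ~both_missing

print(naive_mask.sum().sum())  # 2 cells flagged, including the shared NaN
print(mask.sum().sum())        # 1 cell flagged: row 0 of column 'B'
```

<p>If your CSV files never contain empty cells, the simpler <code>df1 != df2</code> mask is enough.</p>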
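<p>For the example df1 and df2, the resulting <code>errors</code> dataframe looks like this (pandas may print the numbers as floats such as 6.0, because masking introduces NaN values):</p>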
<pre class="lang-py prettyprint-override"><code>  A_1 A_2 B_1 B_2 C_1 C_2
0   0   1   _   _   6   1
1   _   _   5   3   _   _
2   _   _   2   6   _   _
</code></pre>
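<p>As an alternative sketch, assuming pandas 1.1 or newer and two dataframes with identical row and column labels, the built-in <code>DataFrame.compare</code> produces a similar side-by-side view in one call. This swaps in a different technique from the mask approach above; the example reuses the same df1 and df2.</p>

```python
import pandas as pd

colnames = ['A', 'B', 'C']
df1 = pd.DataFrame([[0, 3, 6], [4, 5, 6], [3, 2, 5]], columns=colnames)
df2 = pd.DataFrame([[1, 3, 1], [4, 3, 6], [3, 6, 5]], columns=colnames)

# compare() keeps only rows and columns that contain at least one difference.
# The columns become a two-level index: (original column, 'self' or 'other'),
# where 'self' holds the df1 value and 'other' holds the df2 value.
diff = df1.compare(df2)
print(diff)

# Matching cells are left as NaN, as in the mask approach
diff.to_csv('mismatches_compare.csv')  # hypothetical output filename
```

<p>The manual mask version gives you more control over column naming and ordering, while <code>compare</code> is shorter when the default layout is good enough.</p>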
<p>Hope this helps.</p>