Python: comparing specific columns in two CSV files

9 votes
4 answers
41110 views
Asked 2025-04-16 10:01

Suppose I have two CSV files (file1 and file2) with the following contents:

file1:

fred,43,Male,"23,45",blue,"1, bedrock avenue"

file2:

fred,39,Male,"23,45",blue,"1, bedrock avenue"

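I want to compare the records in these two CSV files to see whether columns 0, 2, 3, 4 and 5 are identical. I don't care about column 1.

What is the best way to do this in Python?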

Edit:

Some example code would be appreciated.

Edit 2:

Please note that the embedded commas in the files need to be handled correctly.

4 Answers

1

I would read both records, drop column 1 (the one you don't care about), and compare what is left. (This works in Python 3.)

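import csv

# newline="" lets the csv module handle line endings itself
with open("file1.csv", newline="") as f1, open("file2.csv", newline="") as f2:
    r1 = next(csv.reader(f1))
    r2 = next(csv.reader(f2))

# Drop column 1 from each record, then compare the remaining fields
r1.pop(1)
r2.pop(1)
print(r1 == r2)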
7

Update

>>> import csv
>>> csv1 = csv.reader(open("file1.csv", "r", newline=""))
>>> csv2 = csv.reader(open("file2.csv", "r", newline=""))
>>> while True:
...   try:
...     line1 = next(csv1)
...     line2 = next(csv2)
...     equal = (line1[0]==line2[0] and line1[2]==line2[2] and line1[3]==line2[3] and line1[4]==line2[4] and line1[5]==line2[5])
...     print(equal)
...   except StopIteration:
...     break
...
True

Three years later, I think I would rather write it like this.

import csv

interesting_cols = [0, 2, 3, 4, 5]

with open("file1.csv", 'r') as file1,\
     open("file2.csv", 'r') as file2:

    reader1, reader2 = csv.reader(file1), csv.reader(file2)

    for line1, line2 in zip(reader1, reader2):
        equal = all(x == y
            for n, (x, y) in enumerate(zip(line1, line2))
            if n in interesting_cols
        )
        print(equal)
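
One caveat with a zip-based loop like this: zip stops at the shorter input, so extra rows at the end of one file go unnoticed. Below is a minimal sketch, assuming you want a differing row count to count as a mismatch. The function name rows_match and its cols parameter are hypothetical and not part of the original answer.

```python
import csv
from itertools import zip_longest
from operator import itemgetter


def rows_match(path1, path2, cols=(0, 2, 3, 4, 5)):
    """Yield True/False per row pair, comparing only the columns in `cols`."""
    pick = itemgetter(*cols)  # pulls the selected fields out of a row
    # newline="" is what the csv docs recommend when opening files for csv.reader
    with open(path1, newline="") as f1, open(path2, newline="") as f2:
        for row1, row2 in zip_longest(csv.reader(f1), csv.reader(f2)):
            if row1 is None or row2 is None:
                yield False  # one file has more rows than the other
                continue
            try:
                yield pick(row1) == pick(row2)
            except IndexError:
                yield False  # a row is too short to contain every column
```

Calling all(rows_match("file1.csv", "file2.csv")) would then give a single yes/no answer for the whole pair of files.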
11

I think the best approach is to use Python's csv library: http://docs.python.org/library/csv.html

Update (example added)

import csv

# newline='' is recommended by the csv docs when opening files for csv.reader
with open('data1.csv', newline='') as f1, open('data2.csv', newline='') as f2:
    row1 = next(csv.reader(f1, delimiter=',', quotechar='"'))
    row2 = next(csv.reader(f2, delimiter=',', quotechar='"'))

# Compare column 0 and columns 2 onward, skipping column 1
if (row1[0] == row2[0]) and (row1[2:] == row2[2:]):
    print("eq")
else:
    print("different")
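
To see why the csv module matters for the embedded commas mentioned in the question, here is a small sketch that parses the question's sample record two ways. It reads from an in-memory string through io.StringIO rather than a real file, which is an assumption made only to keep the example self-contained.

```python
import csv
import io

line = 'fred,43,Male,"23,45",blue,"1, bedrock avenue"\n'

# Naive splitting also cuts inside the quoted fields
naive = line.strip().split(',')

# csv.reader honours the quotes and keeps each quoted field whole
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 8 pieces
print(len(parsed))  # 6 fields
print(parsed[3], '|', parsed[5])  # 23,45 | 1, bedrock avenue
```

With naive splitting the column indexes shift as soon as a quoted field contains a comma, so a comparison by position would look at the wrong fields.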
