Python: printing differences column by column

Posted 2024-09-26 22:11:21


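I have the Python code below, which compares the rows of 2 CSV files, matches each column field and shows the differences. However, the output is not in a usable order. Please help me improve the output.

(I searched on Google and found a Python package, csvdiff, but it requires specifying the column number.)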

2 CSV files:

cat file1.csv
1,2,2222,3333,4444,3,

cat file2.csv
1,2,5555,6666,7777,3,

My Python3 code:

with open('file1.csv', 'r') as t1, open('file2.csv', 'r') as t2:
    filecoming = t1.readlines()
    filevalidation = t2.readlines()

for i in range(0,len(filevalidation)):
    coming_set = set(filecoming[i].replace("\n","").split(","))
    validation_set = set(filevalidation[i].replace("\n","").split(","))
    ReceivedDataList=list(validation_set.intersection(coming_set))
    NotReceivedDataList=list(coming_set.union(validation_set)-coming_set.intersection(validation_set))
    print(NotReceivedDataList)

output:

['6666', '5555', '3333', '2222', '4444', '7777']

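Even though it prints the differences between the two files, the output is not in order. (There are 3 differences from file 2 and 3 differences from file 1.)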

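I am trying to produce a column-wise result, i.e. each difference in file 1 should be paired with the corresponding difference in file 2.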

Something like:

2222  - 5555
3333  - 6666
4444  - 7777

Please help.

Thanks in advance.


1 answer
User
#1 · Posted 2024-09-26 22:11:21

Try this:

with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
    filecoming = t1.readlines()
    filevalidation = t2.readlines()

for i in range(0,len(filevalidation)):
    # Turn each row into a set of its field values
    coming_set = set(filecoming[i].replace("\n","").split(","))
    validation_set = set(filevalidation[i].replace("\n","").split(","))
    ReceivedDataList=list(validation_set.intersection(coming_set))
    # Values that appear in only one of the two rows
    NotReceivedDataList=list(coming_set.union(validation_set)-coming_set.intersection(validation_set))
    print(NotReceivedDataList)

# Split the differing values by the file they came from
# (this uses the sets from the last row compared)
old=[]
new=[]
for items in NotReceivedDataList:
    if items in coming_set:
        old.append(items)
    elif items in validation_set:
        new.append(items)
print(old)
print(new)

Output:

['2222', '5555', '6666', '3333', '4444', '7777']
['2222', '3333', '4444']
['5555', '6666', '7777']
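
One caveat with the set-based version above: a set keeps neither column position nor duplicates, so it cannot reliably pair a value in one file with the value in the same column of the other. A minimal sketch with made-up rows (hypothetical data, not the files from the question):

```python
# Hypothetical rows, chosen only to illustrate the caveat.
row_a = "1,2,1".split(",")
row_b = "2,1,2".split(",")

# Both rows contain the same distinct values, so the set-based diff is empty...
set_diff = set(row_a) ^ set(row_b)
print(set_diff)

# ...although every column differs when the rows are compared by position.
positional_diff = [(a, b) for a, b in zip(row_a, row_b) if a != b]
print(positional_diff)
```

This is why a position-by-position comparison is probably the better fit when the result has to be column-wise.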

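Addition: here is a bit more help. Take old and new from the CSV files; then [item for item in old if item not in new] gives you the items that are not in new. Also, with enumerate we can identify which columns differ (here the differences are in columns 2, 3 and 4, counting from 0).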

old=[1,2,2222,3333,4444,3]
new=[1,2,5555,6666,7777,3]

print([item for item in old if item not in new])
print([item for item in new if item not in old])

for index, (first, second) in enumerate(zip(old, new)):
    if first != second:
        print(index, first, second)

Output:

[2222, 3333, 4444]
[5555, 6666, 7777]
2 2222 5555
3 3333 6666
4 4444 7777
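
To tie the enumerate/zip idea back to the CSV files, here is a rough sketch that reads rows with the standard csv module and prints each differing pair in the "2222  - 5555" layout asked for in the question. The helper name column_diffs is made up for illustration, and padding uneven rows with an empty string is an assumption about how missing cells should be treated.

```python
import csv
from itertools import zip_longest

def column_diffs(lines_a, lines_b):
    """Yield (row, column, value_a, value_b) for every cell that differs.

    column_diffs is a hypothetical helper. lines_a and lines_b can be open
    file objects or any iterable of CSV lines.
    """
    rows = zip_longest(csv.reader(lines_a), csv.reader(lines_b), fillvalue=[])
    for row_idx, (row_a, row_b) in enumerate(rows):
        # Assumption: pad the shorter row with "" so a missing cell counts as a difference.
        cells = zip_longest(row_a, row_b, fillvalue="")
        for col_idx, (a, b) in enumerate(cells):
            if a != b:
                yield row_idx, col_idx, a, b

# Sample data copied from the question. With real files, pass
# open("file1.csv", newline="") and open("file2.csv", newline="") instead.
file1 = ["1,2,2222,3333,4444,3,\n"]
file2 = ["1,2,5555,6666,7777,3,\n"]

diffs = list(column_diffs(file1, file2))
for _row, _col, a, b in diffs:
    print(f"{a}  - {b}")
```

Because the comparison is positional, values are never reordered, and the row and column indices are available if you also want to report where each difference occurs.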
