How can I quickly compare two files (~50 MB)?

2 answers

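User

Answer 1 · edited 2024-09-30 14:32:20

Your code has a problem: it can only find differences between lines at the same position. If the two files have a different number of lines, or the data is not sorted, the code will give wrong results. Here is my code: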

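# Read both files fully into memory; every line keeps its trailing newline
with open('a.txt') as f1:
    text1Lines = f1.readlines()
with open('b.txt') as f2:
    text2Lines = f2.readlines()
set1 = set(text1Lines)
set2 = set(text2Lines)
# Lines that appear in exactly one of the two files (a set, despite the name)
diffList = (set1|set2)-(set1&set2)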
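
The expression `(set1|set2)-(set1&set2)` is the symmetric difference, which Python also spells `set1 ^ set2`. A minimal sketch on in-memory sample lines follows; the helper name `symmetric_line_diff` and the sample data are made up for illustration. Note that a set discards duplicate lines and line order, so the result only says which lines differ, not where or how often.

```python
def symmetric_line_diff(lines_a, lines_b):
    """Return the lines that occur in exactly one of the two iterables."""
    set_a, set_b = set(lines_a), set(lines_b)
    # Equivalent to (set_a | set_b) - (set_a & set_b)
    return set_a ^ set_b


# Hypothetical sample data standing in for the contents of two files
a = ["x\n", "y\n", "y\n", "z\n"]
b = ["z\n", "w\n", "x\n"]

result = symmetric_line_diff(a, b)
print(sorted(result))
```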

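User

Answer 2 · edited 2024-09-30 14:32:20

You can read and compare the files at the same time instead of storing them in memory. The snippet below makes a number of unrealistic assumptions (i.e. the files have the same length, and no line occurs twice within the same file), but it illustrates the idea: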

unique_1 = []
unique_2 = []
# File names are placeholders; substitute your own paths
with open('a.txt') as handle_1, open('b.txt') as handle_2:
    for line_1 in handle_1:
        # Read a line from the 1st file and check whether we have already seen it in the 2nd
        if line_1 in unique_2:
            unique_2.remove(line_1)
        # If the line is unique so far, remember it
        else:
            unique_1.append(line_1)
        # The same, with the files the other way round
        line_2 = handle_2.readline()
        if line_2 in unique_1:
            unique_1.remove(line_2)
        else:
            unique_2.append(line_2)

# Lines still carry their trailing newline, so join without a separator
print(''.join(unique_1))
print(''.join(unique_2))
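
One possible variation on the same idea, offered as a sketch rather than as the answer's own method: keep the unmatched lines in sets instead of lists, so each membership test is constant time on average rather than a linear scan, and use `itertools.zip_longest` so the two inputs may have different lengths. The function name `streaming_diff` and the sample data are hypothetical, and the sketch still assumes that no line occurs twice within the same file.

```python
from itertools import zip_longest


def streaming_diff(lines_1, lines_2):
    """Walk two line iterables in lockstep, keeping only unmatched lines."""
    unique_1, unique_2 = set(), set()
    # zip_longest pads the shorter input with None once it is exhausted
    for line_1, line_2 in zip_longest(lines_1, lines_2):
        if line_1 is not None:
            if line_1 in unique_2:
                unique_2.discard(line_1)   # already seen in the 2nd input
            else:
                unique_1.add(line_1)       # so far only seen in the 1st
        if line_2 is not None:
            if line_2 in unique_1:
                unique_1.discard(line_2)   # already seen in the 1st input
            else:
                unique_2.add(line_2)       # so far only seen in the 2nd
    return unique_1, unique_2


# Hypothetical sample data standing in for two open file handles
only_1, only_2 = streaming_diff(["a\n", "b\n", "c\n"],
                                ["b\n", "a\n", "d\n", "e\n"])
print(sorted(only_1), sorted(only_2))
```

Open file objects are iterables of lines, so they can be passed directly in place of the sample lists. Memory use is bounded by the number of lines that are still unmatched at any point, which is small when the files are mostly identical but can approach the full file size when they are not.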

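Of course, this smells like reinventing the wheel, but a simple algorithm, rather than the elaborate diff construction and distance computation that difflib performs, may give you better performance. Alternatively, if you are absolutely sure your files will never be too large to fit in memory (which, honestly, is not the safest assumption), you can use a set difference.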
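
The answer's own set-difference snippet is not preserved here. As a rough sketch of what such a comparison could look like, and not the author's code, the function below builds one set per input and subtracts each from the other; the name `set_differences` and the sample data are assumptions made for illustration.

```python
def set_differences(lines_1, lines_2):
    """Return (lines only in the 1st input, lines only in the 2nd input)."""
    set_1, set_2 = set(lines_1), set(lines_2)
    return set_1 - set_2, set_2 - set_1


# Hypothetical sample data; open file handles would work the same way
only_in_first, only_in_second = set_differences(
    ["alpha\n", "beta\n", "gamma\n"],
    ["beta\n", "delta\n"],
)
print(sorted(only_in_first), sorted(only_in_second))
```

Both inputs are held in memory as sets, so this is only suitable when the files comfortably fit in RAM, and duplicate lines within a file are collapsed.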
