How do I write the lines of the first text file that do not exist in the second text file?

newLines = open("file1.txt", "r")
originalLines = open("file2.txt", "r")
output = open("output.txt", "w")
lines1 = newLines.readlines()
lines2 = originalLines.readlines()
newLines.close()
originalLines.close()
duplicate = False
for line in lines1:
    if line.isspace():
        continue
    for line2 in lines2:
        if line == line2:
            duplicate = True
            break
    if duplicate == False:
        output.write(line)
    else:
        duplicate = False
output.close()

2 answers

User

#1 · edited 2024-09-30 20:28:08

You can use numpy for a shorter, faster solution. Here we use these numpy functions:

np.loadtxt docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
np.setdiff1d docs: https://docs.scipy.org/doc/numpy-1.14.5/reference/generated/numpy.setdiff1d.html
np.savetxt docs: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html

import numpy as np


# Values from file1.txt that do not appear in file2.txt
arr = np.setdiff1d(np.loadtxt('file1.txt', dtype=str), np.loadtxt('file2.txt', dtype=str))
np.savetxt('output.txt', arr, fmt='%s')
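
One thing to keep in mind with this approach: np.setdiff1d returns the sorted, unique values of its first argument that are not in the second, so the original line order and any repeated lines from file1.txt are not preserved. The sketch below is only a rough pure-Python analogue for illustration, not numpy's implementation; the helper name setdiff_sorted and the extra sample words are made up.

```python
def setdiff_sorted(first, second):
    """Return the sorted, unique items of `first` that are not in `second`."""
    return sorted(set(first) - set(second))


# Hypothetical sample data: "Bird" appears twice and the input is not sorted.
file1_lines = ["Dog", "Bird", "Ant", "Bird", "Cat"]
file2_lines = ["Man", "Dog", "Axe", "Cat"]

result = setdiff_sorted(file1_lines, file2_lines)
print(result)  # ['Ant', 'Bird'] : sorted, with the duplicate "Bird" collapsed
```

If the order of file1.txt matters, a loop over the lines with a set lookup is probably the safer choice.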

User

#2 · edited 2024-09-30 20:28:08

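Judging by the behavior, file2.txt does not end with a newline, so the contents of lines2 are ['Man\n', 'Dog\n', 'Axe\n', 'Cat']. Note that 'Cat' has no trailing newline, so it never compares equal to the line 'Cat\n' read from file1.txt.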
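
A small sketch that reproduces the mismatch, assuming a file with the four sample lines above and no final newline. The temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

# Write a file whose last line has no trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file2.txt")
    with open(path, "w") as f:
        f.write("Man\nDog\nAxe\nCat")
    with open(path) as f:
        raw_lines = f.readlines()

# Stripping the newline from every line makes the comparison uniform.
stripped = [line.rstrip("\n") for line in raw_lines]

print(raw_lines)             # ['Man\n', 'Dog\n', 'Axe\n', 'Cat']
print("Cat\n" in raw_lines)  # False: the last line lacks '\n'
print("Cat" in stripped)     # True once newlines are stripped
```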

I'd suggest normalizing your lines so they have no trailing newlines, replacing:

lines1 = newLines.readlines()
lines2 = originalLines.readlines()

with:

lines1 = [line.rstrip('\n') for line in newLines]
# Set comprehension makes lookup cheaper and dedupes
lines2 = {line.rstrip('\n') for line in originalLines}

and changing:

output.write(line)

to:

print(line, file=output)

which adds the newline for you. In fact, the best solution is to avoid the inner loop entirely, changing all of this:

for line2 in lines2:
    if line == line2:
        duplicate = True
        break

if duplicate == False:
    output.write(line)
else:
    duplicate = False

to just:

if line not in lines2:
    print(line, file=output)

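If you use a set for lines2 as I suggested, the cost of each membership test drops from linear in the number of lines in file2.txt to roughly constant, regardless of the size of file2.txt (as long as the set of unique lines fits in memory).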
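
To illustrate, here is a minimal sketch showing that the explicit scan and the set lookup select the same lines; only the per-lookup cost differs. The helper in_list_scan and the candidate words are made up for the example.

```python
lines2_list = ["Man", "Dog", "Axe", "Cat"]
lines2_set = set(lines2_list)  # built once, then each lookup is a hash probe


def in_list_scan(line, lines):
    """Linear scan, equivalent to the original inner for loop."""
    for candidate in lines:
        if line == candidate:
            return True
    return False


candidates = ["Dog", "Bird", "Cat", "Ant"]

by_scan = [line for line in candidates if not in_list_scan(line, lines2_list)]
by_set = [line for line in candidates if line not in lines2_set]

print(by_scan)  # ['Bird', 'Ant']
print(by_set)   # ['Bird', 'Ant']
```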

Better still, use with statements for the open files, and stream file1.txt rather than holding it in memory, ending up with:

with open("file2.txt") as origlines:
    lines2 = {line.rstrip('\n') for line in origlines}

with open("file1.txt") as newlines, open("output.txt", "w") as output:
    for line in newlines:
        line = line.rstrip('\n')
        # "".isspace() is False, so test line.strip() to skip empty lines too
        if line.strip() and line not in lines2:
            print(line, file=output)
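
For reuse and testing, the same logic could be wrapped in a function. This is a sketch under assumptions: the function name write_missing_lines, its parameters, and the sample file contents are hypothetical. It checks line.strip() to skip blank lines, because an empty string (what remains of a bare newline after rstrip) is not caught by isspace().

```python
import os
import tempfile


def write_missing_lines(new_path, original_path, output_path):
    """Write lines of new_path that are absent from original_path.

    Returns the number of lines written.
    """
    with open(original_path) as origlines:
        seen = {line.rstrip("\n") for line in origlines}

    written = 0
    with open(new_path) as newlines, open(output_path, "w") as output:
        for line in newlines:  # streamed, one line at a time
            line = line.rstrip("\n")
            # Skip empty or whitespace-only lines and lines already present.
            if line.strip() and line not in seen:
                print(line, file=output)
                written += 1
    return written


# Usage with throwaway files; neither input ends with a newline.
with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1.txt")
    file2 = os.path.join(tmp, "file2.txt")
    out = os.path.join(tmp, "output.txt")
    with open(file1, "w") as f:
        f.write("Dog\n\nBird\nCat\nAnt")
    with open(file2, "w") as f:
        f.write("Man\nDog\nAxe\nCat")

    count = write_missing_lines(file1, file2, out)
    with open(out) as f:
        output_text = f.read()

print(count)              # 2
print(repr(output_text))  # 'Bird\nAnt\n'
```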
