Tips for optimizing a Python filtering program

post = open(INPUTFILE1, "rb")
for line in post:
    cut = line.split(',')
    pre = open(INPUTFILE2, "rb")
    for otherline in pre:
        cuttwo = otherline.split(',')
        if cut[1] == cuttwo[1] and cut[3] == cuttwo[3] and cut[9] == cuttwo[9]:
            OUTPUTFILE.write(otherline)
            break
post.close()
pre.close()
OUTPUTFILE.close()

1 Answer

User

#1 · Posted 2024-05-17 09:53:30

If one of the files holds only a few hundred potential lines, use something like the following:

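from operator import itemgetter

# Build a tuple key from columns 1, 3 and 9 of a split line
key = itemgetter(1, 3, 9)

# Read the small file once and collect its keys in a set
with open('smallfile') as fin:
    valid = set(key(line.split(',')) for line in fin)

# Read the larger file once and test each line's key against the set
with open('largerfile') as fin:
    lines = (line.split(',') for line in fin)
    for line in lines:
        if key(line) in valid:
            pass  # do something....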

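This saves unnecessary iteration, because each file is read only once instead of one file being rescanned for every line of the other, and it makes the most of Python's built-in types for efficient lookups.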
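
To make the idea easy to try without real files, here is a minimal sketch of the same set-based filter run on in-memory data. The function name `filter_by_key` and the sample rows are hypothetical, and the `rstrip('\n')` call is an added assumption (not in the snippet above) so that column 9, which is the last column here, compares cleanly even when a final line has no trailing newline.

```python
import io
from operator import itemgetter

# Columns 1, 3 and 9 form the comparison key, as in the answer.
key = itemgetter(1, 3, 9)


def filter_by_key(small, large):
    """Yield lines from `large` whose key also occurs in `small`."""
    # One pass over the small input builds the set of keys.
    # rstrip('\n') is an assumption added here, see the note above.
    valid = {key(line.rstrip('\n').split(',')) for line in small}
    # One pass over the large input; each membership test is a hash lookup.
    for line in large:
        if key(line.rstrip('\n').split(',')) in valid:
            yield line


# Hypothetical sample rows with ten comma-separated columns each.
small = io.StringIO("a,1,b,2,c,d,e,f,g,3\n")
large = io.StringIO(
    "x,1,y,2,z,z,z,z,z,3\n"  # columns 1, 3 and 9 match: kept
    "x,1,y,9,z,z,z,z,z,3\n"  # column 3 differs: dropped
)
matches = list(filter_by_key(small, large))
```

Any iterable of lines works in place of the `io.StringIO` objects, including the file handles opened in the answer's code.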

If you want to use the whole line from the small file in the output when there is a match, use a dictionary instead of a set:

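from operator import itemgetter

# Build a tuple key from columns 1, 3 and 9 of a split line
key = itemgetter(1, 3, 9)
# Map each key to its full line; if a key repeats, the last line wins
with open('smallfile') as fin:
    valid = dict((key(line.split(',')), line) for line in fin)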

Your processing loop would then look like this:

with open('largerfile') as fin:
    lines = (line.split(',') for line in fin)
    for line in lines:
        # Fetch the small-file line with the same key, or None if absent
        otherline = valid.get(key(line), None)
        if otherline is not None:
            pass  # do something....
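
The dictionary variant can be sketched the same way on in-memory data. The helper names `build_index` and `matching_small_lines` and the sample rows are hypothetical, and `rstrip('\n')` is again an added assumption so that the last column compares cleanly.

```python
import io
from operator import itemgetter

# Columns 1, 3 and 9 form the comparison key, as in the answer.
key = itemgetter(1, 3, 9)


def build_index(small):
    """Map each key to the full line from the small input (last one wins)."""
    return {key(line.rstrip('\n').split(',')): line for line in small}


def matching_small_lines(index, large):
    """For each line in `large`, yield the small-input line sharing its key."""
    for line in large:
        otherline = index.get(key(line.rstrip('\n').split(',')))
        if otherline is not None:
            yield otherline


# Hypothetical sample rows with ten comma-separated columns each.
small = io.StringIO(
    "a,1,b,2,c,d,e,f,g,3\n"
    "h,5,i,6,j,k,l,m,n,7\n"
)
large = io.StringIO(
    "x,5,y,6,z,z,z,z,z,7\n"  # same key as the second small line
    "x,0,y,0,z,z,z,z,z,0\n"  # no matching key
)
result = list(matching_small_lines(build_index(small), large))
```

Compared with the set, the dictionary costs a little more memory because it stores each full line, which is usually acceptable when the small file has only a few hundred lines.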
