How do I jump around inside a for loop in Python?

User

Reply #1 · edited 2024-10-01 15:31:35

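Assuming the [SEQUENCE ID]s fit in memory, and that most of your data is actually on the sequence lines (unlike in the example provided), you could parse one file (file2 in the question) and record not only each [SEQUENCE ID] but also the file position of each identifier. This approach lets you carry on with little disruption to your current workflow (for example, without having to learn about databases):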

def get_indexes(filename):
    """Map each sequence ID in the file to the offset where its record starts."""
    with open(filename, "rt") as file:
        sequences = {}
        while True:
            # remember where this record begins before reading its ID line
            position = file.tell()
            id = file.readline()
            if not id:
                break
            sequences[id.strip()] = position
            # skip corresponding data line:
            file.readline()
    return sequences

def fetcher(filename1, filename2, sequences):
    """Yield (id, data) for each record in file1, taking data from file2 when the ID is indexed there."""
    with open(filename1, "rt") as file1, open(filename2, "rt") as file2:
        while True:
            id = file1.readline()
            data = file1.readline()
            if not id:
                break
            id = id.strip()
            if id in sequences:
                # position file2 reading at the identifier:
                file2.seek(sequences[id])
                # throw away id:
                file2.readline()
                data = file2.readline()

            yield id, data

if __name__ == "__main__":
    sequences = get_indexes("/data/file2")
    for id, data in fetcher("/data/file1", "/data/file2", sequences):
        print("%s\n%s" % (id, data))

User

Reply #2 · edited 2024-10-01 15:31:35

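Do you only need lines N and N+1? In that case, read the file two lines at a time. Then you always have access to both the sequence ID and the sequence.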

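with open('data.txt', 'r') as f:
    # Python 3: the built-in zip is lazy, so itertools.izip is no longer needed.
    # The same iterator is passed twice, so each pair is two consecutive lines.
    for line1, line2 in zip(*[iter(f)] * 2):
        print(line1, line2)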
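
The pairing idiom can look cryptic, so here is a minimal sketch of why it yields consecutive lines. It uses an in-memory list in place of the file, and the sample records are made up for illustration:

```python
# Hypothetical sample records: an ID line followed by its sequence line.
lines = [">seq1\n", "ACGT\n", ">seq2\n", "TTGA\n"]

# One iterator, passed to zip twice: zip pulls from the same iterator
# for both positions, so each tuple holds two consecutive items.
it = iter(lines)
pairs = list(zip(it, it))

for seq_id, sequence in pairs:
    print(seq_id.strip(), sequence.strip())
```

Note that zip stops at the shortest input, so if the file has an odd number of lines the last unpaired line is silently dropped.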

User

Reply #3 · edited 2024-10-01 15:31:35

In short: you will have to use a third-party Python library to keep one of your data sequences searchable in better than O(n) time.

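If the files are not sorted, you need to sort at least one of them. Think of it this way: I take a sequence ID from file 1, and to check whether it is in file 2 I have to read all of file 2. Doing that for every ID is far less feasible than reading the file once.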
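
To illustrate why sorting helps, here is a rough sketch of a binary-search lookup with the standard `bisect` module. It assumes the IDs fit in a list, and the `contains` helper and sample IDs are hypothetical; on disk the same idea would apply to a sorted file.

```python
from bisect import bisect_left

def contains(sorted_ids, seq_id):
    """Binary search: O(log n) comparisons instead of a full scan."""
    i = bisect_left(sorted_ids, seq_id)
    return i < len(sorted_ids) and sorted_ids[i] == seq_id

# Hypothetical IDs taken from file 2, sorted once up front.
file2_ids = sorted([">seq9", ">seq2", ">seq5"])

found = contains(file2_ids, ">seq5")
missing = contains(file2_ids, ">seq7")
print(found, missing)
```

The sort is paid for once; every lookup after that avoids rereading the whole collection.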

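Better than sorting is a data structure that keeps your sorted data on disk, supports fast searches, and can still grow. That also takes care of the sorting: in a first step, all you have to do is read each entry in file 2 and insert it into this ever-growing, sorted, disk-persisted data structure.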

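You could certainly roll your own data structure for this, but I would suggest using ZODB, Zope's object-oriented database, with a BTree folder in it, and turning your "2-line data" into a minimal object for the task.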
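
As a rough illustration of the idea only, here is a sketch that swaps ZODB for the standard-library `sqlite3` module, which also keeps an indexed, growable structure on disk. This is not the ZODB API; the table name, the `build_index` and `lookup` helpers, and the sample records are all assumptions made for the example.

```python
import sqlite3

def build_index(db_path, records):
    """Store (id, sequence) pairs in a table keyed by id."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences (id TEXT PRIMARY KEY, data TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sequences (id, data) VALUES (?, ?)", records
    )
    conn.commit()
    return conn

def lookup(conn, seq_id):
    """Return the sequence stored for seq_id, or None if it is absent."""
    row = conn.execute(
        "SELECT data FROM sequences WHERE id = ?", (seq_id,)
    ).fetchone()
    return row[0] if row else None

# ":memory:" keeps the demo self-contained; pass a file path to persist on disk.
conn = build_index(":memory:", [(">seq1", "ACGT"), (">seq2", "TTGA")])
hit = lookup(conn, ">seq2")
miss = lookup(conn, ">seq3")
conn.close()
```

In practice you would fill the table from file 2 once, then look up each ID from file 1 without rescanning file 2.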
