Iterating over and matching large files in Python

User

#1 · edited 2024-09-29 01:24:30

1e5 and 6e5 are not large numbers if you use a linear-time algorithm instead of a quadratic-time one:

#!/usr/bin/env python
# Read all letters into a list once; the list index is the zero-based line number.
with open('letters') as file:
    letters = file.read().splitlines()

def combine_ids(letters):
    """Yield (letter, id) pairs in a single pass over the numbers file."""
    with open('numbers') as file:
        for line in file:
            id, space, numbers_str = line.lstrip().partition(' ')
            try:
                numbers = list(map(int, numbers_str.split(',')))
            except ValueError:
                continue  # skip invalid lines
            for n in numbers:
                try:
                    yield letters[n], id
                except IndexError:
                    pass  # no letter at this index

result = dict(combine_ids(letters))
print(result)

If several ids can correspond to the same letter (that is, if the numbers file contains duplicate numbers), the last one read wins.
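
If "last one wins" is not the behavior you want, one option is to keep every id per letter. The following is only a minimal sketch: `group_ids` is a hypothetical helper, and the sample pairs are made up, assuming they have the same (letter, id) shape that `combine_ids` yields.

```python
from collections import defaultdict

def group_ids(pairs):
    """Collect every id seen for each letter instead of keeping only the last."""
    grouped = defaultdict(list)
    for letter, id_ in pairs:
        grouped[letter].append(id_)
    return dict(grouped)

# Hypothetical pairs: both id1 and id3 point at 'AAAC'.
pairs = [('AAAC', 'id1'), ('AAAX', 'id2'), ('AAAC', 'id3')]

last_wins = dict(pairs)      # dict() keeps the last value for a repeated key
all_ids = group_ids(pairs)

print(last_wins)  # {'AAAC': 'id3', 'AAAX': 'id2'}
print(all_ids)    # {'AAAC': ['id1', 'id3'], 'AAAX': ['id2']}
```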

Example

numbers:

id1 5, 33
id2 23
id3 103, 2, 3

letters:

AAAA
AAAB
AAAC
AAAD
AAAE
AAAF
AAAG
AAAH
AAAI
AAAJ
AAAK
...
AAAX
AAAY
AAAZ

Output

{'AAAX': 'id2', 'AAAC': 'id3', 'AAAD': 'id3', 'AAAF': 'id1'}

Note: the number 2 corresponds to AAAC (zero-based indexing). If the letters should be indexed from 1, use letters[n-1] (assuming n >= 1).
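
For one-based numbers, a small helper can apply the shift and reject values outside the list. This is a sketch under assumptions: `letter_for` is a hypothetical name, and the explicit range check is there because a negative index would otherwise silently count from the end of the list instead of raising IndexError.

```python
def letter_for(letters, n, one_based=False):
    """Return the letter for number n, or None if n is out of range."""
    index = n - 1 if one_based else n
    if 0 <= index < len(letters):
        return letters[index]
    return None

letters = ['AAAA', 'AAAB', 'AAAC', 'AAAD']

print(letter_for(letters, 2))                  # AAAC (zero-based)
print(letter_for(letters, 2, one_based=True))  # AAAB (one-based)
print(letter_for(letters, 0, one_based=True))  # None: 0 is invalid when counting from 1
```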

User

#2 · edited 2024-09-29 01:24:30

Store all the letters in a list: letters = ["AAAA", "AAAB", "AAAC", ...].

Now, after reading in the numbers file, create a mapping from row index to id, such as

0 mapped to id1
1 mapped to id2
....

m[0] = "id1", m[1] = "id2"...

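While doing the step above, create an array of zeros and, for each row of the numbers file, record at every listed number the index of the row it belongs to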

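p = [0] * len(letters)
# row is one line of the numbers file, e.g. "id1 5, 33";
# row_index is its zero-based line number, so m[row_index] == row_name
nums = row[row.find(" ") + 1:].split(",")
row_name = row[:row.find(" ")]
for num in nums:
    p[int(num)] = row_index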

Now look up the number recorded for the i-th letter in the letters list

print(p[i])
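
Put together, the idea above might look like the following. This is a sketch rather than the poster's exact code: the `rows` list stands in for the numbers file, the sample values are made up, and `None` is used as the filler instead of 0 so that "no id" cannot be confused with row 0.

```python
letters = ["AAAA", "AAAB", "AAAC", "AAAD", "AAAE", "AAAF"]
rows = ["id1 5, 33", "id2 1", "id3 103, 2, 3"]  # stand-in for the numbers file

m = {}                     # row index -> id
p = [None] * len(letters)  # letter index -> row index (None = no id)

for row_index, row in enumerate(rows):
    row_name, _, nums_str = row.partition(" ")
    m[row_index] = row_name
    for num in nums_str.split(","):
        n = int(num)                 # int() tolerates surrounding spaces
        if 0 <= n < len(letters):    # ignore numbers with no matching letter
            p[n] = row_index

# Print each letter that received an id, together with that id.
for i, letter in enumerate(letters):
    if p[i] is not None:
        print(letter, m[p[i]])
```

With this sample data, AAAB is paired with id2, AAAC and AAAD with id3, and AAAF with id1; 33 and 103 are skipped because the list has only six entries.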

User

#3 · edited 2024-09-29 01:24:30

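A single pass over each file is enough, O(N):

Read the letters file into an array. This gives you array index (+1?) = line number.
Read the numbers file. For each line, use the numbers to combine the id with the letters in the array.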
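
The two steps above can be sketched as a generator that works on any iterable of lines. Everything named here is an assumption made for illustration: `match` is a hypothetical function, `io.StringIO` objects stand in for the real files, the data is made up, and numbers are treated as zero-based (array index = line number - 1).

```python
import io

def match(letter_lines, number_lines):
    """Yield (letter, id) pairs using one pass over each input."""
    letters = [line.strip() for line in letter_lines]  # pass 1: index -> letter
    for line in number_lines:                          # pass 2: one line at a time
        id_, _, numbers_str = line.strip().partition(' ')
        for token in numbers_str.split(','):
            token = token.strip()
            # Skip anything that is not a non-negative integer within range.
            if token.isdecimal() and int(token) < len(letters):
                yield letters[int(token)], id_

letters_file = io.StringIO("AAAA\nAAAB\nAAAC\nAAAD\n")
numbers_file = io.StringIO("id1 3, 33\nid2 0\nid3 2\n")

pairs = list(match(letters_file, numbers_file))
print(pairs)  # [('AAAD', 'id1'), ('AAAA', 'id2'), ('AAAC', 'id3')]
```

Each letter line and each number is touched once, so the work grows linearly with the combined size of the two files.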
