<p>3 MB is not much (on any computer capable of running Windows). Just load the second file into memory as a single string and slice out the fragments:</p>
<pre><code># populate `id -> (start, end)` map
ids = {}
with open(r"\Users\Zebrafish\Desktop\ASHISH\IDs.txt") as id_file:
    for line in id_file:
        if line.strip():  # non-blank line
            seq_id, start, end = line.split()
            ids[seq_id] = int(start), int(end)

# load the file as a single string (ignoring whitespace)
with open("/Users/Zebrafish/Desktop/ASHISH/complete.txt") as seq_file:
    s = "".join(seq_file.read().split())  # or re.sub(r"\s+", "", seq_file.read())

# print fragments
for seq_id, (start, end) in ids.items():
    print("{id} -> {fragment}".format(id=seq_id, fragment=s[start:end]))
</code></pre>
<p>If the <code>complete.txt</code> file does not fit in memory, you can use <code>mmap</code> to access its content as a sequence of bytes without loading the whole file into memory:</p>
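<pre><code>from mmap import ACCESS_READ, mmap

# open in binary mode: the mapping exposes raw bytes, not decoded text
with open("complete.txt", "rb") as f, mmap(f.fileno(), 0, access=ACCESS_READ) as s:
    # use `s` here (assume that indices refer to the raw file in this case,
    # so any newlines count toward the offsets)
    # e.g., `fragment = s[start:end]`
    pass
</code></pre>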
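<p>To make the <code>mmap</code> variant concrete, here is a minimal sketch. The helper name <code>read_fragments</code>, the temporary demo file, and its contents are all made up for illustration. It also assumes the sequence data is plain ASCII and that <code>start</code>/<code>end</code> are byte offsets into the raw file.</p>

```python
import mmap
import os
import tempfile


def read_fragments(path, ids):
    """Return {id: fragment}, slicing by byte offset into the raw file.

    Hypothetical helper: `ids` maps an id to a (start, end) pair.
    """
    fragments = {}
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as s:
        for seq_id, (start, end) in ids.items():
            # slicing an mmap returns bytes; decoding assumes ASCII data
            fragments[seq_id] = s[start:end].decode("ascii")
    return fragments


# Throwaway demo file with made-up contents (no newlines).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"ACGTACGTTTGA")
    demo_path = tmp.name

demo_fragments = read_fragments(demo_path, {"a": (0, 4), "b": (8, 12)})
os.remove(demo_path)
print(demo_fragments)
```

<p>Unlike the in-memory version, nothing strips whitespace here, so line breaks in the file shift the offsets. If the indices were computed against the whitespace-free sequence, they would need to be adjusted first.</p>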