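<p>This approach is only worthwhile if the 4 million entries are too many to hold in memory; otherwise you can simply load all of File1 into a dict.</p>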
<ol>
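<li>Create a set from all the File2 ids (both upper and lower)</li>
<li>Loop over the big file (File1) and create a dict containing <strong>only</strong> the entries that appear in that set</li>
<li>Loop over File2 again and generate the output file</li>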
</ol>
<p>Some code to demonstrate:</p>
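<pre><code># Pass 1: collect every id that File2 refers to
s = set()
with open('File2') as file2:
    for line in file2:
        for i in line.split():
            s.add(i)

# Pass 2: scan the big file, keeping only the ids collected above
d = {}
with open('File1') as file1:
    for line in file1:
        k, v = line.split()
        if k in s:
            d[k] = v

# Pass 3: re-read File2 and write each id pair followed by its two values
with open('NewFile2', 'w') as out_file:
    with open('File2') as file2:
        for line in file2:
            k1, k2 = line.split()
            # d[k] raises KeyError if an id never appeared in File1
            out_file.write(' '.join([k1, k2, d[k1], d[k2]]) + '\n')
</code></pre>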
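<p>If you want something reusable, the same three passes can be wrapped in a function. This is only a sketch: it assumes, as the code above does, that both files hold two whitespace-separated columns per line. The function name <code>join_ids</code> and the <code>missing</code> placeholder are made up for illustration; the placeholder is written for any id that never shows up in File1, instead of raising <code>KeyError</code>.</p>

```python
def join_ids(file1_path, file2_path, out_path, missing="NA"):
    """Append the File1 value of each id to every line of File2.

    `join_ids` and `missing` are hypothetical names, not from the question.
    """
    # Pass 1: collect every id that File2 refers to.
    wanted = set()
    with open(file2_path) as file2:
        for line in file2:
            wanted.update(line.split())

    # Pass 2: scan the large file, keeping only the ids seen in pass 1.
    mapping = {}
    with open(file1_path) as file1:
        for line in file1:
            parts = line.split()
            if len(parts) == 2 and parts[0] in wanted:
                mapping[parts[0]] = parts[1]

    # Pass 3: re-read File2 and write each pair followed by its two values.
    with open(file2_path) as file2, open(out_path, "w") as out_file:
        for line in file2:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            k1, k2 = parts
            v1 = mapping.get(k1, missing)
            v2 = mapping.get(k2, missing)
            out_file.write(" ".join([k1, k2, v1, v2]) + "\n")
```

<p>Calling <code>join_ids('File1', 'File2', 'NewFile2')</code> does the same job as the script above. The dict never holds more entries than there are distinct ids in File2, so the size of File1 only affects running time, not memory.</p>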