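<p>In the end, I was able to do this in a very short time using dictionaries (with tuples as keys): comparing a 370 MB data file against a 270 MB data file now takes at most 50 seconds.
The script is as follows:</p>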
<pre><code># Index each file by a tuple of its key columns (fields 2, 3 and 11-14
# of each '|'-delimited line). A later duplicate key overwrites an earlier one.
TmpDict = {}
TmpDict2 = {}
with open("fileA", 'r') as reader:
    for line in reader:
        line = line.strip()
        TmpArr = line.split('|')
        # Forming a dictionary with the columns below as the key
        TmpDict[TmpArr[2], TmpArr[3], TmpArr[11], TmpArr[12], TmpArr[13], TmpArr[14]] = line
with open("fileB", 'r') as reader2:
    for line in reader2:
        line = line.strip()
        TmpArr = line.split('|')
        TmpDict2[TmpArr[2], TmpArr[3], TmpArr[11], TmpArr[12], TmpArr[13], TmpArr[14]] = line

# Records of A whose key also occurs in B go to MatchedRecords.txt,
# the rest go to notInB. Dict lookups are O(1) on average.
with open('MatchedRecords.txt', 'w') as outfile, \
     open('notInB', 'w') as outfileNonMatchedB:
    for k, v in TmpDict.items():
        if k in TmpDict2:
            outfile.write(v + '\n')
        else:
            outfileNonMatchedB.write(v + '\n')

# Records of B whose key does not occur in A go to notInA.
with open('notInA', 'w') as outfileNonMatchedA:
    for k, v in TmpDict2.items():
        if k not in TmpDict:
            outfileNonMatchedA.write(v + '\n')
</code></pre>
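<p>For illustration, here is a minimal sketch of the same technique (a dictionary keyed on a tuple of selected columns, followed by membership tests) factored into functions that work on in-memory lists of lines, so it can be tried without the large files. The helper names <code>index_by_key</code> and <code>compare_records</code> and the <code>key_cols</code> parameter are hypothetical, not part of the original script; the sketch assumes Python 3.7+, where dicts keep insertion order.</p>

```python
# Sketch of the dictionary-with-tuple-keys comparison used in the script.
# Hypothetical helpers (index_by_key, compare_records) for illustration only.
KEY_COLS = (2, 3, 11, 12, 13, 14)  # same key columns as the script


def index_by_key(lines, key_cols=KEY_COLS):
    """Map a tuple of the selected '|'-separated fields to the stripped line.

    As in the original script, a later line with a duplicate key
    overwrites an earlier one.
    """
    index = {}
    for line in lines:
        line = line.strip()
        fields = line.split('|')
        index[tuple(fields[i] for i in key_cols)] = line
    return index


def compare_records(lines_a, lines_b, key_cols=KEY_COLS):
    """Return (matched, only_in_a, only_in_b) as lists of lines.

    matched holds A's version of each record whose key also occurs in B,
    mirroring what the script writes to MatchedRecords.txt.
    """
    a = index_by_key(lines_a, key_cols)
    b = index_by_key(lines_b, key_cols)
    matched = [v for k, v in a.items() if k in b]        # -> MatchedRecords.txt
    only_a = [v for k, v in a.items() if k not in b]     # -> notInB
    only_b = [v for k, v in b.items() if k not in a]     # -> notInA
    return matched, only_a, only_b
```

<p>For example, <code>compare_records(["1|x|foo", "2|y|bar"], ["1|x|qux"], key_cols=(0, 1))</code> returns <code>(["1|x|foo"], ["2|y|bar"], [])</code>. Keeping the lookup in a function also makes it easy to change the key columns without touching the file-handling code.</p>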
<p>Is there anything that could be improved? Suggestions welcome!
Thanks</p>