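<p>You can store the relative frequencies from the first file in a dictionary, then iterate over the second file and, whenever its first column matches an entry from the first file, write the result directly to the output file:</p>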
<pre><code>import csv

tmp = {}
# if one file is much larger than the other, load the smaller one here
# and make sure it fits into memory
# (Python 3: open csv files in text mode with newline=""; Python 2 used "rb"/"wb")
with open("ngrams.csv", newline="") as fr:
    # tuple unpacking extracts a fixed number of columns from each row
    for txt, abs_freq, rel in csv.reader(fr):
        # convert strings like "1.435486010883783160220299732E-8" to floats
        tmp[txt] = float(rel)

with open("matchedngrams.csv", "w", newline="") as fw:
    writer = csv.writer(fw)
    # the 2nd input file is processed one line at a time to save memory;
    # the order of items from this file is preserved
    with open("ngramstest.csv", newline="") as fr:
        for txt, abs_freq, rel in csv.reader(fr):
            if txt in tmp:
                # not sure what you want to do with absolute, I use 0 here:
                writer.writerow((txt, 0, tmp[txt] / float(rel)))
</code></pre>
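<p>To try the lookup-and-divide step without touching any files, here is a minimal sketch that runs the same logic on in-memory CSV text. The helper name <code>match_ngrams</code> and the sample rows are made up for illustration and are not part of the original data; the three-column layout (n-gram, absolute frequency, relative frequency) is assumed from the code above.</p>

```python
import csv
import io


def match_ngrams(reference_rows, test_rows):
    """Yield (ngram, ratio) for each test row whose n-gram is in the reference.

    Hypothetical helper: ratio = reference relative frequency / test relative frequency.
    """
    # Build the lookup table from the (smaller) reference input.
    rel_by_ngram = {txt: float(rel) for txt, _abs_freq, rel in reference_rows}
    # Stream the second input row by row, preserving its order.
    for txt, _abs_freq, rel in test_rows:
        if txt in rel_by_ngram:
            yield txt, rel_by_ngram[txt] / float(rel)


# Made-up sample data: n-gram, absolute frequency, relative frequency.
reference_csv = "the cat,10,0.5\na dog,5,0.25\n"
test_csv = "a dog,2,0.125\nthe bird,1,0.0625\nthe cat,2,0.125\n"

matched = list(
    match_ngrams(
        csv.reader(io.StringIO(reference_csv)),
        csv.reader(io.StringIO(test_csv)),
    )
)
print(matched)
```

<p>Rows such as "the bird" that have no counterpart in the reference are skipped, and the output follows the order of the second input.</p>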