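<p>You should not open and close the output file for every line. More importantly, you can compute the replacement for each <code>concept_phrase</code> once and store it, which avoids rebuilding the underscored version of every phrase k*n times (k is the number of concept phrases, n is the number of lines):</p>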
<pre><code>in_file = "/media/saurabh/New Volume/wikiextractor/output/Final_Txt/single_cs.txt"
out_file = "/media/saurabh/New Volume/wikiextractor/output/Final_Txt/single_cs_final.txt"
# `concepts` (the list of concept phrases) is assumed to be defined already.
# Build each underscored replacement once, up front.
replacement = {cp: cp.replace(' ', '_') for cp in concepts}
# Open both files once; 'a' appends to out_file if it already exists.
with open(in_file) as infile, open(out_file, 'a') as outfile:
    for line in infile:
        for concept_phrase in concepts:
            line = line.replace(concept_phrase, replacement[concept_phrase])
        outfile.write(line)
</code></pre>
<p><code>str.replace</code> is generally fast, and I doubt that a single pass with <code>re.sub</code> would beat it, even though <code>str.replace</code> has to be called repeatedly.</p>
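<p>If you want to check this on your own data, a rough comparison could look like the sketch below. The sample <code>concepts</code> list and <code>line</code> are made up for illustration, and the helper names <code>with_str_replace</code> and <code>with_re_sub</code> are hypothetical. Timings will depend on how many phrases you have and how long the lines are, so treat the numbers as indicative only. Note too that the two approaches can give different results when phrases overlap, because <code>re.sub</code> replaces in a single left-to-right pass while the loop applies the phrases one after another.</p>

```python
import re
import timeit

# Hypothetical sample data; the real `concepts` list and input lines come from your files.
concepts = ["machine learning", "neural network", "new york"]
line = "a neural network is a machine learning model built in new york\n"

# Precompute the underscored form of each phrase once.
replacement = {cp: cp.replace(" ", "_") for cp in concepts}


def with_str_replace(text):
    # One str.replace call per concept phrase, as in the loop above.
    for cp in concepts:
        text = text.replace(cp, replacement[cp])
    return text


# A single alternation pattern; longer phrases go first so they are tried
# before any shorter phrase that is a prefix of them.
pattern = re.compile(
    "|".join(re.escape(cp) for cp in sorted(concepts, key=len, reverse=True))
)


def with_re_sub(text):
    # One pass over the text; the callback looks up the matched phrase.
    return pattern.sub(lambda m: replacement[m.group(0)], text)


if __name__ == "__main__":
    # Both approaches agree on this non-overlapping sample.
    assert with_str_replace(line) == with_re_sub(line)
    print("str.replace:", timeit.timeit(lambda: with_str_replace(line), number=10_000))
    print("re.sub:     ", timeit.timeit(lambda: with_re_sub(line), number=10_000))
```

<p>Run it with a sample that resembles your real input before deciding whether switching to a regular expression is worth it.</p>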