<p>You should split each line into "words" and look up only those words in the dictionary:</p>
<pre class="lang-py prettyprint-override"><code>>>> re.findall(r"\w+", "CHROMOSOME_IV ncRNA gene 5723085 5723105 . - . ID=Gene:WBGene00045518 CHROMOSOME_IV ncRNA ncRNA 5723085 5723105 . - . Parent=Gene:WBGene00045518")
['CHROMOSOME_IV', 'ncRNA', 'gene', '5723085', '5723105', 'ID', 'Gene', 'WBGene00045518', 'CHROMOSOME_IV', 'ncRNA', 'ncRNA', '5723085', '5723105', 'Parent', 'Gene', 'WBGene00045518']
</code></pre>
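<p>This removes the loop over the whole dictionary that you currently run for every line: each word becomes a single dictionary lookup instead.</p>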
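<p>A minimal sketch of the word-by-word approach. The helper names <code>replace_words</code> and <code>rewrite_file</code> are hypothetical, and the file layout (<code>f1.txt</code> holding one tab-separated key/value pair per line, <code>f2.txt</code> as input, <code>out.txt</code> as output) is assumed from the edit below. Each <code>\w+</code> token costs one <code>dict.get</code> lookup, and tokens that are not keys are written back unchanged:</p>
<pre class="lang-py prettyprint-override"><code>import re

# Matches one "word": a run of letters, digits or underscores
WORD = re.compile(r"\w+")


def replace_words(line, udict):
    """Replace every word token that is a key of udict; keep all other text as-is."""
    return WORD.sub(lambda m: udict.get(m.group(), m.group()), line)


def rewrite_file(dict_path, in_path, out_path):
    """Hypothetical wrapper: load the key/value file, then rewrite in_path line by line."""
    with open(dict_path, "r") as infile1:
        udict = dict(line.strip().split("\t", 1) for line in infile1 if line.strip())
    with open(in_path, "r") as infile2, open(out_path, "w") as outfile:
        for line in infile2:
            outfile.write(replace_words(line, udict))


# Hypothetical mapping, for illustration only
example = {"ncRNA": "RNA_TYPE", "WBGene00045518": "NEW_ID"}
print(replace_words("CHROMOSOME_IV ncRNA gene . ID=Gene:WBGene00045518", example))
# prints: CHROMOSOME_IV RNA_TYPE gene . ID=Gene:NEW_ID
</code></pre>
<p>With the assumed file names this would be called as <code>rewrite_file("f1.txt", "f2.txt", "out.txt")</code>. Because only whole <code>\w+</code> tokens are looked up, a key such as <code>gene</code> never rewrites part of a longer word like <code>genes</code>.</p>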
<p><strong>Edit</strong>: another approach is to build a single mega-regex from all the dictionary keys:</p>
<pre class="lang-py prettyprint-override"><code>with open("f1.txt", "r") as infile1:
udict = dict(line.strip().split("\t", 1) for line in infile1)
regex = re.compile("|".join(map(re.escape, udict)))
with open("f2.txt", "r") as infile2, open("out.txt", "w") as outfile:
for line in infile2:
outfile.write(regex.sub(lambda m: udict[m.group()], line))
</code></pre>
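<p>One caveat with the mega-regex: Python's <code>re</code> tries the alternatives of <code>|</code> from left to right and keeps the first one that matches at a position, not the longest, so a key that is a prefix of another key can shadow it. Sorting the keys longest-first avoids this. A small in-memory check, with a hypothetical two-key dictionary standing in for <code>f1.txt</code> and a hypothetical <code>build_regex</code> helper:</p>
<pre class="lang-py prettyprint-override"><code>import re


def build_regex(udict):
    """Compile one alternation of all keys, longest key first."""
    keys = sorted(udict, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, keys)))


# Hypothetical keys; "ncRNA" is a prefix of "ncRNA_gene"
udict = {"ncRNA": "RNA", "ncRNA_gene": "GENE"}
line = "ncRNA_gene ncRNA"

naive = re.compile("|".join(map(re.escape, udict)))  # keys in insertion order
ordered = build_regex(udict)

print(naive.sub(lambda m: udict[m.group()], line))    # prints: RNA_gene RNA
print(ordered.sub(lambda m: udict[m.group()], line))  # prints: GENE RNA
</code></pre>
<p>The naive result also shows a second difference from the word-based version: the mega-regex matches keys anywhere in the line, including inside longer words. If every key consists only of word characters, wrapping the alternation as <code>\b(?:...)\b</code> restricts it to whole words.</p>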