<p>You can install and use the <code>nltk</code> library. It gives you a list of English words as well as a way to split each line into words:</p>
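<pre><code>from nltk.tokenize import word_tokenize
from nltk.corpus import words

# A set gives fast membership tests; lowercase it to match word.lower() below
english = {w.lower() for w in words.words()}

# Assumes both files are UTF-8 encoded
with open('Dari.pos', encoding='utf-8') as f_input, open('DariNER.txt', 'w', encoding='utf-8') as f_output:
    for line in f_input:
        # Keep only the tokens that are not English words
        f_output.write(' '.join(word for word in word_tokenize(line) if word.lower() not in english) + '\n')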
</code></pre>
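<p>The filtering step does not depend on the files or on NLTK itself, so it can be checked on a single line in isolation. This is a minimal illustration, not part of NLTK: <code>remove_english_words</code> is a hypothetical helper, and the sample tokens and vocabulary are made-up stand-ins for the output of <code>word_tokenize(line)</code> and <code>words.words()</code>.</p>

```python
def remove_english_words(tokens, english_vocabulary):
    """Return the tokens whose lowercase form is not in the vocabulary."""
    # Normalise the vocabulary once so the comparison is case-insensitive
    vocab = {w.lower() for w in english_vocabulary}
    return [token for token in tokens if token.lower() not in vocab]


# Hypothetical sample data standing in for word_tokenize(line) and words.words()
sample_tokens = ["The", "کتاب", "is", "on", "the", "میز", "."]
sample_vocab = ["the", "is", "on", "table"]

# Prints the two Dari tokens and the full stop; punctuation is kept
# because it is not in the vocabulary
print(" ".join(remove_english_words(sample_tokens, sample_vocab)))
```

<p>Lowercasing the vocabulary matters because the <code>words</code> corpus also contains capitalised entries (proper names, for example), so comparing <code>word.lower()</code> against the raw list can miss them.</p>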
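<p>After installing nltk, run the NLTK downloader with <code>nltk.download()</code> and use it to download the <code>words</code> corpus. You can also fetch it non-interactively with <code>nltk.download('words')</code>. Note that <code>word_tokenize</code> additionally needs the Punkt tokenizer models (<code>nltk.download('punkt')</code>, or <code>'punkt_tab'</code> in recent NLTK releases).</p>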