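<p>I think this may be the essence of what you need. <strong>Please see the edited version.</strong></p>
<p>As you say in your question, tagging <code>Sentence</code> gives a result like <code>tagged</code>. If you only need the nouns from <code>Sentence</code>, the expression after <code>nouns =</code> recovers them.</p>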
<pre><code>Sentence = " O gato esta querendo comer o rato "
tagged = [('O', 'ADJ'), ('gato', 'N'), ('esta', 'V'), ('querendo', 'V'), ('comer', 'V'), ('o', 'ADJ'), ('rato', 'N')]
# keep only the words whose tag is 'N' (noun)
nouns = [t[0] for t in tagged if t[1] == 'N']
print(nouns)
</code></pre>
<p>Output:</p>
<pre><code>['gato', 'rato']
</code></pre>
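<p>If other word classes are needed as well, the same list comprehension can be wrapped in a small helper. This is only a sketch: <code>words_with_tag</code> is a hypothetical name introduced here, not part of nlpnet, and it assumes the tagger returns <code>(word, tag)</code> pairs like <code>tagged</code> above.</p>

```python
def words_with_tag(tagged, tag):
    """Return the words from (word, tag) pairs whose tag equals `tag`."""
    return [word for word, word_tag in tagged if word_tag == tag]


# Same sample data as in the answer above.
tagged = [('O', 'ADJ'), ('gato', 'N'), ('esta', 'V'), ('querendo', 'V'),
          ('comer', 'V'), ('o', 'ADJ'), ('rato', 'N')]

print(words_with_tag(tagged, 'N'))  # nouns
print(words_with_tag(tagged, 'V'))  # verbs
```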
<p><strong>Edit:</strong> It was not clear to me what you want. Here is another possibility.</p>
<ul>
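<li>I have not installed nlpnet, because that would be a fair amount of work and I would not use it myself.</li>
<li>I simulate <code>TAGGER.txt</code> with the function <code>TAGGER_txt</code>.</li>
<li>I changed the encoding to Latin-1. It is used in the file header and in <code>codecs.open</code>.</li>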
</ul>
<pre><code># -*- coding: Latin-1 -*-
import codecs
import itertools

def TAGGER_txt(text):  # simulate TAGGER.txt
    return [[(u'O', u'ART'), (u'gato', u'N'), (u'esta', u'PROADJ'), (u'querendo', u'V'), (u'comer', u'V'), (u'o', u'ART'), (u'ratão', u'N')]]

with codecs.open('document.txt', encoding='Latin-1') as original_file:
    # output_file is opened but never written to here; the results are only printed
    with codecs.open('document_test.txt', 'w', encoding='Latin-1') as output_file:
        for line in original_file.readlines():
            print(line)
            words = TAGGER_txt(line)
            # flatten the list of tagged sentences into one list of (word, tag) pairs
            all_words = list(itertools.chain(*words))
            nouns = [word[0] for word in all_words if word[1] == 'N']
            print(nouns)
</code></pre>
<p>Output:</p>
<pre><code> O gato esta querendo comer o ratão
['gato', 'ratão']
</code></pre>
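<p>If the goal is to save the nouns rather than print them, one possible variant writes them to the output file, one line per input line. This is a sketch under assumptions: it targets Python 3, <code>fake_tagger</code> and <code>extract_nouns</code> are hypothetical names standing in for the real tagger and your own code, and the sample input file is created in a temporary directory only so that the example runs on its own.</p>

```python
import itertools
import os
import tempfile


def fake_tagger(text):
    # Hypothetical stand-in for the real tagger:
    # returns one list of (word, tag) pairs per sentence.
    return [[(u'O', u'ART'), (u'gato', u'N'), (u'esta', u'PROADJ'),
             (u'querendo', u'V'), (u'comer', u'V'), (u'o', u'ART'),
             (u'ratão', u'N')]]


def extract_nouns(in_path, out_path, tagger, encoding='Latin-1'):
    """Tag each input line and write its nouns, space-separated, to out_path."""
    with open(in_path, encoding=encoding) as src, \
            open(out_path, 'w', encoding=encoding) as dst:
        for line in src:
            # Flatten the tagged sentences into one stream of (word, tag) pairs.
            pairs = itertools.chain.from_iterable(tagger(line))
            nouns = [word for word, tag in pairs if tag == 'N']
            dst.write(u' '.join(nouns) + u'\n')


# Build a sample input file so the sketch is self-contained.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'document.txt')
out_path = os.path.join(workdir, 'document_test.txt')
with open(in_path, 'w', encoding='Latin-1') as f:
    f.write(u'O gato esta querendo comer o ratão\n')

extract_nouns(in_path, out_path, fake_tagger)

with open(out_path, encoding='Latin-1') as f:
    result = f.read()
print(result)
```

<p>Passing the tagger in as an argument keeps the file handling separate from nlpnet, so the simulated tagger can be swapped for the real one without touching the rest.</p>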