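<p>For the pre-coded corpora in NLTK's corpus API, instead of using <code>corpus.raw()</code> you can try <code>corpus.words()</code>. <code>raw()</code> returns a single string, so passing it to <code>ngrams()</code> yields character n-grams, whereas <code>words()</code> returns an already tokenized sequence of words, e.g.</p>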
<pre><code>>>> from nltk.util import ngrams
>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', ...]
>>> trigrams = ngrams(brown.words(), 3)
>>> for i in trigrams:  # each i is a tuple of 3 consecutive words
...     print(i)
</code></pre>
<p>As @alexis pointed out, the code above also works for custom corpora loaded with <code>PlaintextCorpusReader</code>; see <a href="http://www.nltk.org/_modules/nltk/corpus/reader/plaintext.html" rel="nofollow">http://www.nltk.org/_modules/nltk/corpus/reader/plaintext.html</a></p>
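<p>A minimal sketch of the custom-corpus case, assuming NLTK 3 on Python 3. The temporary directory, the file name <code>sample.txt</code>, its sentence, and the name <code>my_corpus</code> are hypothetical placeholders. It builds a <code>PlaintextCorpusReader</code> over that directory and feeds its <code>words()</code> to the same <code>ngrams()</code> call used with the Brown corpus:</p>
<pre><code>import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader
from nltk.util import ngrams

with tempfile.TemporaryDirectory() as corpus_dir:
    # Hypothetical custom corpus: a single small .txt file.
    with open(os.path.join(corpus_dir, "sample.txt"), "w", encoding="utf8") as f:
        f.write("The quick brown fox jumps over the lazy dog.")

    # fileids may be a regex; here every .txt file in corpus_dir is loaded.
    my_corpus = PlaintextCorpusReader(corpus_dir, r".*\.txt")

    # words() tokenizes with the reader's default WordPunctTokenizer.
    # list() consumes the lazy ngrams generator before the directory is deleted.
    trigrams = list(ngrams(my_corpus.words(), 3))

for trigram in trigrams:
    print(trigram)
</code></pre>
<p>Because the default word tokenizer splits punctuation into separate tokens, the last trigram printed here is <code>('lazy', 'dog', '.')</code>.</p>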