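<p>If you want to avoid hard-coding conjunctions such as "but" and "and", try chinking alongside chunking: chunk everything first, then chink out the tokens tagged as coordinating conjunctions (<code>CC</code>):</p>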
<hr/>
<pre><code>import nltk

# Requires the NLTK tokenizer and POS-tagger data (install via nltk.download()).
Digdug = nltk.RegexpParser(r"""
CHUNK_AND_CHINK:
    {<.*>+}    # Chunk everything
    }<CC>+{    # Chink sequences of CC
""")
sentence = nltk.pos_tag(nltk.word_tokenize("There are no large collections present but there is spinal canal stenosis."))
result = Digdug.parse(sentence)
# Print each chunk that remains once the CC tokens have been chinked out
for subtree in result.subtrees(filter=lambda t: t.label() == 'CHUNK_AND_CHINK'):
    print(subtree)
</code></pre>
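<p>Chinking essentially removes what we don't want from a chunk, in this case "but", so the sentence is split into the clauses on either side of the conjunction.
For details, see: <a href="http://www.nltk.org/book/ch07.html" rel="nofollow noreferrer">http://www.nltk.org/book/ch07.html</a></p>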
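<p>To make the effect of the chink rule concrete, here is a minimal pure-Python sketch of the same idea without NLTK: it groups consecutive tokens and drops every run tagged <code>CC</code>. The helper <code>split_on_tags</code> is a hypothetical name, and the POS tags are hand-assigned for illustration rather than produced by <code>nltk.pos_tag</code>, so treat it as an approximation of what the grammar does, not as NLTK's implementation.</p>

```python
from itertools import groupby


def split_on_tags(tagged, chink_tags=("CC",)):
    """Return runs of consecutive (word, tag) pairs whose tag is not in chink_tags.

    Hypothetical helper for illustration; it mimics "chunk everything,
    then chink CC" on an already-tagged sentence.
    """
    chunks = []
    # groupby yields alternating runs of chinked and non-chinked tokens
    for is_chink, group in groupby(tagged, key=lambda pair: pair[1] in chink_tags):
        if not is_chink:
            chunks.append(list(group))
    return chunks


# Hand-assigned tags (an assumption; a real tagger may label some words differently)
tagged = [
    ("There", "EX"), ("are", "VBP"), ("no", "DT"), ("large", "JJ"),
    ("collections", "NNS"), ("present", "JJ"), ("but", "CC"),
    ("there", "EX"), ("is", "VBZ"), ("spinal", "JJ"), ("canal", "JJ"),
    ("stenosis", "NN"), (".", "."),
]

for chunk in split_on_tags(tagged):
    print(" ".join(word for word, _ in chunk))
```

<p>With these tags the sentence splits into two chunks, one on each side of "but". The NLTK grammar above does the same thing declaratively and returns the chunks as subtrees instead of plain lists.</p>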