<p>You can use the Stanford tokenizer, which NLTK wraps as <code>StanfordTokenizer</code>.
The code below shows how.</p>
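<pre><code>from nltk.tokenize.stanford import StanfordTokenizer

# Path to the Stanford jar; adjust it to wherever you unpacked the download.
# The tokenizer runs the jar through Java, so Java must be installed.
token = StanfordTokenizer('stanford-ner-2014-06-16/stanford-ner.jar')
qry = "In the UK, the class is relatively crowded with Zacc competing with Abc's Popol (market leader) and Xyz's Abcvd."
tok = token.tokenize(qry)
print(tok)  # the parenthesized form also works on Python 2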
</code></pre>
<p>This gives you the tokens you need:</p>
<blockquote>
<p>[u'In',<br/>
u'the',<br/>
u'UK',<br/>
u',',<br/>
u'the',<br/>
u'class',<br/>
u'is',<br/>
u'relatively',<br/>
u'crowded',<br/>
u'with',<br/>
u'Zacc',<br/>
u'competing',<br/>
u'with',<br/>
u'Abc',<br/>
u"'s",<br/>
u'Popol',<br/>
u'-LRB-',<br/>
u'market',<br/>
u'leader',<br/>
u'-RRB-',<br/>
u'and',<br/>
u'Xyz',<br/>
u"'s",<br/>
u'Abcvd',<br/>
u'.'] </p>
</blockquote>
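<p>Note that the parentheses come back as <code>-LRB-</code> and <code>-RRB-</code>, which are Penn Treebank escapes. If you want the literal characters back, a minimal post-processing sketch could look like the following. <code>PTB_BRACKETS</code> and <code>unescape_brackets</code> are hypothetical helper names used for illustration, not part of NLTK, and the token list is a hand-copied slice of the output above.</p>

```python
# Penn Treebank bracket escapes mapped back to the literal characters.
# PTB_BRACKETS and unescape_brackets are illustrative names, not NLTK APIs.
PTB_BRACKETS = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}


def unescape_brackets(tokens):
    """Return a new list with PTB bracket escapes replaced by real brackets."""
    return [PTB_BRACKETS.get(t, t) for t in tokens]


# A slice of the tokenizer output shown above, copied by hand.
tokens = ["Abc", "'s", "Popol", "-LRB-", "market", "leader", "-RRB-", "and"]
print(unescape_brackets(tokens))
```

<p>Tokens that are not in the mapping pass through unchanged, so it is safe to run this over the full token list.</p>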