<p>To get the subsequences, you can use NLTK's <a href="http://www.nltk.org/_modules/nltk/tokenize/regexp.html" rel="nofollow">RegExp Tokenizer</a>.</p>
<p>Here is an example of how to use it to split the sentence:</p>
<pre><code>from nltk.tokenize.regexp import regexp_tokenize

str1 = 'The api allows the user to achieve following goals: (a) aXXXXXX ,(b)bXXXX, (c) cXXXXX.'
# gaps=True makes the pattern match the separators, i.e. the "(a)", "(b)" markers
parts = regexp_tokenize(str1, r'\(\w\)\s*', gaps=True)
# The first piece is the shared start of the sentence
start_of_sentence = parts.pop(0)
for part in parts:
    print(" ".join((start_of_sentence, part)))
</code></pre>
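<p>If NLTK is not available, roughly the same split can be sketched with the standard library's <code>re.split</code>. This is only an illustration, not part of the NLTK API: the helper name <code>split_enumeration</code> is hypothetical, and stripping stray commas and spaces from each piece is an extra assumption about what the output should look like.</p>

```python
import re


def split_enumeration(sentence):
    """Hypothetical helper: pair the sentence start with each enumerated item."""
    # Split on markers such as "(a)", "(b)", the same pattern used above.
    pieces = re.split(r'\(\w\)\s*', sentence)
    # Assumption: trim the spaces and commas left between items, drop empty pieces.
    pieces = [p.strip(" ,") for p in pieces]
    pieces = [p for p in pieces if p]
    if not pieces:
        return []
    head, items = pieces[0], pieces[1:]
    return [f"{head} {item}" for item in items]


str1 = 'The api allows the user to achieve following goals: (a) aXXXXXX ,(b)bXXXX, (c) cXXXXX.'
for sentence in split_enumeration(str1):
    print(sentence)
```

<p>Unlike the NLTK version, this variant trims the separators around each item, so the printed sentences have no trailing commas.</p>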