<p>You are looking for the <code>tokenize</code> function from the <a href="http://nltk.org/install.html" rel="nofollow">NLTK</a> package. <code>NLTK</code> stands for the <em>Natural Language Toolkit</em>.</p>
<p>Alternatively, try <code>re.split</code> from the <code>re</code> module.</p>
<p>From the <a href="http://docs.python.org/2/library/re.html" rel="nofollow">re</a> documentation:</p>
<pre><code>>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']
</code></pre>
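<p>Note that <code>re.split</code> produces a trailing empty string when the text ends with a delimiter, as in the first example above. If you only want the word tokens themselves, one common alternative (a minimal sketch using only the standard library) is <code>re.findall</code> with the complementary pattern:</p>

```python
import re

text = 'Words, words, words.'

# re.split(r'\W+', ...) yields a trailing '' because the string
# ends with a non-word character; re.findall with \w+ matches
# the word runs directly and returns only the tokens.
tokens = re.findall(r'\w+', text)
print(tokens)  # ['Words', 'words', 'words']
```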