<p>I would not use nltk for this; I would use a regex instead.</p>
<ol>
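<li>Get a list of all CSS colors (<a href="http://colours.neilorangepeel.com/" rel="nofollow">here</a>)</li>
<li>Extract the color names and build a list (using BeautifulSoup)</li>
<li>Build a regex pattern from that list</li>
<li>Use this regex pattern to match what you need in your string</li>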
</ol>
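<p>Steps 3 and 4 can be tried without any scraping. The following is only a minimal sketch: the short <code>color_names</code> list and the <code>contains_color</code> helper are hypothetical stand-ins for the scraped list, and the word boundaries and case-insensitive flag are my own assumptions about what a useful match looks like.</p>

```python
import re

# Hypothetical sample; the real list would come from the scraped page.
color_names = ["red", "green", "blue", "orange"]

# Step 3: join the escaped names into one alternation.
# \b keeps "red" from matching inside words such as "hundred".
pattern = r"\b(?:" + "|".join(re.escape(name) for name in color_names) + r")\b"
color_regex = re.compile(pattern, re.IGNORECASE)


def contains_color(text):
    """Step 4: return True if any known color appears as a whole word."""
    return color_regex.search(text) is not None
```

<p>Calling <code>contains_color("A Red car")</code> reports a match, while <code>contains_color("one hundred")</code> does not, because of the word boundaries.</p>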
<p>My code<br/>
(adapt the final matching step and the proxy settings to your needs)</p>
<pre><code>import re
import urllib.request

from bs4 import BeautifulSoup

color_url = 'http://colours.neilorangepeel.com/'
proxies = {'http': 'http://proxy.foobar.fr:3128'}  # only needed behind a proxy

# Fetch the HTML page
authinfo = urllib.request.HTTPBasicAuthHandler()  # set up authentication info
proxy_support = urllib.request.ProxyHandler(proxies)
# Build an opener that adds proxy, authentication, and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
urllib.request.install_opener(opener)  # install the opener globally
colorfile = urllib.request.urlopen(color_url)
soup = BeautifulSoup(colorfile, 'html.parser')
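# Build the regex pattern
colors = soup.find_all('h1')
# Skip headings without plain text; escape each name before joining
colorsnames = [color.string.strip() for color in colors if color.string]
colorspattern = r'\b(?:' + '|'.join(re.escape(name) for name in colorsnames) + r')\b'
# Whole words only, case-insensitive, so "red" does not match inside "hundred"
colorregex = re.compile(colorspattern, re.IGNORECASE)

# Match what you need
yourstring = 'a bright red car'  # placeholder: replace with your own text
if colorregex.search(yourstring):
    print('color found')  # placeholder: do what you want here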
</code></pre>
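<p>If you need to know which colors were found rather than only whether one is present, <code>findall</code> can replace <code>search</code>. This is a sketch under assumptions: <code>build_color_regex</code>, <code>find_colors</code>, and <code>sample_names</code> are hypothetical names, and the hard-coded list stands in for the scraped one so the example runs offline.</p>

```python
import re


def build_color_regex(names):
    """Compile one case-insensitive, whole-word pattern from color names."""
    # Longest names first, so "lightblue" is tried before "blue".
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_colors(text, names):
    """Return every color name found in text, lowercased, in order of appearance."""
    return [match.lower() for match in build_color_regex(names).findall(text)]


sample_names = ["red", "blue", "lightblue"]  # hypothetical sample list
found = find_colors("A LightBlue door and a red roof", sample_names)
print(found)
```

<p>Lowercasing the matches makes them easy to compare against the original list regardless of how the text was capitalized.</p>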