<p>Not entirely correct, but this will match most of what you are looking for, with the exception of <code>On</code>.</p>
<pre><code>import re

text = """
#'On its 25th anniversary, Ashoka',
#'at the Shift Series national conference, Compass Partners and fashion designer Kenneth Cole',
"""
# One capitalized word, optionally followed by a second capitalized word.
proper_noun_regex = r'([A-Z]{1}[a-z]{1,}(\s[A-Z]{1}[a-z]{1,})?)'
p = re.compile(proper_noun_regex)
# findall returns one tuple per match: (full match, optional second word).
matches = p.findall(text)
print(matches)
</code></pre>
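<p>Because the pattern contains two capturing groups, <code>findall</code> returns a list of tuples. The first element of each tuple is the full match, and the list still includes the false positive <code>On</code>.</p>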
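<p>A small sketch of what <code>findall</code> hands back for this kind of pattern, run on a shortened sample string. The string is made up for illustration and is not the original input; the pattern is the same one written with <code>+</code> instead of <code>{1,}</code>.</p>

```python
import re

# Same two-group pattern as above, written with + instead of {1,}.
p = re.compile(r'([A-Z][a-z]+(\s[A-Z][a-z]+)?)')

# Hypothetical sample string, chosen only to show the return shape.
sample = "Shift Series and Ashoka"

result = p.findall(sample)
print(result)
# [('Shift Series', ' Series'), ('Ashoka', '')]
```

<p>Each tuple holds the full match first and the optional second word after it (an empty string when there is none), which is why only the first element is kept later on.</p>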
<p>Then perhaps you could implement a filter to go through this list.</p>
<pre><code>def filter_false_positive(unfiltered_matches):
    filtered_matches = []
    black_list = ["an", "on", "in", "foo", "bar"]  # etc
    for match in unfiltered_matches:
        # Compare case-insensitively against the blacklist.
        if match.lower() not in black_list:
            filtered_matches.append(match)
    return filtered_matches
</code></pre>
<p>Or, since Python is cool:</p>
<pre><code>def filter_false_positive(unfiltered_matches):
    black_list = ["an", "on", "in", "foo", "bar"]  # etc
    # Iterate over the argument, not an undefined name.
    return [match for match in unfiltered_matches if match.lower() not in black_list]
</code></pre>
<p>You can use it like this:</p>
<pre><code># CONTINUED FROM THE CODE ABOVE
# Keep only the full match (the first element of each tuple).
matches = [i[0] for i in matches]
matches = filter_false_positive(matches)
print(matches)
</code></pre>
<p>Which gives the final output:</p>
<pre><code>['Ashoka', 'Shift Series', 'Compass Partners', 'Kenneth Cole']
</code></pre>
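<p>As a possible variation, the steps above can be folded into one function. This is only a sketch: it swaps the inner capturing group for a non-capturing one, <code>(?:...)</code>, so that <code>findall</code> returns plain strings, and it keeps the blacklist in a set. The names <code>PROPER_NOUN_RE</code>, <code>BLACK_LIST</code> and <code>find_proper_nouns</code>, and the sample sentence, are assumptions made for illustration.</p>

```python
import re

# With no capturing groups, findall returns the full matches as strings.
PROPER_NOUN_RE = re.compile(r'[A-Z][a-z]+(?:\s[A-Z][a-z]+)?')

# Hypothetical blacklist, same entries as in the filter above.
BLACK_LIST = {"an", "on", "in", "foo", "bar"}


def find_proper_nouns(text):
    """Return capitalized one- or two-word matches that are not blacklisted."""
    return [m for m in PROPER_NOUN_RE.findall(text)
            if m.lower() not in BLACK_LIST]


# Hypothetical sample sentence, loosely based on the text above.
sample = "On its 25th anniversary, Ashoka met Compass Partners and Kenneth Cole."
print(find_proper_nouns(sample))
# ['Ashoka', 'Compass Partners', 'Kenneth Cole']
```

<p>The behaviour is meant to match the two-step version; the only difference is that the tuple unpacking step is no longer needed.</p>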
<p>Deciding whether a word is capitalized because it starts a sentence or because it is a proper noun is not that trivial.</p>
<pre><code>'Kenneth Cole is a brand name.' vs. 'Can I eat something now?' vs. 'An English man had tea'
</code></pre>
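<p>In cases like these it gets rather hard, so without some other criterion for recognizing proper nouns (a blacklist, a database, and so on) it will not be that easy. <code>regex</code> is great, but I don't think it can interpret English at the grammatical level in any trivial way...</p>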
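<p>To make the difficulty concrete, here is a rough sketch that runs the same pattern and blacklist over the three example sentences. The helper name <code>candidates</code> is an assumption, and the pattern is written with a non-capturing group so that <code>findall</code> returns strings.</p>

```python
import re

PROPER_NOUN_RE = re.compile(r'[A-Z][a-z]+(?:\s[A-Z][a-z]+)?')
BLACK_LIST = {"an", "on", "in", "foo", "bar"}

sentences = [
    "Kenneth Cole is a brand name.",
    "Can I eat something now?",
    "An English man had tea",
]


def candidates(sentence):
    """Apply the regex, then drop matches found in the blacklist."""
    return [m for m in PROPER_NOUN_RE.findall(sentence)
            if m.lower() not in BLACK_LIST]


for s in sentences:
    print(s, "->", candidates(s))
# Kenneth Cole is a brand name. -> ['Kenneth Cole']
# Can I eat something now? -> ['Can']
# An English man had tea -> ['An English']
```

<p>Only the first result is a real proper noun. <code>Can</code> slips through because it is not on the blacklist, and <code>An English</code> slips through because the blacklist is compared against the whole two-word match rather than against <code>An</code> alone.</p>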
<p>That being said, good luck!</p>