<p>I'm not sure this solution is efficient or robust, but it is easy to understand (at least for me):</p>
<pre><code>import re
# get a set of existing names (over 18,000) from the file
with open('names.txt', 'r') as f:
    NAMES = set(f.read().splitlines())
# your list of texts
texts = ["Melissa's home was clean and spacious. I would love to visit again soon.",
         "Kevin was nice and Kevin's home had a huge parking spaces."]
# join the texts into one string
texts = ' | '.join(texts)
# find all the words that look like names
pattern = r"(\b[A-Z][a-z]+('s)?\b)"
found_names = re.findall(pattern, texts)
# strip the possessive 's and remove duplicates
found_names = {name[0].replace("'s", "") for name in found_names}
# remove all the words that look like names but are not included in the NAMES
found_names = [name for name in found_names if name in NAMES]
# loop through the found names and remove every name from the texts
for name in found_names:
    # \b keeps the match from starting mid-word; ('s)? also removes the possessive form
    texts = re.sub(r"\b" + re.escape(name) + r"('s)?", "", texts)
# split the texts back to the list
texts = texts.split(' | ')
print(texts)
</code></pre>
<p>Output:</p>
<pre><code>[' home was clean and spacious. I would love to visit again soon.',
 ' was nice and  home had a huge parking spaces.']
</code></pre>
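<p>The same idea can also be sketched as a single-pass function that checks each capitalized word against the name set while substituting, so the texts do not need to be joined and split again. This is only a minimal sketch: the small <code>NAMES</code> set is a hypothetical stand-in for the contents of names.txt, and <code>remove_names</code> and <code>CANDIDATE</code> are names made up for this illustration.</p>
<pre><code>import re

# Hypothetical stand-in for the set loaded from names.txt
NAMES = {"Melissa", "Kevin"}

# A capitalized word, optionally followed by a possessive 's
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)('s)?\b")


def remove_names(text, names=NAMES):
    """Remove every capitalized word found in `names`, along with its possessive 's."""
    def replace(match):
        # Drop the match only if the bare word is a known name; otherwise keep it as is
        return "" if match.group(1) in names else match.group(0)
    return CANDIDATE.sub(replace, text)


texts = ["Melissa's home was clean and spacious. I would love to visit again soon.",
         "Kevin was nice and Kevin's home had a huge parking spaces."]
cleaned = [remove_names(text) for text in texts]
print(cleaned)
</code></pre>
<p>Because the replacement is decided per match, each text is scanned once no matter how many names the set contains, and capitalized words that are not in the set are left untouched.</p>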
<p>The list of names was obtained here: <a href="https://www.usna.edu/Users/cs/roche/courses/s15si335/proj1/files.php%3Ff=names.txt.html" rel="nofollow noreferrer">https://www.usna.edu/Users/cs/roche/courses/s15si335/proj1/files.php%3Ff=names.txt.html</a></p>
<p>I fully agree with @James_SO's suggestion to use smarter tools.</p>