<p>You can use the tried and true <a href="https://docs.python.org/3/library/re.html" rel="nofollow noreferrer">re</a> library.</p>
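<pre><code>import re
from collections import OrderedDict


def get_matches(s, keys, include_duplicates=False):
    # Build one alternation pattern from the escaped keys; flags belong in re.compile
    pattern = re.compile('|'.join(map(re.escape, keys)), re.IGNORECASE)
    # Matches come back in the order they appear in s
    all_matches = pattern.findall(s)
    if not include_duplicates:
        # Drop repeats while keeping first-seen order
        all_matches = list(OrderedDict.fromkeys(all_matches))
    return all_matches
</code></pre>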
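<p>This is quite versatile, since you don't have to worry about the matches coming back out of order <em>(thanks to <code>OrderedDict.fromkeys</code>)</em>. You can optionally include duplicates in the result.</p>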
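<p>One detail worth knowing when adapting this: flags such as <code>re.IGNORECASE</code> are passed to <code>re.compile</code>, because the second positional argument of a compiled pattern's <code>findall</code> is a start position (<code>pos</code>), not a flags value. The sketch below uses made-up sample text, and the lowercase normalisation at the end is my own assumption about how you might want to treat case when removing duplicates.</p>

```python
import re
from collections import OrderedDict

keys = ["freeway", "car accident"]
text = "Freeway closed: car accident on the FREEWAY"  # hypothetical sample text

# Flags go to re.compile; Pattern.findall(string, pos, endpos) takes no flags.
pattern = re.compile("|".join(map(re.escape, keys)), re.IGNORECASE)
all_matches = pattern.findall(text)
print(all_matches)  # ['Freeway', 'car accident', 'FREEWAY']

# Matches keep the casing found in the text, so "Freeway" and "FREEWAY"
# count as different items unless you normalise them first (an assumption).
unique_lower = list(OrderedDict.fromkeys(m.lower() for m in all_matches))
print(unique_lower)  # ['freeway', 'car accident']
```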
<hr/>
<h2>Explanation</h2>
<p>What I did with <code>re</code> is build a pattern from the keywords (<code>keys</code>) separated by <code>|</code>, which tells <code>re</code> to look for any of those strings, so every matching key is found.</p>
<p><a href="https://docs.python.org/3/library/re.html#re.findall" rel="nofollow noreferrer">re.findall</a> returns the matches in order, as stated in the docs:</p>
<blockquote>
<p>Return <strong>all non-overlapping matches</strong> of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are <strong>returned
in the order found.</strong></p>
</blockquote>
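<p>This does not account for duplicates, which is why the <code>include_duplicates</code> parameter is there in case you want them. You could convert the result to a set to remove duplicates, but that loses the ordering, so I use <a href="https://docs.python.org/3/library/collections.html" rel="nofollow noreferrer">collections.OrderedDict</a> and convert it back to a list.</p>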
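<p>To illustrate the difference, here is a small sketch with a made-up list of matches. The set removes duplicates but makes no promise about order, while <code>OrderedDict.fromkeys</code> keeps the first occurrence of each item in the order it was seen.</p>

```python
from collections import OrderedDict

# Hypothetical findall result containing repeats
matches = ["freeway", "car accident", "freeway", "traffic jam", "car accident"]

# A set drops duplicates, but its iteration order is not guaranteed.
as_set = set(matches)

# OrderedDict.fromkeys keeps the first occurrence of each item, in order.
ordered_unique = list(OrderedDict.fromkeys(matches))
print(ordered_unique)  # ['freeway', 'car accident', 'traffic jam']

# On Python 3.7+ a plain dict preserves insertion order as well.
plain_unique = list(dict.fromkeys(matches))
```

<p>On Python 3.7 and later, <code>dict.fromkeys</code> gives the same result, so <code>OrderedDict</code> mainly matters for older versions.</p>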
<hr/>
<h2>In use:</h2>
<pre><code>text = "there is a car accident on the freeway so that why I am late for the show."
keywords = {
    "freeway",
    "doesn't turn on",
    "dropped",
    "got sick",
    "traffic jam",
    " car accident",  # the leading space is part of this key
}
matches = get_matches(text, keywords)
print(f"the list of matched words are: {', '.join(matches)}")
# the list of matched words are:  car accident, freeway
</code></pre>
<p>You can try it yourself at <a href="https://repl.it/repls/AbleEssentialDribbleware" rel="nofollow noreferrer">https://repl.it/repls/AbleEssentialDribbleware</a>.</p>
<p><strong>Edit</strong></p>
<p><em>As you asked in the comments:</em></p>
<p>To explain what this line does:</p>
<pre><code>pattern = re.compile('|'.join(map(re.escape, keys)))
</code></pre>
<ul>
<li><code>re.compile</code> - builds a regular expression pattern from a string. - <a href="https://docs.python.org/3/library/re.html#re.compile" rel="nofollow noreferrer">see the docs</a></li>
<li><code>join</code> takes an iterable of strings and builds a single string from them, with each item separated by the string it is called on (here <code>'|'</code>). - <a href="https://docs.python.org/3/library/stdtypes.html#str.join" rel="nofollow noreferrer">see the docs</a></li>
<li><code>map</code> &amp; <code>re.escape</code> - <em>you can leave these out for your case</em>, <strong>but</strong> in case you or anyone reading this is searching for more complex keywords, this takes each keyword and escapes the special metacharacters of <code>re</code> (see the docs: <a href="https://docs.python.org/3/library/functions.html#map" rel="nofollow noreferrer">map</a>, <a href="https://docs.python.org/3/library/re.html#re.escape" rel="nofollow noreferrer">re.escape</a>)</li>
</ul>
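<p>The three pieces above can be seen working together in a short sketch. The keys and the sample sentence are made up for illustration; one of the keys deliberately contains metacharacters so that the effect of <code>re.escape</code> is visible.</p>

```python
import re

# str.join places the separator between the items of the iterable
joined = "|".join(["a", "b", "c"])
print(joined)  # a|b|c

# Hypothetical keys; "C++ (beta)" contains regex metacharacters
keys = ["freeway", "C++ (beta)"]

# map applies re.escape to every key so each one is matched literally
escaped_keys = list(map(re.escape, keys))
pattern = re.compile("|".join(escaped_keys))

found = pattern.findall("testing C++ (beta) on the freeway")
print(found)  # ['C++ (beta)', 'freeway']
```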
<p>This line can be rewritten without <code>map</code> and <code>re.escape</code> and still work fine, like this:</p>
<pre><code>pattern = re.compile('|'.join(keys))
</code></pre>
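<p>Just be aware that your keywords then cannot contain characters such as <code>(</code> or <code>*</code>, because <code>re</code> would treat them as pattern syntax rather than literal text.</p>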
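<p>A minimal sketch of what can go wrong without escaping, using hypothetical keys: an unbalanced <code>(</code> makes the pattern fail to compile, and a <code>.</code> silently matches more than intended. With <code>re.escape</code> both keys are matched literally.</p>

```python
import re

# Hypothetical keys containing regex metacharacters
keys = ["1.5", "car accident ("]

# Unescaped, the lone "(" opens a group that is never closed
try:
    re.compile("|".join(keys))
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False
print(unescaped_compiles)  # False

# Even when a pattern compiles, "." matches any character, not just a dot
loose = re.findall("1.5", "125 or 1.5")
print(loose)  # ['125', '1.5']

# Escaping every key makes the search literal
strict = re.compile("|".join(map(re.escape, keys)))
exact = strict.findall("125 or 1.5 after the car accident (minor)")
print(exact)  # ['1.5', 'car accident (']
```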