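<p>Since you do not need <em>every</em> matching keyword, only the longest one where several keywords overlap, you can use a regular expression with <code>re.findall</code>, which returns all non-overlapping matches.</p>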
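<p>The key points are as follows. First, sort the keywords by length in descending order. Regex alternation tries the alternatives from left to right and stops at the first one that matches, so without sorting <code>Caden</code> would win over <code>Caden.K</code>. Second, escape the values with <code>re.escape</code>, since they contain special characters such as <code>(</code>, <code>)</code> and <code>.</code>. Third, replace the word boundaries with <em>unambiguous</em> word boundaries, <code>(?<!\w)</code> and <code>(?!\w)</code>. Note that <code>\b</code> is context-dependent: its meaning depends on whether the neighboring character is a word character. For example, a <code>\b</code> placed after a keyword that ends in <code>)</code>, such as <code>Caden(.A))</code>, only matches if a word character follows.</p>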
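<p>The effect of both the length sort and the lookaround boundaries can be seen in a small comparison. This is a sketch for illustration only. The three-keyword list and the sample string are made up, and the helper name <code>alternation</code> is hypothetical.</p>
<pre><code>import re

# Hypothetical sample data: a subset of the keywords used in this answer.
keywords = ["Caden", "Caden.K", "Caden(.A))"]
text = "Caden.K; Caden(.A)) x"

def alternation(words):
    """Escape each keyword and join them into one non-capturing alternation group."""
    return '(?:{})'.format('|'.join(map(re.escape, words)))

longest_first = sorted(keywords, key=len, reverse=True)

# Unsorted: alternatives are tried left to right, so the short "Caden" wins
# wherever a keyword starts.
unsorted_rx = r'(?<!\w)' + alternation(keywords) + r'(?!\w)'
# Sorted, but using \b: a \b after ")" needs a word character next, so
# "Caden(.A))" fails before the space and the engine falls back to "Caden".
word_b_rx = r'\b' + alternation(longest_first) + r'\b'
# Sorted, with unambiguous boundaries: the longest keyword wins.
fixed_rx = r'(?<!\w)' + alternation(longest_first) + r'(?!\w)'

print(re.findall(unsorted_rx, text))  # ['Caden', 'Caden']
print(re.findall(word_b_rx, text))    # ['Caden.K', 'Caden']
print(re.findall(fixed_rx, text))     # ['Caden.K', 'Caden(.A))']
</code></pre>
<p>Only the last pattern returns both full keywords. The other two each lose a match for one of the reasons given above.</p>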
<p>Build the pattern like this:</p>
<pre><code>pat = r'(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(Keywords,key=len,reverse=True))))
</code></pre>
<p>See the <a href="https://ideone.com/sC9cV9" rel="nofollow noreferrer">online Python test</a>:</p>
<pre><code>import re
Keywords = ["Caden(S, A)", "Caden(a","Caden(.A))", "Caden.Q", "Caden.K", "Caden"]
rx = r'(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(Keywords,key=len,reverse=True))))
# => (?<!\w)(?:Caden\(S,\ A\)|Caden\(\.A\)\)|Caden\(a|Caden\.Q|Caden\.K|Caden)(?!\w)
strs = ["Caden(S, A) Charlotte.A, Caden.K;", "Emily.P Ethan.B; Caden(a", "Grayson.Q, Lily; Caden(.A))", "Mason, Emily.Q Noah.B; Caden.Q - Riley.P"]
for s in strs:
    print(re.findall(rx, s))
</code></pre>
<p>Output:</p>
<pre><code>['Caden(S, A)', 'Caden.K']
['Caden(a']
['Caden(.A))']
['Caden.Q']
</code></pre>
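<p>If the same keyword list is applied to many strings, you could compile the pattern once and use <code>re.finditer</code> to get the match offsets as well. This is a sketch that goes beyond the original answer. The function names <code>build_keyword_regex</code> and <code>find_keywords</code> are hypothetical.</p>
<pre><code>from __future__ import annotations

import re
from typing import Iterable

def build_keyword_regex(keywords: Iterable[str]) -> re.Pattern[str]:
    """Compile a regex that matches any keyword, preferring the longest one.

    Keywords are sorted longest-first so alternation picks the longest match,
    and escaped so characters like '(' and '.' are matched literally. The
    group is wrapped in lookarounds that reject an adjacent word character.
    """
    alternatives = '|'.join(map(re.escape, sorted(keywords, key=len, reverse=True)))
    return re.compile(r'(?<!\w)(?:{})(?!\w)'.format(alternatives))

def find_keywords(rx: re.Pattern[str], text: str) -> list[tuple[int, str]]:
    """Return (start offset, matched keyword) pairs for every non-overlapping match."""
    return [(m.start(), m.group()) for m in rx.finditer(text)]

keywords = ["Caden(S, A)", "Caden(a", "Caden(.A))", "Caden.Q", "Caden.K", "Caden"]
rx = build_keyword_regex(keywords)
print(find_keywords(rx, "Caden(S, A) Charlotte.A, Caden.K;"))
# [(0, 'Caden(S, A)'), (25, 'Caden.K')]
</code></pre>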