<p>We can use <code>Series.str.findall</code> together with the regex ignore-case flag (<code>(?i)</code>), which avoids the need for <code>import re</code>:</p>
<pre><code>df['Matches'] = df['desc'].str.findall(f'(?i)({"|".join(strings)})')
</code></pre>
<pre><code>   itemid                                desc           Matches
0     101                          tea leaves             [tea]
1     201                     baseball gloves        [baseball]
2     221  tea leaves from Onus Green Tea Co.  [tea, Onus, Tea]
</code></pre>
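<p>Since pandas passes the pattern on to Python's <code>re</code> module, the effect of the inline flag can be checked on a single string without a DataFrame. The sketch below is only an illustration: the <code>strings</code> list is an assumption reconstructed from the output above, not something stated in the question.</p>

```python
import re

# Hypothetical search terms, guessed from the sample output.
strings = ["tea", "baseball", "onus"]
text = "tea leaves from Onus Green Tea Co."

# Inline flag: case-insensitivity is written into the pattern itself.
pattern = f'(?i)({"|".join(strings)})'
inline_flag = re.findall(pattern, text)

# Equivalent form with an explicit flag argument.
explicit_flag = re.findall("|".join(strings), text, flags=re.IGNORECASE)

print(inline_flag)    # ['tea', 'Onus', 'Tea']
print(explicit_flag)  # ['tea', 'Onus', 'Tea']
```

<p>Both calls return the same list, which is why the inline form is enough when the pattern is passed as a plain string to <code>str.findall</code>.</p>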
<p>To remove duplicates, we convert the matches to uppercase and build a <code>set</code>:</p>
<pre><code>df['Matches'] = (
    df['desc'].str.findall(f'(?i)({"|".join(strings)})')
    .apply(lambda x: list(set(map(str.upper, x))))
)
</code></pre>
<pre><code>   itemid                                desc      Matches
0     101                          tea leaves        [TEA]
1     201                     baseball gloves   [BASEBALL]
2     221  tea leaves from Onus Green Tea Co.  [TEA, ONUS]
</code></pre>
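<p>One caveat: a <code>set</code> does not guarantee element order, so the order inside each list may differ between runs. If the first-seen order matters, a possible alternative is to deduplicate with <code>dict.fromkeys</code>, which keeps insertion order in Python 3.7+. The helper name <code>unique_upper</code> below is hypothetical and only a sketch:</p>

```python
def unique_upper(matches):
    """Upper-case each match and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(m.upper() for m in matches))

print(unique_upper(["tea", "Onus", "Tea"]))  # ['TEA', 'ONUS']
```

<p>It can be used in place of the lambda, for example <code>.apply(unique_upper)</code>.</p>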
<hr/>
<h3>Edit: partial matches</h3>
<p>To avoid partial matches, we can wrap each term in word boundaries (<code>\b</code>):</p>
<pre><code>strings = ['\\b' + f + '\\b' for f in strings]
df['Matches'] = df['desc'].str.findall(f'(?i)({"|".join(strings)})')
</code></pre>
<pre><code>   itemid                                 desc      Matches
0     101                           tea leaves        [tea]
1     201                      baseball gloves   [baseball]
2     221  teas leaves from Onus Green Tea Co.  [Onus, Tea]
</code></pre>
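<p>If the search terms may contain regex metacharacters such as <code>.</code> or <code>+</code>, it is probably safer to escape them before joining. This does require <code>import re</code>. The sketch below uses a hypothetical helper, <code>build_pattern</code>, and the same assumed terms as the sample output. Note that <code>\b</code> only behaves as expected for terms that start and end with a word character.</p>

```python
import re

def build_pattern(terms):
    """Join terms into one case-insensitive, whole-word alternation."""
    escaped = (re.escape(t) for t in terms)
    return r"(?i)\b(" + "|".join(escaped) + r")\b"

# Hypothetical search terms, guessed from the sample output.
pattern = build_pattern(["tea", "baseball", "onus"])

# "teas" is skipped because of the trailing word boundary.
print(re.findall(pattern, "teas leaves from Onus Green Tea Co."))  # ['Onus', 'Tea']
```

<p>The resulting string can be passed directly to <code>df['desc'].str.findall(pattern)</code>.</p>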