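<p>A word like this can be misspelled in a huge number of ways, so listing every letter combination is not practical. What you are trying to do is fuzzy matching between two strings. I would suggest the following approach:</p>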
<pre><code>#!pip install fuzzywuzzy
from fuzzywuzzy import fuzz, process
word = 'sender'
others = ['bnder', 'snder', 'sender', 'hello']
process.extractBests(word, others)
</code></pre>
<pre><code>[('sender', 100), ('snder', 91), ('bnder', 73), ('hello', 18)]
</code></pre>
<p>Based on these scores, you can pick a threshold and mark every candidate that scores above it as a match (using the same code as above).</p>
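<p>To make the thresholding step concrete, here is a minimal sketch that uses only the standard library's <code>difflib</code> in place of fuzzywuzzy. The helpers <code>similarity</code> and <code>matches_above</code> are hypothetical names introduced for illustration, and <code>difflib</code> scores are not guaranteed to equal fuzzywuzzy's in general, although they happen to agree for this word list.</p>

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (hypothetical helper, stdlib only)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def matches_above(word, candidates, threshold):
    """Keep the candidates whose score against `word` is strictly above `threshold`."""
    return [c for c in candidates if similarity(word, c) > threshold]


word = 'sender'
others = ['bnder', 'snder', 'sender', 'hello']

# With a threshold of 70, the unrelated word 'hello' is dropped.
close = matches_above(word, others, 70)
print(close)
```

<p>Raising the threshold to 80 would also drop <code>'bnder'</code>, so the value you choose decides how many typos you tolerate.</p>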
<p>Here is one way to do this for your problem statement, wrapped in a function:</p>
<pre><code>import pandas as pd
from fuzzywuzzy import process

df = pd.DataFrame(['hi there i am a sender',
                   'I dont wanna be a bnder',
                   'can i be the snder?',
                   'i think i am a nerd'], columns=['text'])

# s = sentence, w = match word, t = match threshold
def get_match(s, w, t):
    # Score the words of the sentence against w (returns the best few)
    ss = process.extractBests(w, s.split())
    # True if any word scores above the threshold
    return any(i[1] > t for i in ss)

# What it does: match each word in each row of df.text against the
# word 'sender' and check whether any word scores above the
# threshold ratio of 70.
df['match'] = df['text'].apply(get_match, w='sender', t=70)
print(df)
</code></pre>
<pre><code> text match
0 hi there i am a sender True
1 I dont wanna be a bnder True
2 can i be the snder? True
3 i think i am a nerd False
</code></pre>
<p>Raise <code>t</code> from 70 to, say, 80 if you want stricter matching, or lower it below 70 if you want more lenient matching.</p>
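<p>The effect of moving the threshold can be sketched without pandas or fuzzywuzzy. The snippet below is an illustration under stated assumptions: <code>best_word_score</code> is a hypothetical helper, it scores words with the standard library's <code>difflib</code> rather than fuzzywuzzy, and it strips punctuation and lowercases each word by hand before scoring.</p>

```python
import string
from difflib import SequenceMatcher


def best_word_score(sentence, target):
    """Return the highest 0-100 score of any word in `sentence` against `target`."""
    # Strip surrounding punctuation so that 'snder?' is scored as 'snder'.
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    return max(
        (round(100 * SequenceMatcher(None, target, w).ratio()) for w in words if w),
        default=0,
    )


sentences = [
    'hi there i am a sender',
    'I dont wanna be a bnder',
    'can i be the snder?',
    'i think i am a nerd',
]

scores = [best_word_score(s, 'sender') for s in sentences]

# A higher threshold keeps fewer rows; a lower one lets more typos through.
for t in (55, 70, 80, 95):
    flags = [score > t for score in scores]
    print(t, flags)
```

<p>Printing the per-sentence scores first is a quick way to see where a sensible cut-off lies before settling on a value of <code>t</code>.</p>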
<p>Finally, you can filter the DataFrame down to the matching rows:</p>
<pre><code>df[df['match']==True][['text']]
</code></pre>
<pre><code> text
0 hi there i am a sender
1 I dont wanna be a bnder
2 can i be the snder?
</code></pre>