<p>Try this:</p>
<pre><code>import pandas as pd

df1 = pd.read_csv('sample.csv')
df2 = pd.read_csv('sample_2.csv')

# Lowercase both columns so the comparison is case-insensitive
df2['values'] = df2['values'].str.lower()
df1['names'] = df1['names'].str.lower()

# Replace punctuation with spaces, then collapse runs of whitespace
# (pandas 2.0 and later need regex=True passed explicitly to str.replace)
df2['values'] = df2['values'].str.replace(r'[^\w\s]', ' ', regex=True)
df2['values'] = df2['values'].replace(r'\s+', ' ', regex=True)
df1['names'] = df1['names'].str.replace(r'[^\w\s]', ' ', regex=True)
df1['names'] = df1['names'].replace(r'\s+', ' ', regex=True)

# Split each string into a list of words (a NaN cell becomes ['nan'])
df2['list_values'] = df2['values'].apply(lambda x: str(x).split())
df1['list_names'] = df1['names'].apply(lambda x: str(x).split())
list_names = df1['list_names'].tolist()

def check_names(x, list_names):
    # Return the first name that contains every word of x, or '' if none does
    output = ''
    for list_name in list_names:
        if set(list_name) >= set(x):
            output = ' '.join(list_name)
            break
    return output

df2['Names'] = df2['list_values'].apply(lambda x: check_names(x, list_names))
print(df2[['values', 'Names']])
</code></pre>
<p><strong>Output</strong></p>
<pre><code>                   values                        Names
0                     sri         sri is a good player
1                     NaN
2                  sri is         sri is a good player
3  kumar cricketer player  kumar is a cricketer player
</code></pre>
<p><strong>Explanation</strong></p>
<p>This is a fuzzy-matching problem. These are the steps I applied:</p>
<ol>
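<li>Remove punctuation and split the text to get the unique words in both <code>df</code>s.</li>
<li>Lowercase everything so that matching is standardized.</li>
<li>Convert each string into a list by splitting it.</li>
<li>Finally, match through the <code>check_names()</code> function to get the desired output.</li>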
</ol>
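<p>The steps above can also be sketched without the CSV files. This is only an illustration under assumptions: the two in-memory frames are hypothetical stand-ins for <code>sample.csv</code> and <code>sample_2.csv</code>, and <code>tokenize()</code> is a helper name introduced here, not part of the original answer. Unlike the code above, it returns an empty list for missing values and skips empty token lists, because an empty set is a subset of every set and would otherwise match the first name.</p>
<pre><code>import re
import pandas as pd

def tokenize(text):
    """Lowercase, replace punctuation with spaces, and split into words."""
    if pd.isna(text):
        return []
    cleaned = re.sub(r"[^\w\s]", " ", str(text).lower())
    return cleaned.split()

def check_names(tokens, list_names):
    """Return the first name whose words contain every token, else ''."""
    if not tokens:  # nothing to match on, so report no match
        return ""
    for name_tokens in list_names:
        if set(tokens).issubset(name_tokens):
            return " ".join(name_tokens)
    return ""

# Hypothetical data standing in for sample.csv and sample_2.csv
df1 = pd.DataFrame({"names": ["Sri is a good player",
                              "Kumar is a cricketer, player"]})
df2 = pd.DataFrame({"values": ["Sri", None, "sri is",
                               "Kumar cricketer player"]})

list_names = df1["names"].apply(tokenize).tolist()
df2["Names"] = df2["values"].apply(
    lambda v: check_names(tokenize(v), list_names))
print(df2)
</code></pre>
<p>Because the match is a subset test on whole words, word order and punctuation do not matter, but a single misspelled word in <code>values</code> prevents a match.</p>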