<p>Inspired by this, you could take a similar approach.</p>
<h2>TL;DR</h2>
<pre><code>import difflib

first_df[['last_name', 'start_name']] = first_df['Full Name'].str.split(' ', n=1, expand=True)
second_df['last_name'] = second_df['Owner'].str.split(' ').str[-1]
df_final = first_df.merge(second_df, how='inner', left_on=['last_name'], right_on=['last_name'])
address_matches = df_final.apply(lambda x: True if difflib.get_close_matches(x['Address'], [x['Add Match']], n=1, cutoff=0.8) else False, axis=1)
df_final = df_final[address_matches].drop(columns=['last_name', 'start_name', 'Full Name', 'Address']).rename(columns={'Owner':'Name', 'Add Match': 'Address'})
</code></pre>
<h2>Step by step</h2>
<p>First, extract the last-name keys needed for the join.</p>
<pre><code>first_df[['last_name', 'start_name']] = first_df['Full Name'].str.split(' ', n=1, expand=True)
second_df['last_name'] = second_df['Owner'].str.split(' ').str[-1]
</code></pre>
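<p>The two columns need different handling because the surname sits at opposite ends of the string. A minimal sketch of that, assuming one-row frames built from the names in the example output further down (the frames themselves are an assumption, not the asker's data):</p>

```python
import pandas as pd

# Assumed sample frames; only the two names come from the example output.
first_df = pd.DataFrame({"Full Name": ["Mulligan Nick & Mary"]})
second_df = pd.DataFrame({"Owner": ["Brenda Joy Mulligan"]})

# "Full Name" puts the surname first: split once and keep both parts.
first_df[["last_name", "start_name"]] = first_df["Full Name"].str.split(" ", n=1, expand=True)

# "Owner" puts the surname last: split on every space and keep the final token.
second_df["last_name"] = second_df["Owner"].str.split(" ").str[-1]

print(first_df[["last_name", "start_name"]].to_dict("records"))
print(second_df["last_name"].tolist())
```

<p>Both frames end up with <code>Mulligan</code> in <code>last_name</code>, which is what the join relies on. Passing <code>n=1</code> as a keyword matters on recent pandas versions, where the positional form is no longer accepted.</p>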
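<p><strong>P.S.:</strong> As you requested, the keys are extracted with the built-in string methods of the pandas/numpy stack. If it suits you better, you could instead apply a similarity method (e.g. <code>difflib.get_close_matches</code>) to the names as well, in the same way it is applied to the address part below.</p>
<p>Next, perform an inner join of the two DataFrames on the <code>last_name</code> key.</p>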
<pre><code>df_final = first_df.merge(second_df, how='inner', left_on=['last_name'], right_on=['last_name'])
</code></pre>
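<p>Then apply <code>difflib.get_close_matches</code> with the desired similarity to flag which rows contain a match, and keep only those rows. I used <code>cutoff=0.8</code> because no rows were returned above that value. This step needs <code>import difflib</code>.</p>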
<pre><code>matches_mask = df_final.apply(lambda x: True if difflib.get_close_matches(x['Address'], [x['Add Match']], n=1, cutoff=0.8) else False, axis=1)
df_final = df_final[matches_mask].drop(columns=['last_name', 'start_name'])
</code></pre>
<pre><code>Full Name             Address          Owner                Add Match
Mulligan Nick & Mary  270 Claude Road  Brenda Joy Mulligan  Claude Road
</code></pre>
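<p>To see why <code>0.8</code> is about as high as the cutoff can go for this pair, it may help to look at the score directly. The sketch below uses only the standard library and the two address strings from the row above; <code>get_close_matches</code> ranks candidates by <code>SequenceMatcher.ratio()</code>.</p>

```python
import difflib

address = "270 Claude Road"   # value from the "Address" column
candidate = "Claude Road"     # value from the "Add Match" column

# ratio() is 2 * matched_chars / total_chars = 2 * 11 / (15 + 11)
ratio = difflib.SequenceMatcher(None, candidate, address).ratio()
print(round(ratio, 3))  # 0.846

# The pair passes at cutoff=0.8 but is rejected at cutoff=0.85.
print(difflib.get_close_matches(address, [candidate], n=1, cutoff=0.8))   # ['Claude Road']
print(difflib.get_close_matches(address, [candidate], n=1, cutoff=0.85))  # []
```

<p>Since the street number alone pulls the score down to roughly 0.85, a stricter cutoff would drop this row; whether 0.8 is loose enough for other addresses depends on your data.</p>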
<p>Finally, to match the format of the result posted at the end of the question, you can drop or rename some columns.</p>
<pre><code>df_final.drop(columns=['Full Name', 'Address']).rename(columns={'Owner':'Name', 'Add Match': 'Address'})
</code></pre>
<pre><code>Name                 Address
Brenda Joy Mulligan  Claude Road
</code></pre>
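<p>For reference, here is a self-contained sketch of the whole pipeline that can be run as is. The input frames are hypothetical: the first row of each uses the values from the output above, while the <code>Smith</code> rows are made up to show a surname match being rejected on address. The helper <code>addresses_match</code> is likewise just an illustrative name.</p>

```python
import difflib
import pandas as pd

# Hypothetical inputs: the Mulligan rows come from the example output,
# the Smith rows are invented to show a rejected match.
first_df = pd.DataFrame({
    "Full Name": ["Mulligan Nick & Mary", "Smith John"],
    "Address": ["270 Claude Road", "12 High Street"],
})
second_df = pd.DataFrame({
    "Owner": ["Brenda Joy Mulligan", "Anna Smith"],
    "Add Match": ["Claude Road", "Station Lane"],
})

# Step 1: extract the surname key from each frame.
first_df[["last_name", "start_name"]] = first_df["Full Name"].str.split(" ", n=1, expand=True)
second_df["last_name"] = second_df["Owner"].str.split(" ").str[-1]

# Step 2: inner join on the surname.
merged = first_df.merge(second_df, how="inner", on="last_name")

# Step 3: keep only rows whose addresses are similar enough.
def addresses_match(row, cutoff=0.8):
    """Return True if the two address strings score at or above the cutoff."""
    return bool(difflib.get_close_matches(row["Address"], [row["Add Match"]], n=1, cutoff=cutoff))

mask = merged.apply(addresses_match, axis=1)

# Step 4: drop helper columns and rename to the requested layout.
result = (
    merged[mask]
    .drop(columns=["last_name", "start_name", "Full Name", "Address"])
    .rename(columns={"Owner": "Name", "Add Match": "Address"})
)
print(result)
```

<p>With these inputs both surnames join, but only the Mulligan row survives the address check. Note that the join is exact on surname, so spelling variants would need the fuzzy approach on the name side too.</p>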