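<p>One approach is to use <code>difflib</code> to find the closest match, applied through a <code>lambda</code>:</p>
<p>First, build a mapper from each part name to its category:</p>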
<pre><code>from difflib import get_close_matches
mapper = {val:k for k, v in catg_df.to_dict('list').items() for val in v}
print(mapper)
</code></pre>
<p>The resulting <code>mapper</code> should be:</p>
<pre><code>{'Windscreen': 'Glass',
'demister': 'Glass',
'engine': 'underhood',
'engine cover': 'bodywork',
'oil filter': 'underhood',
'rear panel': 'bodywork',
'side panel': 'bodywork',
'spark plug': 'underhood',
'window': 'Glass'}
</code></pre>
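<p>To make the inversion step easier to follow, here is a minimal sketch without pandas. It assumes <code>catg_df</code> has one column per category with the part names as cell values; the plain dict <code>by_category</code> is a hypothetical stand-in for what <code>catg_df.to_dict('list')</code> would return under that assumption.</p>

```python
# Hypothetical stand-in for catg_df.to_dict('list'):
# each key is a column name (category), each value is that column's cells.
by_category = {
    "Glass": ["Windscreen", "demister", "window"],
    "underhood": ["engine", "oil filter", "spark plug"],
    "bodywork": ["engine cover", "rear panel", "side panel"],
}

# Invert the dict so that every part name points to its category.
mapper = {
    part: category
    for category, parts in by_category.items()
    for part in parts
}

print(mapper["engine cover"])
```

<p>The comprehension is the same one used above, only with more descriptive variable names.</p>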
<p>Now use a <code>lambda</code> with <code>difflib</code> to look up the closest key for each description:</p>
<pre><code># avoid calling mapper.keys() in lambda
keys = mapper.keys()
desc_df['Category'] = desc_df['col1'].apply(lambda row: mapper[get_close_matches(row, keys)[0]])
</code></pre>
<p>Result:</p>
<pre><code> col1 Category
0 engine underhood
1 blue engine cover bodywork
2 spark plug underhood
3 rear panel bodywork
4 black rear panel bodywork
5 blue engine underhood
</code></pre>
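<p>If some descriptions might not resemble any key, a small helper can return a default instead of failing. The following is a sketch under assumptions: the helper name <code>closest_category</code>, its <code>default</code> parameter, and the sample <code>descriptions</code> list are hypothetical and not part of the original data; <code>n</code> and <code>cutoff</code> are the standard keyword arguments of <code>difflib.get_close_matches</code>.</p>

```python
from difflib import get_close_matches

mapper = {
    "Windscreen": "Glass", "demister": "Glass", "window": "Glass",
    "engine": "underhood", "oil filter": "underhood", "spark plug": "underhood",
    "engine cover": "bodywork", "rear panel": "bodywork", "side panel": "bodywork",
}
keys = list(mapper)


def closest_category(text, default=None, cutoff=0.6):
    """Return the category of the key most similar to text, or default."""
    # n=1 asks only for the single best match; the list is empty
    # when no key reaches the similarity cutoff.
    matches = get_close_matches(text, keys, n=1, cutoff=cutoff)
    return mapper[matches[0]] if matches else default


# Hypothetical sample input, including one string with no close key.
descriptions = ["blue engine cover", "black rear panel", "blue engine", "zzzz"]
for description in descriptions:
    print(description, "->", closest_category(description))
```

<p>With a DataFrame, the helper could replace the lambda, for example <code>desc_df['col1'].apply(closest_category)</code>; unmatched rows would then hold the default value.</p>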