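<p>Here is my take. Note that this would be much easier if you could simply replace Carid in both df and df2 with a new unique id. But to answer the question as asked, here we go.</p>
<p>First, we create a mapping <code>cm</code> from Carname to Carid for the first df.</p>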
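<p>For reference, the simpler route mentioned above could look roughly like the sketch below. It assumes the existing Carid values do not need to be preserved, and it uses <code>pd.factorize</code> to number the car names in order of first appearance. The name <code>combined</code> is only illustrative.</p>

```python
import pandas as pd

# Same sample data as used in the rest of this answer.
df = pd.DataFrame({
    'Carid': [1, 2, 3, 1],
    'Carname': ['Mercedes-Benz', 'Audi', 'BMW', 'Mercedes-Benz'],
    'model': ['S-Klasse AMG 63s', 'S6', 'X6 M-Power', 'Maybach'],
})
df2 = pd.DataFrame({
    'Carid': [4, 1, 5],
    'Carname': ['VW', 'Citroen', 'Opel'],
    'model': ['GTI', 'S', 'Corsa'],
})

# Stack both frames, then assign one fresh id per distinct car name.
combined = pd.concat([df, df2], ignore_index=True)
codes, names = pd.factorize(combined['Carname'])
combined['Carid'] = codes + 1  # factorize starts at 0; shift so ids start at 1

print(combined)
```

<p>This discards the original ids entirely, which is why the rest of the answer takes the longer route of keeping them wherever possible.</p>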
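<pre><code>import pandas as pd

d = {'Carid': [1, 2, 3, 1], 'Carname': ['Mercedes-Benz', 'Audi', 'BMW', 'Mercedes-Benz'], 'model': ['S-Klasse AMG 63s', 'S6', 'X6 M-Power', 'Maybach']}
df = pd.DataFrame(data=d)
display(df.head())  # display() is a Jupyter/IPython helper; use print() in a plain script
# Map each car name to its id (a repeated name keeps the last id seen)
cm = dict(zip(df['Carname'], df['Carid']))
cm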
</code></pre>
<p>Then we do the same for the second df.</p>
<pre><code>d2 = {'Carid': [4, 1, 5], 'Carname': ['VW', 'Citroen', 'Opel'], 'model': ['GTI', 'S', 'Corsa']}
df2 = pd.DataFrame(data=d2)
display(df2.head())
# Same name-to-id mapping for the second frame
cm2 = dict(zip(df2['Carname'], df2['Carid']))
cm2
</code></pre>
<p>Then comes the main step: combine the two mappings, keeping the original ids unless there is a conflict, in which case we assign a new unique id.</p>
<pre><code># First id guaranteed not to be used in either mapping
unique_id = max(list(cm.values()) + list(cm2.values())) + 1
for new_name in df2['Carname']:
    if new_name in cm:
        # already included
        pass
    elif cm2[new_name] not in cm.values():
        # unique carid, keep it
        cm[new_name] = cm2[new_name]
    else:
        # the new_name is not in cm but its id is, so assign a fresh id
        cm[new_name] = unique_id
        unique_id += 1
print(cm)
</code></pre>
<p>Now <code>cm</code> holds a unique id for each car name, keeping the originally used ids unless they collided:</p>
<pre><code>{'Mercedes-Benz': 1, 'Audi': 2, 'BMW': 3, 'VW': 4, 'Citroen': 6, 'Opel': 5}
</code></pre>
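<p>The merge step can also be wrapped in a small helper that works on the two dictionaries directly. This is only a sketch: <code>merge_id_maps</code> is a hypothetical name, not something from pandas, and it iterates over the second mapping rather than over <code>df2['Carname']</code>, which gives the same result for this data.</p>

```python
def merge_id_maps(base, other):
    """Return a copy of `base` extended with the names from `other`.

    Ids from `other` are kept unless they are already taken, in which
    case the name receives a fresh id above every id seen so far.
    (Hypothetical helper, written for illustration only.)
    """
    merged = dict(base)
    used_ids = set(merged.values())
    # First id that is guaranteed to be free in both mappings.
    next_id = max([*base.values(), *other.values()], default=0) + 1
    for name, car_id in other.items():
        if name in merged:
            continue  # name already has an id, keep it
        if car_id in used_ids:
            car_id = next_id  # id collision: hand out a fresh id
            next_id += 1
        merged[name] = car_id
        used_ids.add(car_id)
    return merged


cm = {'Mercedes-Benz': 1, 'Audi': 2, 'BMW': 3}
cm2 = {'VW': 4, 'Citroen': 1, 'Opel': 5}
merged = merge_id_maps(cm, cm2)
print(merged)
```

<p>Tracking the used ids in a set avoids rescanning <code>cm.values()</code> on every iteration, which only matters once the mappings get large.</p>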
<p>Now remap the ids in both frames.</p>
<pre><code># map() looks each name up in cm; every name is present, so no NaN appears
df['Carid'] = df['Carname'].map(cm)
df2['Carid'] = df2['Carname'].map(cm)
</code></pre>
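<p>Before concatenating, it may be worth checking that the remapping did what was intended. The sketch below rebuilds the example data and asserts that no id is shared by two different car names. The names <code>both</code> and <code>names_per_id</code> are illustrative.</p>

```python
import pandas as pd

# Final mapping as produced by the merge step in this answer.
cm = {'Mercedes-Benz': 1, 'Audi': 2, 'BMW': 3, 'VW': 4, 'Citroen': 6, 'Opel': 5}

df = pd.DataFrame({
    'Carid': [1, 2, 3, 1],
    'Carname': ['Mercedes-Benz', 'Audi', 'BMW', 'Mercedes-Benz'],
})
df2 = pd.DataFrame({
    'Carid': [4, 1, 5],
    'Carname': ['VW', 'Citroen', 'Opel'],
})

df['Carid'] = df['Carname'].map(cm)
df2['Carid'] = df2['Carname'].map(cm)

both = pd.concat([df, df2], ignore_index=True)

# Every row must have received an id.
assert both['Carid'].notna().all(), 'some car names are missing from cm'

# Each id must belong to exactly one car name.
names_per_id = both.groupby('Carid')['Carname'].nunique()
assert (names_per_id == 1).all(), 'an id is shared by several car names'
```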
<p>Finally, combine the two frames.</p>
<pre><code>pd.concat([df, df2])
</code></pre>
<p>The result is:</p>
<pre><code>|    | Carid | Carname       | model            |
| -: | :---- | :------------ | :--------------- |
|  0 | 1     | Mercedes-Benz | S-Klasse AMG 63s |
|  1 | 2     | Audi          | S6               |
|  2 | 3     | BMW           | X6 M-Power       |
|  3 | 1     | Mercedes-Benz | Maybach          |
|  0 | 4     | VW            | GTI              |
|  1 | 6     | Citroen       | S                |
|  2 | 5     | Opel          | Corsa            |
</code></pre>
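<p>Note that the row labels 0, 1 and 2 appear twice in that result, because <code>pd.concat</code> keeps each frame's own index by default. If a clean running index is preferred, passing <code>ignore_index=True</code> is one option, sketched here on frames whose ids are already remapped:</p>

```python
import pandas as pd

# Frames as they look after the ids have been remapped.
df = pd.DataFrame({
    'Carid': [1, 2, 3, 1],
    'Carname': ['Mercedes-Benz', 'Audi', 'BMW', 'Mercedes-Benz'],
    'model': ['S-Klasse AMG 63s', 'S6', 'X6 M-Power', 'Maybach'],
})
df2 = pd.DataFrame({
    'Carid': [4, 6, 5],
    'Carname': ['VW', 'Citroen', 'Opel'],
    'model': ['GTI', 'S', 'Corsa'],
})

# ignore_index=True discards the original labels and renumbers from 0.
result = pd.concat([df, df2], ignore_index=True)
print(result)
```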