<h2>Method 1</h2>
<p>If you don't mind the keys becoming floats, a first approach is to use <code>cumcount</code> to increment the IDs that clash.</p>
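<pre><code>import pandas as pd

# df and df2 are the two dataframes from the question.
df3 = pd.concat([df, df2])
# first Carid seen for each Carname, in order of appearance.
s = df3.groupby('Carname', sort=False)['Carid'].first().to_frame()
# names sharing a Carid are numbered 0, 1, 2, ...; add that count / 10 to the id.
s['Carid'] = s['Carid'] + s.groupby('Carid').cumcount() / 10
new_ids = s.to_dict(orient='dict')['Carid']
df3['Carid'] = df3['Carname'].map(new_ids)
print(df3)
   Carid        Carname             model
0    1.0  Mercedes-Benz  S-Klasse AMG 63s
1    2.0           Audi                S6
2    3.0            BMW        X6 M-Power
3    1.0  Mercedes-Benz           Maybach
0    4.0             VW               GTI
1    1.1        Citroen                 S
2    5.0           Opel             Corsa
</code></pre>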
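<p>To try Method 1 end to end, here is a minimal self-contained sketch. The contents of <code>df</code> and <code>df2</code> are an assumption, reconstructed from the printed output above rather than taken from the question, and the <code>model</code> column is left out for brevity. It works on a Series instead of a one-column frame, which should be equivalent.</p>

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output (not the question's original frames).
df = pd.DataFrame({"Carid": [1, 2, 3, 1],
                   "Carname": ["Mercedes-Benz", "Audi", "BMW", "Mercedes-Benz"]})
df2 = pd.DataFrame({"Carid": [4, 1, 5],
                    "Carname": ["VW", "Citroen", "Opel"]})

df3 = pd.concat([df, df2])

# First Carid seen for each Carname, in order of appearance.
first_ids = df3.groupby("Carname", sort=False)["Carid"].first()

# cumcount() numbers the names that share an id 0, 1, 2, ...
# The first name keeps its id; later ones get +0.1, +0.2, ...
offsets = first_ids.groupby(first_ids).cumcount() / 10
new_ids = (first_ids + offsets).to_dict()

df3["Carid"] = df3["Carname"].map(new_ids)
print(df3)
```

<p>Note that this trick only keeps IDs distinct while fewer than ten names share one ID: the tenth clash would add <code>1.0</code> and land on the next integer ID.</p>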
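<h2>Method 2: a functional approach using dictionaries</h2>
<h2>Assumptions</h2>
<p>The function's logic is predicated on each dataframe having unique <code>Carid</code> values.</p>
<p>Your IDs are sequential, so using <code>max</code> + 1 to generate the new numbers makes the most sense. If you have a list of IDs such as <code>[1,2,3,200]</code>, this may generate non-sequential numbers.</p>
<p>In that case Citroen would get a new, unique <code>Carid</code> of <code>201</code>, because the ID <code>200</code> already exists and is owned by another car maker.</p>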
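<p>The <code>max</code> + 1 rule can be illustrated in isolation. The helper <code>next_ids</code> below is hypothetical and exists only for this sketch; it is not part of the answer's function.</p>

```python
def next_ids(existing_ids, how_many):
    """Hypothetical helper: return `how_many` new ids continuing after max(existing_ids)."""
    start = max(existing_ids) + 1
    return list(range(start, start + how_many))

print(next_ids([1, 2, 3, 4, 5], 1))  # [6]
print(next_ids([1, 2, 3, 200], 1))   # [201]
```

<p>With this rule the gap between <code>3</code> and <code>200</code> is never reused, which is why the new IDs may not look sequential.</p>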
<h2>Function</h2>
<pre><code>import pandas as pd
import numpy as np
from collections import ChainMap

def generate_new_keys(*args, key='Carid', name='Carname'):
    """
    Takes in a number of dataframes and returns a dictionary mapping each
    name whose id is already taken to a new unique id.
    The groupby columns default to Carid and Carname.
    """
    # build one {id: name} dictionary per dataframe and collect them in a list.
    dicts_ = [arg.groupby(key)[name].first().to_dict() for arg in args]
    # merge the dicts on key; for a duplicated key the earliest dataframe wins,
    # so later names with that key are excluded.
    merged_dicts = dict(ChainMap(*dicts_))
    # collect the names that were excluded, i.e. the duplicates.
    delta = [v for each_dict in dicts_ for k, v in each_dict.items() if v not in merged_dicts.values()]
    # get the max existing key and start just after it.
    start_key = max(merged_dicts.keys()) + 1
    # create a new sequence with one id per duplicate.
    sequence = range(start_key, start_key + len(delta))
    # return a dictionary of name -> new id.
    return {name: number for name, number in zip(delta, sequence)}
</code></pre>
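<p>The function relies on how <code>ChainMap</code> resolves duplicate keys: the first mapping that contains a key wins. A small sketch of that behaviour, using two assumed id-to-name dictionaries shaped like the ones the function builds per dataframe:</p>

```python
from collections import ChainMap

# Assumed id -> name dictionaries, one per dataframe (illustrative data only).
first = {1: "Mercedes-Benz", 2: "Audi", 3: "BMW"}
second = {4: "VW", 1: "Citroen", 5: "Opel"}

# On the clashing key 1, the value from the first mapping is kept.
merged = dict(ChainMap(first, second))
print(merged[1])  # Mercedes-Benz

# Names that lost their key in the merge are the ones that need a new id.
dropped = [name for d in (first, second) for name in d.values()
           if name not in merged.values()]
print(dropped)  # ['Citroen']
```

<p>Because precedence follows argument order, the order in which dataframes are passed to <code>generate_new_keys</code> decides which name keeps a contested ID.</p>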
<h2>In action</h2>
<pre><code>new_keys = generate_new_keys(df, df2)
print(new_keys)
{'Citroen': 6}

df3 = pd.concat([df, df2])
# use the new id where the name needs one, otherwise keep the existing id.
df3['Carid'] = np.where(df3['Carname'].isin(new_keys.keys()),
                        df3['Carname'].map(new_keys), df3['Carid'])
print(df3)
   Carid        Carname             model
0    1.0  Mercedes-Benz  S-Klasse AMG 63s
1    2.0           Audi                S6
2    3.0            BMW        X6 M-Power
0    4.0             VW               GTI
1    6.0        Citroen                 S
2    5.0           Opel             Corsa
</code></pre>
<h2>Testing with an additional dataframe</h2>
<pre><code>new_df = pd.DataFrame({'Carid': [1, 2, 3],
                       'Carname': ['Mercedes-Benz', 'Toyota', 'BMW']})
new_keys = generate_new_keys(df, df2, new_df)
print(new_keys)
{'Citroen': 6, 'Toyota': 7}

df3 = pd.concat([df, df2, new_df])
df3['Carid'] = np.where(df3['Carname'].isin(new_keys.keys()),
                        df3['Carname'].map(new_keys), df3['Carid'])
print(df3)
   Carid        Carname             model
0    1.0  Mercedes-Benz  S-Klasse AMG 63s
1    2.0           Audi                S6
2    3.0            BMW        X6 M-Power
0    4.0             VW               GTI
1    6.0        Citroen                 S  # new id
2    5.0           Opel             Corsa
0    1.0  Mercedes-Benz               NaN
1    7.0         Toyota               NaN  # new id
2    3.0            BMW               NaN
</code></pre>