How do I match fields of the same column across different DataFrames in Python?

Posted 2024-09-27 20:17:01


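I need to match identical values in two columns from two separate DataFrames and overwrite the original DataFrame with values taken from the other one.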

I have an original df:

   Original Car Brand  Original City
0             Daimler        Chicago
1          Mitsubishi             LA
2               Tesla         Vienna
3              Toyota         Zurich
4             Renault         Sydney
5                Ford        Toronto
6                 BMW        Hamburg
7          Audi Sport       Helsinki
8             Citroen         Dublin
9           Chevrolet       Brisbane
10               Fiat  San Francisco
11               Audi  New York City
12            Ferrari           Oslo
13         Volkswagen      Stockholm
14        Lamborghini      Singapore
15           Mercedes         Lisbon
16             Jaguar         Boston

And this new df:

     Car Brand Current City
0        Tesla    Amsterdam
1      Renault        Paris
2          BMW       Munich
3         Fiat      Detroit
4         Audi       Berlin
5      Ferrari    Bruxelles
6  Lamborghini         Rome
7     Mercedes       Madrid

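I need to match the car brands that appear in both DataFrames and write the new associated city into the original df, so the result should look like this (for example, Tesla is now Amsterdam instead of Vienna):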

   Original Car Brand Original City
0             Daimler       Chicago
1          Mitsubishi            LA
2               Tesla     Amsterdam
3              Toyota        Zurich
4             Renault         Paris
5                Ford       Toronto
6                 BMW        Munich
7          Audi Sport      Helsinki
8             Citroen        Dublin
9           Chevrolet      Brisbane
10               Fiat       Detroit
11               Audi        Berlin
12            Ferrari     Bruxelles
13         Volkswagen     Stockholm
14        Lamborghini          Rome
15           Mercedes        Madrid
16             Jaguar        Boston

I tried to map the column and overwrite the field with this code, but it doesn't work, and I don't know how to make it work:

original_df['Original City'] = original_df['Car Brand'].map(dict(corrected_df[['Car Brand', 'Current City']]))

How can I make it work? Thanks a lot.

P.S. Code for the DataFrames:

import pandas as pd

cars = ['Daimler', 'Mitsubishi', 'Tesla', 'Toyota', 'Renault', 'Ford', 'BMW', 'Audi Sport', 'Citroen', 'Chevrolet', 'Fiat', 'Audi', 'Ferrari', 'Volkswagen', 'Lamborghini', 'Mercedes', 'Jaguar']
cities = ['Chicago', 'LA', 'Vienna', 'Zurich', 'Sydney', 'Toronto', 'Hamburg', 'Helsinki', 'Dublin', 'Brisbane', 'San Francisco', 'New York City', 'Oslo', 'Stockholm', 'Singapore', 'Lisbon', 'Boston']
data = {'Original Car Brand': cars, 'Original City': cities}
original_df = pd.DataFrame(data, columns=['Original Car Brand', 'Original City'])

---

cars = ['Tesla', 'Renault', 'BMW', 'Fiat', 'Audi', 'Ferrari', 'Lamborghini', 'Mercedes']
cities = ['Amsterdam', 'Paris', 'Munich', 'Detroit', 'Berlin', 'Bruxelles', 'Rome', 'Madrid']
data = {'Car Brand': cars, 'Current City': cities}
corrected_df = pd.DataFrame(data, columns=['Car Brand', 'Current City'])

3 Answers

Use Series.map together with Series.fillna, so that values with no match are replaced by the original column:

s = corrected_df.set_index('Car Brand')['Current City']

original_df['Original City'] = (original_df['Original Car Brand'].map(s)
                                        .fillna(original_df['Original City']))
print (original_df)
   Original Car Brand Original City
0             Daimler       Chicago
1          Mitsubishi            LA
2               Tesla     Amsterdam
3              Toyota        Zurich
4             Renault         Paris
5                Ford       Toronto
6                 BMW        Munich
7          Audi Sport      Helsinki
8             Citroen        Dublin
9           Chevrolet      Brisbane
10               Fiat       Detroit
11               Audi        Berlin
12            Ferrari     Bruxelles
13         Volkswagen     Stockholm
14        Lamborghini          Rome
15           Mercedes        Madrid
16             Jaguar        Boston
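
To make the role of fillna clearer, here is a minimal sketch on a small subset of the question's data (the subset and the names mapped and result are chosen here for illustration only). It shows that map performs exact label lookups, so brands missing from the lookup Series come back as NaN, and 'Audi Sport' is not treated as a match for 'Audi':

```python
import pandas as pd

# Small subset of the question's data, for illustration only
original_df = pd.DataFrame({
    'Original Car Brand': ['Daimler', 'Tesla', 'Audi Sport', 'Audi'],
    'Original City': ['Chicago', 'Vienna', 'Helsinki', 'New York City'],
})
corrected_df = pd.DataFrame({
    'Car Brand': ['Tesla', 'Audi'],
    'Current City': ['Amsterdam', 'Berlin'],
})

# Lookup Series: index = brand, values = new city
s = corrected_df.set_index('Car Brand')['Current City']

# map() looks up each brand by exact label; unmatched brands become NaN
mapped = original_df['Original Car Brand'].map(s)
print(mapped.isna().tolist())   # [True, False, True, False]

# fillna() restores the original city wherever no match was found
result = mapped.fillna(original_df['Original City'])
print(result.tolist())          # ['Chicago', 'Amsterdam', 'Helsinki', 'Berlin']
```

Without the fillna step, every brand that is absent from corrected_df would lose its city.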

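Your own solution works if you convert the two columns to a NumPy array before passing them to dict (and map the 'Original Car Brand' column, since original_df has no 'Car Brand' column):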

d = dict(corrected_df[['Car Brand','Current City']].to_numpy())
original_df['Original City'] = (original_df['Original Car Brand'].map(d)
                                      .fillna(original_df['Original City']))
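
As a rough illustration of why the attempt in the question fails, the sketch below compares the two dictionaries on a two-row sample (the sample data and the names bad, good and also_good are made up for this example). Calling dict on a DataFrame yields column labels as keys, whereas calling it on a two-column array yields one key/value pair per row:

```python
import pandas as pd

# Two-row sample, for illustration only
corrected_df = pd.DataFrame({
    'Car Brand': ['Tesla', 'Renault'],
    'Current City': ['Amsterdam', 'Paris'],
})

# dict() on a DataFrame uses the column labels as keys and whole
# columns (Series) as values, so no brand is ever a key
bad = dict(corrected_df[['Car Brand', 'Current City']])
print(list(bad.keys()))   # ['Car Brand', 'Current City']

# dict() on a 2-column array treats each row as a (key, value) pair
good = dict(corrected_df[['Car Brand', 'Current City']].to_numpy())
print(good)               # {'Tesla': 'Amsterdam', 'Renault': 'Paris'}

# An equivalent way to build the same mapping without NumPy
also_good = dict(zip(corrected_df['Car Brand'], corrected_df['Current City']))
```

With the first dictionary, map finds no brand among the keys, so every row would come back as NaN.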

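You can use the set_index() and assign() methods:

resultdf = original_df.set_index('Original Car Brand').assign(OriginalCity=corrected_df.set_index('Car Brand')['Current City'])

Finally, use the fillna() and reset_index() methods (rename() restores the original column name):

resultdf = resultdf['OriginalCity'].fillna(resultdf['Original City']).rename('Original City').reset_index()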

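Let's try update (here df1 is original_df and df2 is corrected_df). update() aligns on both index and column labels, so the city column has to be renamed to match:

df1 = df1.set_index('Original Car Brand')
# rename 'Current City' so that its label matches the column being updated
df1.update(df2.set_index('Car Brand').rename(columns={'Current City': 'Original City'}))
df1 = df1.reset_index()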
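
To see the alignment behaviour of DataFrame.update in isolation, here is a small sketch on three rows of the question's data (the subset and the names left, unchanged, updated and result are assumptions made for this example). It compares a call where the column labels differ with one where they have been renamed to match:

```python
import pandas as pd

# Three-row subset of the question's data, for illustration only
df1 = pd.DataFrame({
    'Original Car Brand': ['Daimler', 'Tesla', 'BMW'],
    'Original City': ['Chicago', 'Vienna', 'Hamburg'],
})
df2 = pd.DataFrame({
    'Car Brand': ['Tesla', 'BMW'],
    'Current City': ['Amsterdam', 'Munich'],
})

left = df1.set_index('Original Car Brand')

# Column labels differ ('Current City' vs 'Original City'): nothing changes
unchanged = left.copy()
unchanged.update(df2.set_index('Car Brand'))
print(unchanged['Original City'].tolist())  # ['Chicago', 'Vienna', 'Hamburg']

# With matching column labels, rows found in df2 are overwritten in place
updated = left.copy()
updated.update(
    df2.set_index('Car Brand').rename(columns={'Current City': 'Original City'})
)
result = updated.reset_index()
print(result['Original City'].tolist())     # ['Chicago', 'Amsterdam', 'Munich']
```

Note that update modifies the DataFrame in place and returns None, which is why its result is not assigned.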
