如何在dataframe中用多个列值替换一个列值

2024-06-26 17:48:24 发布

您现在位置:Python中文网/ 问答频道 /正文

输入数据帧

Date            Geo         Shipment
2020-01-01      USA         1000
2020-01-01      BRA         5865
2020-01-01      CHN         4789
2020-01-02      EU1         6541
2020-01-02      EU2         3258
..

dict =  {"EU1":["ALA", "BEL", "AND", "AUT"] , "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , "EU3": ["EST", "HRV", "FRA", "DEU"]}

如何替换列Geo中的值,以便存在一对多映射,并且Shipment值是重复的

输出测向

Date            Geo         Shipment
2020-01-01      ALA         1000
2020-01-01      BEL         1000
2020-01-01      AND         1000
2020-01-01      AUT         1000
..
2020-01-01      AUT         5865
2020-01-01      BEL         5865
2020-01-01      HRV         5865
2020-01-01      BGR         5865
2020-01-01      CZE         5865
..
2020-01-01      EST         4789
2020-01-01      HRV         4789
2020-01-01      FRA         4789
2020-01-01      DEU         4789
..

Tags: anddategeoestshipmentbelbgrala
1条回答
网友
1楼 · 发布于 2024-06-26 17:48:24

通过共构造函数使用DataFrame,然后通过^{}进行外部联接,并通过^{}重新分配列Geo

d =  {"EU1":["ALA", "BEL", "AND", "AUT"] , 
      "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , 
      "EU3": ["EST", "HRV", "FRA", "DEU"]}

df1 = pd.DataFrame(((k, x) for k, v in d.items() for x in v), columns=['Geo','New'])


df = df.merge(df1, on='Geo', how='outer')
df['Geo'] = df.pop('New')
df = df.sort_values('Date', ignore_index=True)

print (df)
          Date  Geo  Shipment
0   2020-01-01  ALA      1000
1   2020-01-01  HRV      4789
2   2020-01-01  EST      4789
3   2020-01-01  CZE      5865
4   2020-01-01  HRV      5865
5   2020-01-01  FRA      4789
6   2020-01-01  BEL      5865
7   2020-01-01  AUT      5865
8   2020-01-01  BGR      5865
9   2020-01-01  AUT      1000
10  2020-01-01  AND      1000
11  2020-01-01  BEL      1000
12  2020-01-01  DEU      4789
13  2020-01-02  AND      6541
14  2020-01-02  BEL      6541
15  2020-01-02  ALA      6541
16  2020-01-02  AUT      3258
17  2020-01-02  BEL      3258
18  2020-01-02  BGR      3258
19  2020-01-02  HRV      3258
20  2020-01-02  CZE      3258
21  2020-01-02  AUT      6541

具有新数据的解决方案:

d =  {"EU1":["ALA", "BEL", "AND", "AUT"] , 
      "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , 
      "EU3": ["EST", "HRV", "FRA", "DEU"]}

df1 = pd.DataFrame(((k, x) for k, v in d.items() for x in v), columns=['Geo','New'])

df = df.merge(df1, on='Geo', how='left')
df['Geo'] = df.pop('New').fillna(df['Geo'])
print (df)
          Date  Geo  Shipment
0   2020-01-01  USA      1000
1   2020-01-01  BRA      5865
2   2020-01-01  CHN      4789
3   2020-01-02  ALA      6541
4   2020-01-02  BEL      6541
5   2020-01-02  AND      6541
6   2020-01-02  AUT      6541
7   2020-01-02  AUT      3258
8   2020-01-02  BEL      3258
9   2020-01-02  BGR      3258
10  2020-01-02  HRV      3258
11  2020-01-02  CZE      3258

相关问题 更多 >