How to merge multiple CSV files on their common columns and keep the non-common columns as separate columns?

Posted 2024-09-29 03:37:15


I have three CSV files containing COVID-19 data. The first CSV has the number of confirmed cases, the second has the number of deaths, and the third has the number of recoveries.

This is what the DataFrames look like:

import pandas as pd

df1 = pd.read_csv('/Users/sr/covid_csvs/confirmed.csv')

df2 = pd.read_csv('/Users/sr/covid_csvs/deaths.csv')

df3 = pd.read_csv('/Users/sr/covid_csvs/recovery.csv')

print(df1.head(5))

  Province/State Country/Region      Lat     Long     Date  Confirmed
0            NaN    Afghanistan  33.0000  65.0000  1/22/20          0
1            NaN        Albania  41.1533  20.1683  1/22/20          0
2            NaN        Algeria  28.0339   1.6596  1/22/20          0
3            NaN        Andorra  42.5063   1.5218  1/22/20          0
4            NaN         Angola -11.2027  17.8739  1/22/20          0


print(df2.head(5))

  Province/State Country/Region      Lat     Long     Date     Deaths
0            NaN    Afghanistan  33.0000  65.0000  1/22/20          0
1            NaN        Albania  41.1533  20.1683  1/22/20          0
2            NaN        Algeria  28.0339   1.6596  1/22/20          0
3            NaN        Andorra  42.5063   1.5218  1/22/20          0
4            NaN         Angola -11.2027  17.8739  1/22/20          0


print(df3.head(5))

  Province/State Country/Region      Lat     Long     Date  Recovery
0            NaN    Afghanistan  33.0000  65.0000  1/22/20         0
1            NaN        Albania  41.1533  20.1683  1/22/20         0
2            NaN        Algeria  28.0339   1.6596  1/22/20         0
3            NaN        Andorra  42.5063   1.5218  1/22/20         0
4            NaN         Angola -11.2027  17.8739  1/22/20         0

Now I want to merge all three DataFrames so that I get the following result:

  Province/State Country/Region      Lat     Long     Date  Confirmed  Deaths Recovery
0            NaN    Afghanistan  33.0000  65.0000  1/22/20          0       0        0
1            NaN        Albania  41.1533  20.1683  1/22/20          0       0        0
2            NaN        Algeria  28.0339   1.6596  1/22/20          0       0        0
3            NaN        Andorra  42.5063   1.5218  1/22/20          0       0        0
4            NaN         Angola -11.2027  17.8739  1/22/20          0       0        0

So I tried the following:

df_merged = pd.concat([df1, df2, df3])    
df_merged.to_csv('merged.csv', sep=',', encoding='utf-8', index=False)

But I don't get the CSV I want. How should I do this?
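
For reference, here is a minimal sketch of why this attempt falls short. It uses a hypothetical two-row stand-in for each file instead of the real CSVs. pd.concat defaults to axis=0, so it appends the rows of the three frames one below the other instead of lining them up side by side. Each value column is therefore filled only for the rows that came from its own file and is NaN everywhere else.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample standing in for confirmed.csv, deaths.csv and recovery.csv
keys = {
    "Province/State": [np.nan, np.nan],
    "Country/Region": ["Afghanistan", "Albania"],
    "Lat": [33.0, 41.1533],
    "Long": [65.0, 20.1683],
    "Date": ["1/22/20", "1/22/20"],
}
df1 = pd.DataFrame({**keys, "Confirmed": [0, 0]})
df2 = pd.DataFrame({**keys, "Deaths": [0, 0]})
df3 = pd.DataFrame({**keys, "Recovery": [0, 0]})

# Default axis=0: the rows are stacked, and the union of the columns is kept
stacked = pd.concat([df1, df2, df3])
print(stacked.shape)  # (6, 8): 3 x 2 rows, 5 shared + 3 value columns
print(stacked.isna().sum())  # each value column is missing in the other frames' rows
```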


1 answer
User
Answer #1 · posted 2024-09-29 03:37:15

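The idea is to turn the common columns of each DataFrame into a MultiIndex with DataFrame.set_index (in a list comprehension), then combine the frames side by side with concat(axis=1), and finally drop index=False from to_csv so that the MultiIndex is written out as the first columns: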

cols = ['Province/State', 'Country/Region','Lat','Long','Date']

dfs = [df1, df2, df3]
df_merged = pd.concat([x.set_index(cols) for x in dfs], axis=1)    
df_merged.to_csv('merged.csv', sep=',', encoding='utf-8')
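
To check the idea without the original files, here is a runnable sketch of the same set_index + concat(axis=1) approach. It makes two assumptions: the hypothetical two-row sample replaces pd.read_csv, and an in-memory buffer replaces merged.csv. Because every frame gets the same MultiIndex, concat aligns the rows on the key columns, and the three value columns end up next to each other.

```python
import io

import numpy as np
import pandas as pd

cols = ["Province/State", "Country/Region", "Lat", "Long", "Date"]

# Hypothetical two-row sample; in the question these frames come from pd.read_csv
keys = {
    "Province/State": [np.nan, np.nan],
    "Country/Region": ["Afghanistan", "Albania"],
    "Lat": [33.0, 41.1533],
    "Long": [65.0, 20.1683],
    "Date": ["1/22/20", "1/22/20"],
}
df1 = pd.DataFrame({**keys, "Confirmed": [0, 0]})
df2 = pd.DataFrame({**keys, "Deaths": [0, 0]})
df3 = pd.DataFrame({**keys, "Recovery": [0, 0]})
dfs = [df1, df2, df3]

# set_index moves the shared columns into the row index, leaving only each
# frame's own value column; concat(axis=1) then aligns rows on that index
df_merged = pd.concat([x.set_index(cols) for x in dfs], axis=1)

# index=False is omitted, so the MultiIndex levels become the leading CSV columns
buf = io.StringIO()
df_merged.to_csv(buf, sep=",")
print(buf.getvalue())
```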

Or convert the MultiIndex back into columns with reset_index, and then use index=False in to_csv:

cols = ['Province/State', 'Country/Region','Lat','Long','Date']

dfs = [df1, df2, df3]
df_merged = pd.concat([x.set_index(cols) for x in dfs], axis=1).reset_index()  
df_merged.to_csv('merged.csv', sep=',', encoding='utf-8', index=False)
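
Since the title asks about merging, a related option swaps concat for chained pd.merge calls on the key columns using functools.reduce. This is a different technique from the answer's and is offered only as a sketch. It uses the same hypothetical sample data, and how="outer" is an assumption: it keeps key rows that are missing from some files. pandas merge matches NaN keys with each other, so the NaN Province/State values still line up.

```python
from functools import reduce

import numpy as np
import pandas as pd

cols = ["Province/State", "Country/Region", "Lat", "Long", "Date"]

# Hypothetical two-row sample; in the question these frames come from pd.read_csv
keys = {
    "Province/State": [np.nan, np.nan],
    "Country/Region": ["Afghanistan", "Albania"],
    "Lat": [33.0, 41.1533],
    "Long": [65.0, 20.1683],
    "Date": ["1/22/20", "1/22/20"],
}
df1 = pd.DataFrame({**keys, "Confirmed": [0, 0]})
df2 = pd.DataFrame({**keys, "Deaths": [0, 0]})
df3 = pd.DataFrame({**keys, "Recovery": [0, 0]})
dfs = [df1, df2, df3]

# Merge pairwise on the shared columns: ((df1 + df2) + df3).
# how="outer" (assumption) keeps unmatched key rows with NaN values;
# use how="inner" to keep only keys present in every file.
df_merged = reduce(lambda left, right: pd.merge(left, right, on=cols, how="outer"), dfs)
print(df_merged)
```

The result already has ordinary columns, so it can be written with index=False as above. One design difference: if a key combination repeats within a file, merge pairs up every matching row, whereas concat(axis=1) on a non-unique index may raise an error instead.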
