Split (explode) dataframe string entries into separate rows, across multiple columns

Posted 2024-06-23 03:14:15


I have a dataframe that looks like this: [image: the first five rows of my dataframe]

I need to replace "European Union" and split (explode) it into its member states, as in the following example: [image: what the dataframe should look like]

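I tried to replace "European Union" using a dictionary that maps it to its members, and then to split it with the following lines of code: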

test_disc['countryname'] = test_disc['countryname'].replace({'European Union': 'Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland,Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands,Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden'})

test_disc[['iso_2', 'iso_3', 'countryname', 'país afetado','year',
       'SPS emergenciais', 'SPS regulares']].astype(str).apply(lambda x: 
       x.str.split(',').explode()).reset_index()

However, I get the following error: "ValueError: cannot reindex from a duplicate axis"


1 answer
User
#1 · Posted 2024-06-23 03:14:15

When using explode, convert only the target columns to lists, not every column.


Demo data

import pandas as pd

# Four sample rows; 'European Union' appears in either or both country columns
data = [{'iso_2': 0, 'iso_3': 'NaN', 'countryname': 'JP', 'país afetado': 'US', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}, {'iso_2': 1, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'China', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}, {'iso_2': 2, 'iso_3': 'NaN', 'countryname': 'US', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}, {'iso_2': 3, 'iso_3': 'NaN', 'countryname': 'European Union', 'país afetado': 'European Union', 'year': 2015, 'SPS emergenciais': 0, 'SPS regulares': 0}]
df = pd.DataFrame(data)
df

       iso_2 iso_3     countryname    país afetado  year  SPS emergenciais  \
    0      0   NaN              JP              US  2015                 0   
    1      1   NaN  European Union           China  2015                 0   
    2      2   NaN              US  European Union  2015                 0   
    3      3   NaN  European Union  European Union  2015                 0   

       SPS regulares  
    0              0  
    1              0  
    2              0  
    3              0  

Process:

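for col in ['país afetado', 'countryname']:
    # Swap the placeholder for a comma-separated member string (shortened here for the demo)
    df[col] = df[col].replace({'European Union': 'Austria, Belgium, Netherlands,Poland'})
    # Split on a comma plus optional whitespace so each cell holds a list
    df[col] = df[col].str.split(r',\s*')

# Explode one column at a time; a row with lists in both columns becomes a cross product
df_result = df.explode('countryname').explode('país afetado')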
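As a variation on the steps above, here is a minimal sketch that maps "European Union" straight to a Python list instead of replacing it with a string and splitting afterwards. The helper name `expand_members` and the shortened `EU_MEMBERS` list are assumptions made for illustration, and `ignore_index=True` assumes a pandas version that supports that argument on `explode`.

```python
import pandas as pd

# Shortened, illustrative member list (the full list has 27 entries).
EU_MEMBERS = ["Austria", "Belgium", "Netherlands", "Poland"]


def expand_members(value):
    """Hypothetical helper: member list for 'European Union', else a one-item list."""
    return EU_MEMBERS if value == "European Union" else [value]


df = pd.DataFrame({
    "countryname": ["JP", "European Union", "US", "European Union"],
    "país afetado": ["US", "China", "European Union", "European Union"],
    "year": [2015, 2015, 2015, 2015],
})

# Only the two target columns are turned into lists; 'year' is left alone.
for col in ["countryname", "país afetado"]:
    df[col] = df[col].map(expand_members)

# Explode one column at a time; ignore_index=True renumbers the rows 0..n-1.
df_result = (
    df.explode("countryname", ignore_index=True)
      .explode("país afetado", ignore_index=True)
)
print(len(df_result))
```

Building the lists directly avoids depending on how the separator is written in the replacement string, which matters with inconsistent spacing such as "Ireland,Italy" in the original attempt.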

Result:

   iso_2 iso_3  countryname país afetado  year  SPS emergenciais  
0      0   NaN           JP           US  2015                 0   
1      1   NaN      Austria        China  2015                 0   
1      1   NaN      Belgium        China  2015                 0   
1      1   NaN  Netherlands        China  2015                 0   
1      1   NaN       Poland        China  2015                 0   
2      2   NaN           US      Austria  2015                 0   
2      2   NaN           US      Belgium  2015                 0   
2      2   NaN           US  Netherlands  2015                 0   
2      2   NaN           US       Poland  2015                 0   
3      3   NaN      Austria      Austria  2015                 0   
3      3   NaN      Austria      Belgium  2015                 0   
3      3   NaN      Austria  Netherlands  2015                 0   
3      3   NaN      Austria       Poland  2015                 0   
3      3   NaN      Belgium      Austria  2015                 0   
3      3   NaN      Belgium      Belgium  2015                 0   
3      3   NaN      Belgium  Netherlands  2015                 0   
3      3   NaN      Belgium       Poland  2015                 0   
3      3   NaN  Netherlands      Austria  2015                 0   
3      3   NaN  Netherlands      Belgium  2015                 0   
3      3   NaN  Netherlands  Netherlands  2015                 0   
3      3   NaN  Netherlands       Poland  2015                 0   
3      3   NaN       Poland      Austria  2015                 0   
3      3   NaN       Poland      Belgium  2015                 0   
3      3   NaN       Poland  Netherlands  2015                 0   
3      3   NaN       Poland       Poland  2015                 0  
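The result above has two properties worth checking in your own data: each source row yields (number of countryname entries) × (number of país afetado entries) output rows, and the original index labels are repeated. A small sketch on toy data (not taken from the question) that verifies both and renumbers the index:

```python
import pandas as pd

# Toy data: every cell already holds a list.
df = pd.DataFrame({
    "countryname": [["JP"], ["Austria", "Belgium"], ["US"], ["Austria", "Belgium"]],
    "país afetado": [["US"], ["China"], ["Austria", "Belgium", "Poland"], ["Austria", "Belgium"]],
})

# Each source row contributes len(countryname) * len(país afetado) output rows.
expected_rows = int((df["countryname"].str.len() * df["país afetado"].str.len()).sum())

exploded = df.explode("countryname").explode("país afetado")
assert len(exploded) == expected_rows

# explode() repeats the original index labels; renumber if unique labels are needed.
has_duplicate_labels = exploded.index.has_duplicates
clean = exploded.reset_index(drop=True)
```

Repeated labels are harmless for display, but later operations that align on the index may fail or behave unexpectedly, so resetting the index is usually a safe final step.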
