Renaming and duplicating pandas DataFrame columns based on a reference dict

Published 2024-06-14 08:07:46


I need to rename and duplicate the columns of my dataframe according to a reference dict. Below I have created a dummy dataframe:

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata)
df.set_index('id')  # displays the frame indexed by id; df itself is not modified

        entity  entity2  entity3
id                              
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent

Now I have the following example dict:

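ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}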

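I now need to replace the column names with the dict values, and if a column maps to more than one value, that column should be repeated once per value. This is the dataframe I want: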

      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent

3 answers

You can simply loop:

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
# keep 'id' as a regular column here; it is copied into df2 and indexed below
df = pd.DataFrame(rawdata)
ref_dict= {'entity':['entity_exp1'],
           'entity2':['entity2_exp1','entity2_exp2'],
           'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):
    for subval in val:
        df2[subval] = df[key]

df2['id'] = df['id']
df2.set_index('id', inplace=True)

print(df2)
      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent

Option 1
Use pd.concat with a dict comprehension (this assumes df is already indexed by id)

pd.concat({k: df[v] for v, l in ref_dict.items() for k in l}, axis=1)

      entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3 entity_exp1
id                                                                                
json       present      present       absent       absent       absent     present
molly      present      present       absent       absent       absent      absent
tina       present      present       absent       absent       absent      absent
jake        absent       absent      present      present      present     present
molly       absent       absent       absent       absent       absent     present
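
The columns in the output above appear sorted by name, which differs from the order requested in the question. The following is a minimal sketch of the same pd.concat idea, assuming a recent pandas version in which pd.concat keeps the insertion order of the dict's keys, and assuming the frame is indexed by id. The names df_indexed and result are illustrative and not part of the original answer.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# Assumption: id is the index, so assign the result of set_index.
df_indexed = pd.DataFrame(rawdata).set_index('id')

# Map every new column name to the source column it should copy.
result = pd.concat(
    {new: df_indexed[old] for old, names in ref_dict.items() for new in names},
    axis=1,
)
print(list(result.columns))
```

If the installed pandas sorts the keys instead, selecting the columns explicitly afterwards, for example with a list of the new names in the desired order, should restore the layout from the question.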

Option 2
Slice the dataframe and rename the columns

[The code for this option was not preserved in the source.]
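
One way "slice and rename" could look is sketched below. This is an illustration under stated assumptions and not the answerer's original code: it assumes the frame is indexed by id, and the names df_indexed, source_cols, new_names and sliced are hypothetical.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df_indexed = pd.DataFrame(rawdata).set_index('id')

# Each source column name, repeated once per new name it maps to.
source_cols = [old for old, names in ref_dict.items() for _ in names]
# The new names, flattened in the same order.
new_names = [new for names in ref_dict.values() for new in names]

# Selecting with a list that repeats labels duplicates those columns.
sliced = df_indexed[source_cols].copy()
sliced.columns = new_names
print(sliced)
```

Because source_cols and new_names are built from the same dict in the same order, the two lists line up position by position.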

You can reindex the columns of df using the dict keys (each key repeated once per value), then rename the columns using the dict values

df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[]))
df_new.columns=sum(ref_dict.values(),[])
df_new
Out[573]: 
  entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
0     present      present      present       absent       absent       absent
1      absent      present      present       absent       absent       absent
2      absent      present      present       absent       absent       absent
3     present       absent       absent      present      present      present
4     present       absent       absent       absent       absent       absent
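
The output above has a default integer index because df was not indexed by id and the reindex keeps only the requested columns. Below is a minimal sketch of the same reindex-then-rename idea that keeps id as the index. It assumes the result of set_index is assigned, and it uses itertools.chain instead of sum(..., []) purely as a stylistic choice; the names df_indexed and repeated_keys are illustrative.

```python
from itertools import chain

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df_indexed = pd.DataFrame(rawdata).set_index('id')

# Repeat each key once per new name: entity, entity2, entity2, entity3, ...
repeated_keys = list(chain.from_iterable([k] * len(v) for k, v in ref_dict.items()))

# Reindexing the columns with repeated labels duplicates those columns.
df_new = df_indexed.reindex(columns=repeated_keys)
# Flatten the dict values into one list of new column names.
df_new.columns = list(chain.from_iterable(ref_dict.values()))
print(df_new)
```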
