Pandas pivot on multiple categorical columns

Posted 2024-10-01 02:34:48


I have a DataFrame like this:

import pandas as pd

name = ['fred','fred','fred','james','james','rick','rick','jeff']
actionfigures = ['superman','batman','flash','greenlantern','flash','batman','joker','superman']
cars = ['lamborghini', 'ferrari','bugatti','ferrari','corvette','bugatti','bmw','bmw']
pets = ['cat','dog','bird','cat','dog','dog','fish','marmet']

test = pd.DataFrame({'name':name,'actfig':actionfigures,'car':cars,'pet':pets})

    actfig       car                name    pet
0   superman     lamborghini        fred    cat
1   batman       ferrari            fred    dog
2   flash        bugatti            fred    bird
3   greenlantern ferrari            james   cat
4   flash        corvette           james   dog
5   batman       bugatti            rick    dog
6   joker        bmw                rick    fish
7   superman     bmw                jeff    marmet

Apologies if my terminology is off, but I want to transform the data so that, for each name, I get the count of each value in the ['actfig', 'car', 'pet'] columns.


I expected test.pivot_table(index='name', columns=['actfig','car','pet'], aggfunc='size') to do it, but it gives me some strange multi-level columns.
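The multi-level header comes from passing a list to `columns=`: pivot_table then makes one column per distinct (actfig, car, pet) combination rather than one column per value. A minimal sketch confirming this on the question's data, assuming pandas 2.x (the frame is rebuilt so the snippet runs on its own):

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["fred", "fred", "fred", "james", "james", "rick", "rick", "jeff"],
    "actfig": ["superman", "batman", "flash", "greenlantern", "flash", "batman", "joker", "superman"],
    "car": ["lamborghini", "ferrari", "bugatti", "ferrari", "corvette", "bugatti", "bmw", "bmw"],
    "pet": ["cat", "dog", "bird", "cat", "dog", "dog", "fish", "marmet"],
})

# Each (actfig, car, pet) triple becomes its own column, so the header has 3 levels
wide = test.pivot_table(index="name", columns=["actfig", "car", "pet"], aggfunc="size")
print(wide.columns.nlevels)  # 3
print(wide.shape)            # (4, 8): 4 names x 8 distinct triples
```

To count individual values per name, the three columns first have to be brought into a single column (for example with melt or stack).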

I thought I could call get_dummies on each column, combine the results, then group by name and sum, but it feels like pandas probably has a better way.
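The approach described above does work as stated. A minimal sketch of it, assuming pandas 2.x, where get_dummies returns booleans by default (hence `dtype=int`); `value_cols` is just an illustrative name:

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["fred", "fred", "fred", "james", "james", "rick", "rick", "jeff"],
    "actfig": ["superman", "batman", "flash", "greenlantern", "flash", "batman", "joker", "superman"],
    "car": ["lamborghini", "ferrari", "bugatti", "ferrari", "corvette", "bugatti", "bmw", "bmw"],
    "pet": ["cat", "dog", "bird", "cat", "dog", "dog", "fish", "marmet"],
})

value_cols = ["actfig", "car", "pet"]
# One 0/1 indicator frame per column, placed side by side
dummies = pd.concat([pd.get_dummies(test[c], dtype=int) for c in value_cols], axis=1)
# Group the indicator rows by name (aligned on the shared row index) and sum to get counts
counts = dummies.groupby(test["name"]).sum()
print(counts)
```

If the same value could appear in two different columns, this would produce duplicate column labels; keeping get_dummies' column prefixes avoids that.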

How can I do this?


2 Answers

melt and pivot

test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]: 
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name                                                                          
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0   
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0   
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0   
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0   
value  greenlantern  joker  lamborghini  marmet  superman  
name                                                       
fred            0.0    0.0          1.0     0.0       1.0  
james           1.0    0.0          0.0     0.0       0.0  
jeff            0.0    0.0          0.0     1.0       1.0  
rick            0.0    1.0          0.0     0.0       0.0  
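`DataFrame.pivot` only reshapes and never aggregates, so it raises a ValueError whenever a (name, value) pair occurs more than once after melting. That does not happen in the question's data, so the sketch below adds one hypothetical row (fred owning a second cat) to show the failure, and uses `pivot_table(aggfunc='size')` instead, which counts repeats. Assumes pandas 2.x:

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["fred", "fred", "fred", "james", "james", "rick", "rick", "jeff"],
    "actfig": ["superman", "batman", "flash", "greenlantern", "flash", "batman", "joker", "superman"],
    "car": ["lamborghini", "ferrari", "bugatti", "ferrari", "corvette", "bugatti", "bmw", "bmw"],
    "pet": ["cat", "dog", "bird", "cat", "dog", "dog", "fish", "marmet"],
})

# Hypothetical extra row so that ("fred", "cat") appears twice after melting
extra = pd.DataFrame({"name": ["fred"], "actfig": ["joker"], "car": ["bmw"], "pet": ["cat"]})
dup = pd.concat([test, extra], ignore_index=True)
long = dup.melt("name")  # long format: name, variable, value

try:
    long.assign(new=1).pivot(index="name", columns="value", values="new")
    pivot_failed = False
except ValueError as exc:  # duplicate (name, value) entries cannot be reshaped
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates, so repeated pairs are counted instead of rejected
counts = long.pivot_table(index="name", columns="value", aggfunc="size", fill_value=0)
print(counts.loc["fred", "cat"])
```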

get_dummies

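A minimal sketch of the get_dummies route, assuming pandas 2.x. This is not necessarily the answerer's exact code: it calls get_dummies once on the frame indexed by name, so every dummy column keeps its source column as a prefix (e.g. `pet_cat`), then collapses rows per name. Note that the older `.sum(level=0)` spelling was removed in pandas 2.0; `groupby(level=0).sum()` is the replacement.

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["fred", "fred", "fred", "james", "james", "rick", "rick", "jeff"],
    "actfig": ["superman", "batman", "flash", "greenlantern", "flash", "batman", "joker", "superman"],
    "car": ["lamborghini", "ferrari", "bugatti", "ferrari", "corvette", "bugatti", "bmw", "bmw"],
    "pet": ["cat", "dog", "bird", "cat", "dog", "dog", "fish", "marmet"],
})

# Encode actfig, car and pet; names like "actfig_batman" keep values from different columns apart
dummies = pd.get_dummies(test.set_index("name"), dtype=int)
# Sum the 0/1 indicators for each name (index level 0)
counts = dummies.groupby(level=0).sum()
print(counts.loc["fred", ["actfig_superman", "car_ferrari", "pet_cat"]])
```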

Edit: per PiR's suggestion, stripping the column prefixes:

pd.get_dummies(test.set_index('name'), prefix_sep='|', dtype=int).groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])

Option 1
pd.get_dummies piece by piece

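# dtype=int keeps the indicators numeric so the matrix products below give counts
a = pd.get_dummies(test.actfig, dtype=int)
c = pd.get_dummies(test.car, dtype=int)
p = pd.get_dummies(test.pet, dtype=int)
n = pd.get_dummies(test.name, dtype=int).T  # names x rows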

pd.concat([n.dot(d) for d in [a, c, p]], axis=1)

       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
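Option 1 works because `n` is a names-by-rows indicator matrix and each dummy frame is rows-by-values, so their product sums, for each (name, value) cell, the rows where both indicators are 1: counts[i, j] = sum over rows r of N[i, r] * D[r, j]. The sketch below uses a tiny hypothetical frame in which ("fred", "cat") occurs twice, to show that repeats are counted. `dtype=int` matters on pandas 2.x, where dummies default to bool and a boolean matrix product would yield True instead of 2.

```python
import pandas as pd

# Hypothetical data: fred has two rows with pet "cat"
df = pd.DataFrame({"name": ["fred", "fred", "rick"], "pet": ["cat", "cat", "dog"]})

N = pd.get_dummies(df["name"], dtype=int).T  # names x rows: 1 if row r belongs to name i
D = pd.get_dummies(df["pet"], dtype=int)     # rows x values: 1 if row r has value j

# Matrix product aligns N's columns with D's index (both are the original row labels)
counts = N.dot(D)
print(counts)
```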

Option 2
stack + pd.crosstab

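A minimal sketch of the stack + crosstab idea, assuming pandas 2.x; it is not necessarily the answerer's original code. `stack` turns every cell into one entry of a Series indexed by (name, column label), and `crosstab` then counts name against value. Plain NumPy arrays are passed so that crosstab does not try to align on the stacked index, which contains duplicate labels.

```python
import pandas as pd

test = pd.DataFrame({
    "name": ["fred", "fred", "fred", "james", "james", "rick", "rick", "jeff"],
    "actfig": ["superman", "batman", "flash", "greenlantern", "flash", "batman", "joker", "superman"],
    "car": ["lamborghini", "ferrari", "bugatti", "ferrari", "corvette", "bugatti", "bmw", "bmw"],
    "pet": ["cat", "dog", "bird", "cat", "dog", "dog", "fish", "marmet"],
})

# One entry per (name, column) cell; index level 0 holds the name
s = test.set_index("name").stack()

# Count how often each value occurs for each name
counts = pd.crosstab(
    s.index.get_level_values(0).to_numpy(),
    s.to_numpy(),
    rownames=["name"],
    colnames=["value"],
)
print(counts)
```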
