Add category-specific columns and values to a dataframe

     wk start            car  rims   color  Autopilot$  Sunroof$
0  2018-09-09  tesla model x    17   black        3000         0
1  2018-09-16  tesla model x    14  yellow        3000         0
2  2018-09-23  tesla model x    13   white        3000         0
3  2018-09-09  tesla model 3    19    grey           0      2000
4  2018-09-16  tesla model 3    21    pink           0      2000

import pandas as pd

df = pd.DataFrame({'wk start': ['2018-09-09', '2018-09-16', '2018-09-23', '2018-09-09', '2018-09-16'],
                   'car': ['tesla model x', 'tesla model x', 'tesla model x', 'tesla model 3', 'tesla model 3'],
                   'rims': [17, 14, 13, 19, 21],
                   'color': ['black', 'yellow', 'white', 'grey', 'pink'],
                   'Autopilot$': [3000, 3000, 3000, 0, 0],
                   'Sunroof$': [0, 0, 0, 2000, 2000]})

model3 = df[df['car'] == 'tesla model 3']
modelx = df[df['car'] == 'tesla model x']

example = model3.merge(modelx, how='outer', left_on='wk start', right_on='wk start', suffixes=('_model3', '_modelx'))
del example['car_model3']
del example['car_modelx']
example['AUTOPILOT'] = example['Autopilot$_model3'] + example['Autopilot$_modelx']
example['SUNROOF'] = example['Sunroof$_model3'] + example['Sunroof$_modelx']
del example['Autopilot$_model3']
del example['Autopilot$_modelx']
del example['Sunroof$_modelx']
del example['Sunroof$_model3']
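A side note on the merge attempt above, as a minimal sketch assuming the sample data shown: tesla model 3 has no row for 2018-09-23, so the outer merge leaves NaN in the _model3 columns for that week, and plain + then gives NaN for AUTOPILOT and SUNROOF in that row. Series.add with fill_value=0 treats the missing side as 0 instead. The names plain and filled are only illustrative, and on='wk start' is used here as a shorter equivalent of left_on/right_on with the same column name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'wk start': ['2018-09-09', '2018-09-16', '2018-09-23', '2018-09-09', '2018-09-16'],
                   'car': ['tesla model x', 'tesla model x', 'tesla model x', 'tesla model 3', 'tesla model 3'],
                   'rims': [17, 14, 13, 19, 21],
                   'color': ['black', 'yellow', 'white', 'grey', 'pink'],
                   'Autopilot$': [3000, 3000, 3000, 0, 0],
                   'Sunroof$': [0, 0, 0, 2000, 2000]})

model3 = df[df['car'] == 'tesla model 3']
modelx = df[df['car'] == 'tesla model x']

# Outer merge on the week; model 3 has no 2018-09-23 row, so its columns are NaN there
example = model3.merge(modelx, how='outer', on='wk start', suffixes=('_model3', '_modelx'))

# Plain "+" propagates the NaN into the sum
plain = example['Autopilot$_model3'] + example['Autopilot$_modelx']

# Series.add(..., fill_value=0) treats a value missing on one side as 0
filled = example['Autopilot$_model3'].add(example['Autopilot$_modelx'], fill_value=0)

print(pd.DataFrame({'wk start': example['wk start'], 'plain': plain, 'filled': filled}))
```

With plain + the 2018-09-23 row comes out as NaN; with fill_value=0 it is 3000.0 like the other weeks.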

1 answer

User

#1 · Posted on 2024-09-29 00:36:38

Use:

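# move 'car' from the rows into a second column level (one column per car)
df = df.set_index(['wk start','car']).unstack()
# flatten the MultiIndex columns, e.g. ('rims', 'tesla model x') -> 'rims_tesla model x'
df.columns = df.columns.map('_'.join)
# turn the 'wk start' index back into a regular column
df = df.reset_index()
# keep only columns that contain at least one value that is neither 0 nor NaN
df = df.loc[:, df.fillna(0).ne(0).any()]
print (df)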
     wk start  rims_tesla model 3  rims_tesla model x color_tesla model 3  \
0  2018-09-09                19.0                17.0                grey   
1  2018-09-16                21.0                14.0                pink   
2  2018-09-23                 NaN                13.0                 NaN   

  color_tesla model x  Autopilot$_tesla model x  Sunroof$_tesla model 3  
0               black                    3000.0                  2000.0  
1              yellow                    3000.0                  2000.0  
2               white                    3000.0                     NaN

Explanation:

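Reshape with set_index and unstack
Flatten the MultiIndex in the columns with map and join
Turn the index back into a column with reset_index
Finally, remove the columns that contain only 0 (or missing values) by boolean indexing with loc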
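The first three steps can be checked one at a time. This is a sketch that rebuilds the question's sample data and keeps the intermediate MultiIndex columns, so you can see what map('_'.join) turns into flat names; the variable names wide and multi_cols are just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'wk start': ['2018-09-09', '2018-09-16', '2018-09-23', '2018-09-09', '2018-09-16'],
                   'car': ['tesla model x', 'tesla model x', 'tesla model x', 'tesla model 3', 'tesla model 3'],
                   'rims': [17, 14, 13, 19, 21],
                   'color': ['black', 'yellow', 'white', 'grey', 'pink'],
                   'Autopilot$': [3000, 3000, 3000, 0, 0],
                   'Sunroof$': [0, 0, 0, 2000, 2000]})

# Step 1: set_index + unstack moves 'car' into a second column level
wide = df.set_index(['wk start', 'car']).unstack()
multi_cols = wide.columns  # MultiIndex of (original column, car) pairs
print(multi_cols[:2].tolist())

# Step 2: join each (column, car) tuple with '_' into one flat name
wide.columns = wide.columns.map('_'.join)

# Step 3: reset_index turns the 'wk start' index back into a regular column
wide = wide.reset_index()
print(wide.columns.tolist())
```

map('_'.join) only works because every level value here is a string; with non-string levels (for example integer car ids) something like lambda t: '_'.join(map(str, t)) would be needed instead.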

Edit:

Can you explain this line a bit: df.loc[:, df.fillna(0).ne(0).any()]? I can't figure out what it does. There aren't any NaN values.

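If you use unstack, some values can be missing. Here tesla model 3 has no row for 2018-09-23, so after reshaping every tesla model 3 column is NaN in that row, even though the original data contains no NaN values.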
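To see where those NaN values come from, here is a small sketch (same sample data as the question) that counts the missing cells right after unstack and checks the dtype of one affected column. There should be 4 missing cells, all in the tesla model 3 columns of the 2018-09-23 row, and the integer rims values are upcast to float64 because they now have to hold NaN.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'wk start': ['2018-09-09', '2018-09-16', '2018-09-23', '2018-09-09', '2018-09-16'],
                   'car': ['tesla model x', 'tesla model x', 'tesla model x', 'tesla model 3', 'tesla model 3'],
                   'rims': [17, 14, 13, 19, 21],
                   'color': ['black', 'yellow', 'white', 'grey', 'pink'],
                   'Autopilot$': [3000, 3000, 3000, 0, 0],
                   'Sunroof$': [0, 0, 0, 2000, 2000]})

wide = df.set_index(['wk start', 'car']).unstack()

# Each (wk start, car) pair that has no row in df becomes NaN after unstack;
# here the only absent pair is ('2018-09-23', 'tesla model 3')
print(wide.isna().sum())

# Integer columns that had to hold NaN are upcast to float
print(wide[('rims', 'tesla model 3')].dtype)
```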

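So the mask has to be True for every value that is neither 0 nor NaN. This is why fillna(0) is applied first: without it, NaN compared with ne(0) would count as True: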
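A minimal sketch of why fillna(0) matters, using a hypothetical toy frame (the column names zeros, zeros_nan, all_nan and data are made up for illustration). zeros_nan has the same shape as Autopilot$_tesla model 3 after unstack: zeros plus one NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy columns: all zeros, zeros plus NaN, only NaN, real values
frame = pd.DataFrame({'zeros': [0.0, 0.0, 0.0],
                      'zeros_nan': [0.0, 0.0, np.nan],
                      'all_nan': [np.nan, np.nan, np.nan],
                      'data': [2000.0, 2000.0, np.nan]})

# NaN != 0 evaluates to True, so without fillna the NaN cells count as "non-zero"
without_fill = frame.ne(0).any()

# After fillna(0), NaN cells count as 0 and no longer keep a column alive
with_fill = frame.fillna(0).ne(0).any()

print(pd.DataFrame({'without_fill': without_fill, 'with_fill': with_fill}))
```

Without fillna(0), zeros_nan and all_nan would both survive the filter; with it, only data is kept.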

print (df.fillna(0).ne(0))
   wk start  rims_tesla model 3  rims_tesla model x  color_tesla model 3  \
0      True                True                True                 True   
1      True                True                True                 True   
2      True               False                True                False   

   color_tesla model x  Autopilot$_tesla model 3  Autopilot$_tesla model x  \
0                 True                     False                      True   
1                 True                     False                      True   
2                 True                     False                      True   

   Sunroof$_tesla model 3  Sunroof$_tesla model x  
0                    True                   False  
1                    True                   False  
2                   False                   False

Then check with any whether each column has at least one True:

print (df.fillna(0).ne(0).any())
wk start                     True
rims_tesla model 3           True
rims_tesla model x           True
color_tesla model 3          True
color_tesla model x          True
Autopilot$_tesla model 3    False
Autopilot$_tesla model x     True
Sunroof$_tesla model 3       True
Sunroof$_tesla model x      False
dtype: bool
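The last step passes that boolean Series as the column selector of loc, so only the columns marked True are kept. A self-contained sketch of the whole answer on the question's sample data (keep, result and dropped are illustrative names) shows which two columns disappear:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'wk start': ['2018-09-09', '2018-09-16', '2018-09-23', '2018-09-09', '2018-09-16'],
                   'car': ['tesla model x', 'tesla model x', 'tesla model x', 'tesla model 3', 'tesla model 3'],
                   'rims': [17, 14, 13, 19, 21],
                   'color': ['black', 'yellow', 'white', 'grey', 'pink'],
                   'Autopilot$': [3000, 3000, 3000, 0, 0],
                   'Sunroof$': [0, 0, 0, 2000, 2000]})

wide = df.set_index(['wk start', 'car']).unstack()
wide.columns = wide.columns.map('_'.join)
wide = wide.reset_index()

# One boolean per column: True if it has at least one value that is neither 0 nor NaN
keep = wide.fillna(0).ne(0).any()

# A boolean Series in the column position of .loc selects the columns where it is True
result = wide.loc[:, keep]

dropped = keep[~keep].index.tolist()
print(dropped)
print(result.columns.tolist())
```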
