Statistics for one-hot encoded columns in a DataFrame

import pandas as pd

dictt = {
    "label": ["cat", "cat", "cat", "cat", "cat", "dog", "dog", "dog"],
    "featureA_1": [1, 0, 1, 1, 0, 1, 1, 0],
    "featureA_2": [0, 1, 0, 0, 0, 0, 0, 0],
    "featureA_3": [0, 0, 0, 0, 1, 0, 0, 1],
    "featureB_1": [0, 0, 1, 1, 0, 0, 1, 1],
    "featureB_2": [1, 1, 0, 0, 1, 1, 0, 0],
}
df1 = pd.DataFrame(dictt)

1 answer

User

#1 · Posted 2024-09-28 19:25:30

Use:

# mean of each 0/1 column per label = share of ones in that column
df = df1.groupby('label').mean().add_suffix('_perc').round(2)

# std across each feature's columns with ddof=0 (the pandas default is ddof=1);
# groupby(axis=1) is deprecated in recent pandas, so group the transposed frame instead
df2 = df.T.groupby(lambda x: x.split('_')[0]).std(ddof=0).T.add_suffix('_std').round(2)

# join both results, sort the columns, and restore 'label' as a column
df = pd.concat([df, df2], axis=1).sort_index(axis=1).reset_index()
print(df)
  label  featureA_1_perc  featureA_2_perc  featureA_3_perc  featureA_std  \
0   cat             0.60              0.2             0.20          0.19   
1   dog             0.67              0.0             0.33          0.27   

   featureB_1_perc  featureB_2_perc  featureB_std  
0             0.40             0.60          0.10  
1             0.67             0.33          0.17
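
The comments in the answer rely on two facts: the mean of a 0/1 column equals the share of ones, and pandas' `std` defaults to `ddof=1` (sample standard deviation), while `ddof=0` gives the population value shown in the output. A minimal sketch that checks both, assuming the `cat` rows of `featureA_1` and the rounded `cat` shares from the output above (the variable names are only illustrative):

```python
import pandas as pd

# featureA_1 values for the five "cat" rows (taken from the question's data)
s = pd.Series([1, 0, 1, 1, 0])

# The mean of a 0/1 column is the fraction of rows equal to 1
share = s.mean()

# Rounded shares of featureA_1..3 for "cat", as printed in the answer's output
shares = pd.Series([0.6, 0.2, 0.2])

pop_std = shares.std(ddof=0)   # population std: divides by n
sample_std = shares.std()      # pandas default ddof=1: divides by n - 1

print(round(share, 2), round(pop_std, 2), round(sample_std, 2))
```

The population value rounds to the 0.19 reported as `featureA_std` for `cat`; the default `ddof=1` would give a larger number, which is why the answer passes `ddof=0` explicitly.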
