How to group and count by values in a DataFrame?

Posted 2024-10-17 08:22:17


I have this DataFrame:

import pandas as pd

df = pd.DataFrame(columns=["App","Feature1", "Feature2","Feature3",
                           "Feature4","Feature5",
                           "Feature6","Feature7","Feature8"], 
                  data=[["SHA",0,0,1,1,1,0,1,0],
                        ["LHA",1,0,1,1,0,1,1,0],
                        ["DRA",0,0,0,0,0,0,1,0],
                        ["FRA",1,0,1,1,1,0,1,1],
                        ["BRU",0,0,1,0,1,0,0,0],
                        ["PAR",0,1,1,1,1,0,1,0],
                        ["AER",0,0,1,1,0,1,1,0],
                        ["SHE",0,0,0,1,0,0,1,0]])

Update: (sorry, I did not state the expected result correctly)

I want to count how many times the value 1 appears in each feature:

Features   Count
Feature1   6
Feature2   7
...

I tried this:

df.groupby("App").count()

But I did not get the expected output. Any ideas?
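For comparison, a small sketch (assuming pandas is installed; the data is reproduced from the question) of what groupby("App").count() produces here. count() tallies the non-missing cells of each column within each group; it does not count occurrences of a particular value. Because every App occurs in exactly one row and nothing is NaN, every cell comes out as 1:

```python
import pandas as pd

# Sample data reproduced from the question
cols = ["App"] + [f"Feature{i}" for i in range(1, 9)]
rows = [["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
        ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
        ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
        ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
        ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
        ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
        ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
        ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]]
df = pd.DataFrame(rows, columns=cols)

# count() = number of non-NaN cells per column inside each App group.
# One row per App and no NaNs, so every entry is 1 regardless of 0 or 1 in the data.
per_app = df.groupby("App").count()
print(per_app)
```

The grouping key has to be the feature, not the App, which is what both answers below do in different ways.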


2 answers

Another approach, using melt:

First, reshape the data into long format:

df_melt=pd.melt(df, id_vars='App', value_vars=['Feature%d'%(i) for i in range(1,9)], var_name='Features', value_name='value')
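A sketch of the intermediate long-format frame, assuming the same sample data. When value_vars is omitted, pd.melt uses every column not listed in id_vars, so here it picks up Feature1 to Feature8 automatically. The result has one row per (App, feature) pair, 8 x 8 = 64 rows:

```python
import pandas as pd

cols = ["App"] + [f"Feature{i}" for i in range(1, 9)]
rows = [["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
        ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
        ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
        ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
        ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
        ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
        ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
        ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]]
df = pd.DataFrame(rows, columns=cols)

# value_vars omitted: every non-id column (Feature1..Feature8) is melted
df_melt = pd.melt(df, id_vars="App", var_name="Features", value_name="value")

print(df_melt.shape)    # (64, 3)
print(df_melt.head(3))  # all Feature1 rows come first, one per App
```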

Then group by Features and sum the values; since every cell is 0 or 1, the sum is the number of 1s:

df_melt.groupby('Features')['value'].sum().reset_index(name='count')
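Summing works only because every cell is 0 or 1. If the columns could hold other values, a more general sketch filters for the wanted value first and counts rows per feature. The name target is a hypothetical parameter introduced here, and reindex(..., fill_value=0) keeps features in which the value never occurs:

```python
import pandas as pd

cols = ["App"] + [f"Feature{i}" for i in range(1, 9)]
rows = [["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
        ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
        ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
        ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
        ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
        ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
        ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
        ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]]
df = pd.DataFrame(rows, columns=cols)
df_melt = pd.melt(df, id_vars="App", var_name="Features", value_name="value")

target = 1  # hypothetical parameter: the value whose occurrences are counted

counts = (
    df_melt[df_melt["value"] == target]                    # keep only cells equal to target
    .groupby("Features")
    .size()                                                # rows per feature = occurrences
    .reindex(df_melt["Features"].unique(), fill_value=0)   # features with no match get 0
    .rename_axis("Features")
    .reset_index(name="Count")
)
print(counts)
```

Setting target = 0 gives the zero counts instead.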

Use:

#remove column App, compare and get sum of Trues
a0 = df.drop(columns='App').eq(0).sum()
#a0 = df.set_index('App').eq(0).sum()

#alternative with select only Feature columns
#a0 = df.filter(like='Feature').eq(0).sum()

#alternative: select all columns except the first
a0 = df.iloc[:, 1:].eq(0).sum()

print (a0)
Feature1    6
Feature2    7
Feature3    2
Feature4    2
Feature5    4
Feature6    6
Feature7    1
Feature8    7
dtype: int64
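The comment "compare and get sum of Trues" is the whole mechanism: eq(0) returns a boolean DataFrame, and sum() treats True as 1. A small sketch on the same sample data, which also checks the underlying assumption that the Feature columns contain only 0 and 1 (if so, the zero and one counts must add up to the number of rows):

```python
import pandas as pd

cols = ["App"] + [f"Feature{i}" for i in range(1, 9)]
rows = [["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
        ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
        ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
        ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
        ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
        ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
        ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
        ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]]
df = pd.DataFrame(rows, columns=cols)

features = df.drop(columns="App")  # keep only the Feature columns

mask0 = features.eq(0)  # boolean DataFrame: True where the cell equals 0
a0 = mask0.sum()        # summing booleans counts the Trues in each column
a1 = features.eq(1).sum()

# Only 0s and 1s present, so the two counts together cover every row
print((a0 + a1 == len(df)).all())  # True
```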

Similarly for 1:

a1 = df.drop(columns='App').eq(1).sum()
#a1 = df.set_index('App').eq(1).sum()

#alternative
#a1 = df.filter(like='Feature').eq(1).sum()
#alternative
a1 = df.iloc[:, 1:].eq(1).sum()

print (a1)
Feature1    2
Feature2    1
Feature3    6
Feature4    6
Feature5    4
Feature6    2
Feature7    7
Feature8    1
dtype: int64
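The question asked for a two-column table with Features and Count headers rather than a Series. Assuming that layout is what is wanted, a sketch that turns the per-column counts into it with rename_axis and reset_index:

```python
import pandas as pd

cols = ["App"] + [f"Feature{i}" for i in range(1, 9)]
rows = [["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
        ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
        ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
        ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
        ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
        ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
        ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
        ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]]
df = pd.DataFrame(rows, columns=cols)

result = (
    df.drop(columns="App")
    .eq(1)
    .sum()                        # Series: feature name -> number of 1s
    .rename_axis("Features")      # name the index so it becomes a column header
    .reset_index(name="Count")    # two-column DataFrame: Features, Count
)
print(result)
```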

Or count both values at once with apply and value_counts:

a = df.drop(columns='App').apply(pd.Series.value_counts).T.add_prefix('count_')
print (a)
          count_0  count_1
Feature1        6        2
Feature2        7        1
Feature3        2        6
Feature4        2        6
Feature5        4        4
Feature6        6        2
Feature7        1        7
Feature8        7        1
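One caveat with value_counts: if some column never contains a given value, that cell comes out as NaN and the whole table becomes float. A sketch on hypothetical toy data (FeatureX and FeatureY are made-up columns, not from the question) showing the gap and one way to fill it:

```python
import pandas as pd

# Hypothetical toy data: FeatureX is 1 in every row, so it contains no 0 at all
toy = pd.DataFrame({"App": ["A", "B", "C"],
                    "FeatureX": [1, 1, 1],
                    "FeatureY": [0, 1, 0]})

raw = toy.drop(columns="App").apply(pd.Series.value_counts)
print(raw)  # FeatureX has NaN in the 0 row, so the frame is float

fixed = (
    raw.reindex([0, 1])   # guarantee a row for both values, in a fixed order
    .fillna(0)            # a missing count means the value never occurs
    .astype(int)
    .T
    .add_prefix("count_")
)
print(fixed)
```

On the question's data both values occur in every column, so the output above is unaffected.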

Or with a list comprehension:

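#rename each Series to its column, because pandas 2.x names every value_counts result 'count'
L = [df[x].value_counts().rename(x) for x in df.columns.difference(['App'])]
a = pd.concat(L, axis=1).T.add_prefix('count_')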
print (a)
          count_0  count_1
Feature1        6        2
Feature2        7        1
Feature3        2        6
Feature4        2        6
Feature5        4        4
Feature6        6        2
Feature7        1        7
Feature8        7        1
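A related option not used in either answer is pd.crosstab on the long-format data from the melt approach: it cross-tabulates feature names against cell values and fills absent combinations with 0, so the same 0/1 table comes out of one call. A sketch on the same sample data:

```python
import pandas as pd

cols = ["App"] + [f"Feature{i}" for i in range(1, 9)]
rows = [["SHA", 0, 0, 1, 1, 1, 0, 1, 0],
        ["LHA", 1, 0, 1, 1, 0, 1, 1, 0],
        ["DRA", 0, 0, 0, 0, 0, 0, 1, 0],
        ["FRA", 1, 0, 1, 1, 1, 0, 1, 1],
        ["BRU", 0, 0, 1, 0, 1, 0, 0, 0],
        ["PAR", 0, 1, 1, 1, 1, 0, 1, 0],
        ["AER", 0, 0, 1, 1, 0, 1, 1, 0],
        ["SHE", 0, 0, 0, 1, 0, 0, 1, 0]]
df = pd.DataFrame(rows, columns=cols)

# Long format: one row per (App, feature) cell
long = df.melt(id_vars="App", var_name="Features", value_name="value")

# Rows = features, columns = distinct cell values (0, 1), cells = frequencies
a = pd.crosstab(long["Features"], long["value"]).add_prefix("count_")
print(a)
```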
