Using transform to count specific values in a DataFrame and aggregate the result

Input DataFrame:

   Errorid Matricule Priority
0        1        01       P1
1        2        01       P2
2        3        01       NC
3        4        02       P1
4        5        02       P4
5        6        02      EDC
6        7        02       P2

Expected output (number of P1, P2, P3 or P4 errors per Matricule, repeated on every row):

   Errorid Matricule Priority  NberrorsMatricule
0        1        01       P1                  2
1        2        01       P2                  2
2        3        01       NC                  2
3        4        02       P1                  3
4        5        02       P4                  3
5        6        02      EDC                  3
6        7        02       P2                  3

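What I tried (neither works):

# Attempt 1: SyntaxError, a conditional expression inside a lambda needs an else branch
DF['NberrorsMatricule'] = DF.groupby('Matricule')['Priority'].transform(lambda x : x.count() if x in ['P1','P2','P3','P4'])
# Attempt 2: ValueError, `in` tests the whole Series, whose truth value is ambiguous
DF['NberrorsMatricule'] = DF.groupby('Matricule')[DF['Priority'] in ['P1','P2','P3','P4']].transform("count")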
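For comparison, the lambda idea from the first attempt can be made to work. Inside transform, x is the Priority sub-Series of one Matricule group, so the membership test has to be done element-wise with isin() and the matches counted with sum(). This is a minimal sketch that rebuilds the sample data by hand (Matricule kept as the strings '01'/'02', which is an assumption about the original dtype):

```python
import pandas as pd

DF = pd.DataFrame({
    'Errorid': [1, 2, 3, 4, 5, 6, 7],
    'Matricule': ['01', '01', '01', '02', '02', '02', '02'],
    'Priority': ['P1', 'P2', 'NC', 'P1', 'P4', 'EDC', 'P2'],
})

# x is the 'Priority' values of one group; isin() gives a boolean mask and
# sum() counts the True values. The scalar result is broadcast back to
# every row of that group by transform.
DF['NberrorsMatricule'] = (DF.groupby('Matricule')['Priority']
                           .transform(lambda x: x.isin(['P1', 'P2', 'P3', 'P4']).sum()))
print(DF)
```

The lambda is called once per group in Python, so on large frames the built-in string aggregations used in the answers ('count', 'sum') are usually faster.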

2 answers

User

Answer 1 · edited 2024-05-19 12:52:35

Like this:

In [567]: df['NberrorsMatricule'] = (df[~df.Priority.isin(['NC', 'EDC'])]
     ...:                            .groupby('Matricule')['Errorid']
     ...:                            .transform('count'))

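The excluded NC and EDC rows end up as NaN, because the counts are aligned back to df on the index and those rows are missing from the filtered frame. To fill them, use ffill():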

In [595]: df['NberrorsMatricule'] = df['NberrorsMatricule'].ffill()

In [596]: df
Out[596]: 
   Errorid  Matricule Priority  NberrorsMatricule
0        1          1       P1                2.0
1        2          1       P2                2.0
2        3          1       NC                2.0
3        4          2       P1                3.0
4        5          2       P4                3.0
5        6          2      EDC                3.0
6        7          2       P2                3.0
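One caveat with the plain ffill() step: it only gives the right value when every excluded row comes after a counted row of the same Matricule. If the first row of a group is excluded, it inherits the previous group's count. A minimal sketch on hypothetical data showing the problem and one group-aware alternative (taking the per-group max of the non-NaN counts; this fill strategy is my own substitution, not part of the answer):

```python
import pandas as pd

# Hypothetical data: the first row of Matricule 2 is an excluded 'EDC' row
df = pd.DataFrame({
    'Errorid': [1, 2, 3, 4, 5, 6],
    'Matricule': [1, 1, 2, 2, 2, 2],
    'Priority': ['P1', 'P2', 'EDC', 'P1', 'P4', 'P3'],
})

# Same filtered count as in the answer; row 2 is excluded, so it becomes NaN
df['NberrorsMatricule'] = (df[~df.Priority.isin(['NC', 'EDC'])]
                           .groupby('Matricule')['Errorid']
                           .transform('count'))

# Plain ffill copies Matricule 1's count into row 2 (wrong group)
naive = df['NberrorsMatricule'].ffill()

# Group-aware fill: all non-NaN values in a group are the same count, so max()
# recovers it; a group with no P1-P4 rows at all stays NaN and becomes 0
df['NberrorsMatricule'] = (df.groupby('Matricule')['NberrorsMatricule']
                           .transform('max')
                           .fillna(0)
                           .astype(int))

print(naive.tolist())                    # [2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
print(df['NberrorsMatricule'].tolist())  # [2, 2, 3, 3, 3, 3]
```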

User

Answer 2 · edited 2024-05-19 12:52:35

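You can use Series.where with Series.isin to replace the non-matching values with missing values; GroupBy.transform with 'count' then excludes those missing values: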

L = ['P1','P2','P3','P4']
df['NberrorsMatricule'] = (df['Priority'].where(df['Priority'].isin(L))
                                         .groupby(df['Matricule'])
                                         .transform('count'))
print (df)
   Errorid  Matricule Priority  NberrorsMatricule
0        1          1       P1                  2
1        2          1       P2                  2
2        3          1       NC                  2
3        4          2       P1                  3
4        5          2       P4                  3
5        6          2      EDC                  3
6        7          2       P2                  3

Details:

print (df['Priority'].where(df['Priority'].isin(L)))
0     P1
1     P2
2    NaN
3     P1
4     P4
5    NaN
6     P2
Name: Priority, dtype: object
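The masked Series above is why 'count' (rather than 'size') is the right aggregation here: 'count' skips missing values, while 'size' counts every row in the group. A small sketch on the question's data (with Matricule as integers, as in the answer's output) that contrasts the two:

```python
import pandas as pd

L = ['P1', 'P2', 'P3', 'P4']
df = pd.DataFrame({
    'Matricule': [1, 1, 1, 2, 2, 2, 2],
    'Priority': ['P1', 'P2', 'NC', 'P1', 'P4', 'EDC', 'P2'],
})

# Non-matching priorities become NaN
masked = df['Priority'].where(df['Priority'].isin(L))
g = masked.groupby(df['Matricule'])

print(g.count().tolist())  # [2, 3]  non-missing values only
print(g.size().tolist())   # [3, 4]  all rows, NaN included
```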

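Another solution is to count the matching values with sum. To convert True and False to 1 and 0, use Series.astype (Series.view was also used for this, but it is deprecated since pandas 2.2):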

df['NberrorsMatricule'] = (df['Priority'].isin(L)
                                         .astype(int)  # True/False -> 1/0
                                         .groupby(df['Matricule'])
                                         .transform('sum'))
print (df)

   Errorid  Matricule Priority  NberrorsMatricule
0        1          1       P1                  2
1        2          1       P2                  2
2        3          1       NC                  2
3        4          2       P1                  3
4        5          2       P4                  3
5        6          2      EDC                  3
6        7          2       P2                  3
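Unlike the filter-then-ffill approach, both solutions in this answer naturally produce 0 for a Matricule that has no P1 to P4 rows at all, so no extra fill step is needed. A quick sketch on hypothetical data (Matricule 3 has only an excluded priority) checking that the two give the same result:

```python
import pandas as pd

L = ['P1', 'P2', 'P3', 'P4']
# Hypothetical data: Matricule 3 has no P1-P4 rows
df = pd.DataFrame({
    'Matricule': [1, 1, 2, 3],
    'Priority': ['P1', 'NC', 'P4', 'EDC'],
})

# Solution 1: mask non-matches as NaN, then count non-missing values per group
by_count = (df['Priority'].where(df['Priority'].isin(L))
            .groupby(df['Matricule'])
            .transform('count'))

# Solution 2: turn the boolean mask into 1/0 and sum per group
by_sum = (df['Priority'].isin(L)
          .astype(int)
          .groupby(df['Matricule'])
          .transform('sum'))

print(by_count.tolist())  # [1, 1, 1, 0]
print(by_sum.tolist())    # [1, 1, 1, 0]
```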
