Applying a groupby function that takes multiple arguments

import pandas as pd
import scipy.stats as stats
import numpy as np

funZScore = lambda x: (x - x.mean()) / x.std()
funPercentile = lambda x, y: stats.percentileofscore(x[~np.isnan(x)], y)

A = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'Value': [4, 7, None, 6, 2, 8, 1]})

# Compute the Z-score by group.
# group_keys=False keeps the original row index so the result can be assigned back.
A['Z'] = A.groupby('Group', group_keys=False)['Value'].apply(funZScore)
print(A)
  Group  Value         Z
0     A    4.0 -1.091089
1     A    7.0  0.872872
2     A    NaN       NaN
3     A    6.0  0.218218
4     B    2.0 -0.440225
5     B    8.0  1.144586
6     B    1.0 -0.704361

# Compute the percentile rank by group.
# How can I pass two arguments to groupby apply?
# I hope to get something like this:
  Group  Value         Z       P
0     A    4.0 -1.091089   33.33
1     A    7.0  0.872872  100
2     A    NaN       NaN     NaN
3     A    6.0  0.218218   66.67
4     B    2.0 -0.440225   66.67
5     B    8.0  1.144586  100
6     B    1.0 -0.704361   33.33

1 answer

User

#1 · Posted 2024-09-28 19:00:41

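I think you need to collect each group's values into a dictionary first, then look up the row's group when computing its percentile: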

d = A.groupby('Group')['Value'].apply(list).to_dict()
print (d)
{'A': [4.0, 7.0, nan, 6.0], 'B': [2.0, 8.0, 1.0]}


A['P'] = A.apply(lambda x: funPercentile(np.array(d[x['Group']]), x['Value']), axis=1)
print (A)
  Group  Value         Z           P
0     A    4.0 -1.091089   33.333333
1     A    7.0  0.872872  100.000000
2     A    NaN       NaN         NaN
3     A    6.0  0.218218   66.666667
4     B    2.0 -0.440225   66.666667
5     B    8.0  1.144586  100.000000
6     B    1.0 -0.704361   33.333333
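
A possible alternative, offered as a sketch rather than a drop-in replacement: for this particular data, the same percentages seem to be obtainable without scipy by using the groupby `rank(pct=True)` method, which divides each within-group rank by the number of non-NaN values in that group and leaves NaN rows as NaN. The second part shows one way to forward an extra argument to a grouped function through `transform`; the helper `scaled_rank` and its `scale` parameter are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'Value': [4, 7, None, 6, 2, 8, 1]})

# Percentile rank within each group; NaN values are skipped and stay NaN.
A['P'] = A.groupby('Group')['Value'].rank(pct=True) * 100


def scaled_rank(s, scale):
    """Hypothetical helper: rank one group's values as a fraction, then scale."""
    return s.rank(pct=True) * scale


# Extra keyword arguments given to transform are forwarded to the function.
A['P2'] = A.groupby('Group')['Value'].transform(scaled_rank, scale=100)

print(A)
```

Both columns should agree with the `P` column above. With tied values, the default `method='average'` is assumed here to be the closest match to the default behaviour of `stats.percentileofscore`, so it is worth checking against your own data before relying on it.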
