Applying a multi-argument function with groupby

Posted 2024-09-28 19:00:41

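I want to compute the percentile rank of each value within the group it belongs to. I wrote the code below and can already compute something like a z-score, because that function takes only one input. How do I handle a function that takes two arguments? Thanks.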

import pandas as pd
import scipy.stats as stats
import numpy as np

funZScore = lambda x: (x - x.mean()) / x.std()
funPercentile = lambda x, y: stats.percentileofscore(x[~np.isnan(x)], y)

A = pd.DataFrame({'Group' : ['A','A','A','A','B','B','B'], 
                  'Value' : [4, 7, None, 6, 2, 8, 1]})

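# Compute the Z-score by group; transform keeps the original row index,
# so the result can be assigned straight back as a column
A['Z'] = A.groupby('Group')['Value'].transform(funZScore)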

print(A)
  Group  Value         Z
0     A    4.0 -1.091089
1     A    7.0  0.872872
2     A    NaN       NaN
3     A    6.0  0.218218
4     B    2.0 -0.440225
5     B    8.0  1.144586
6     B    1.0 -0.704361

# Compute the percentile rank by group
# How can I use a two-argument function with groupby apply?
# I would like to get something like this:
  Group  Value         Z       P
0     A    4.0 -1.091089   33.33
1     A    7.0  0.872872  100.00
2     A    NaN       NaN     NaN
3     A    6.0  0.218218   66.67
4     B    2.0 -0.440225   66.67
5     B    8.0  1.144586  100.00
6     B    1.0 -0.704361   33.33

1 answer
User
#1 · Posted 2024-09-28 19:00:41

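I think you need to collect each group's values into a dictionary first, then look up the matching list row by row: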

d = A.groupby('Group')['Value'].apply(list).to_dict()
print(d)
{'A': [4.0, 7.0, nan, 6.0], 'B': [2.0, 8.0, 1.0]}


A['P'] = A.apply(lambda x: funPercentile(np.array(d[x['Group']]), x['Value']), axis=1)
print(A)
  Group  Value         Z           P
0     A    4.0 -1.091089   33.333333
1     A    7.0  0.872872  100.000000
2     A    NaN       NaN         NaN
3     A    6.0  0.218218   66.666667
4     B    2.0 -0.440225   66.666667
5     B    8.0  1.144586  100.000000
6     B    1.0 -0.704361   33.333333
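
As a possible alternative that avoids the row-wise apply, the sketch below uses only pandas. It assumes that the "average rank divided by group size" definition is what is wanted, which should agree with the default behaviour of stats.percentileofscore on this data. The helper pct_at_or_below and the score of 5 are hypothetical, added only to illustrate how extra arguments given to apply are forwarded to the function for every group.

```python
import pandas as pd

A = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'Value': [4, 7, None, 6, 2, 8, 1]})

# Percentile rank of each value within its own group.
# rank(pct=True) divides the average rank by the number of non-NaN
# values in the group and leaves NaN rows as NaN.
A['P'] = A.groupby('Group')['Value'].rank(pct=True) * 100


# Hypothetical two-argument helper (not from the original post):
# percentage of a group's non-NaN values that are <= score.
def pct_at_or_below(values, score):
    return (values.dropna() <= score).mean() * 100


# Arguments passed after the function are forwarded to it for each group,
# so a fixed second argument can be supplied this way.
below_5 = A.groupby('Group')['Value'].apply(pct_at_or_below, score=5)

print(A)
print(below_5)
```

Note that arguments forwarded through apply are the same for every group, so this covers a fixed second argument. When the second argument is each row's own value, as in the question, the per-group rank or the dictionary lookup above is the better fit.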
