在pandas中的多级数据帧上创建掩码

2024-06-17 18:35:17 发布

您现在位置:Python中文网/ 问答频道 /正文

我有以下数据框

          Style     Land Code      Assessment_per_living
  0       Cape       1010              94.933749
  1       Cape       1010              95.235000
  2       Cape       1010             109.222000
  3   Colonial       1010              64.175109
  4   Colonial       1010              77.270968
  5       Cape       1011              80.856481
  6       Cape       1011              58.060029
  7   Colonial       1011              45.572414
  8   Colonial       1011              67.856000
  9   Colonial       1011              79.564000
  10     Ranch       1011              52.769231
  11     Ranch       1011              57.810434
  12     Ranch       1011              52.472952
  13  Colonial       1012             107.820248
  14  Colonial       1012              84.827050
  15  Colonial       1012              63.626108
  16  Colonial       1012              51.931994
  17     Ranch       1012             109.641711
  18     Ranch       1012              88.243000
  19     Ranch       1012              74.783000

我可以做下面的groupby

n\u hood\u cost\u count=数据框groupby([“土地代码”,“风格])[“生活评估”].mean()

产生了以下数据帧

Land Code  Style   
 1010       Cape        99.796916
          Colonial      70.723038
  1011      Cape        69.458255
          Colonial      64.330805
           Ranch        54.350872
  1012    Colonial      77.051350
            Ranch       90.889237

我正在尝试添加一个新列,该列将坚持多级groupby,但在Assessment\u per\u living列中返回大于平均值的单个项目。例如,数据帧中的行2将是1(传递掩码),而行0和1则不是。我想返回大于平均值的所有groupby集(1010/Cape、1010/Colonial、1011/Cape、1011/Colonial、1010/Ranch等)的结果,并将其保存在新的数据帧中。有人能教我怎么做吗?你知道吗


Tags: 数据stylecountcode平均值groupbycostland
2条回答

可以使用merge,然后使用条件。你知道吗

n_hood_cost_count= df.groupby(['Land Code','Style'])['Assessment_per_living'].mean().reset_index()
df_new = df.merge(n_hood_cost_count,on=['Land Code','Style'],how='outer')
df_new = df[df_new['Assessment_per_living_x']>df_new['Assessment_per_living_y']]

输出

      Style  Land Code  Assessment_per_living
2       Cape       1010             109.222000
4   Colonial       1010              77.270968
5       Cape       1011              80.856481
8   Colonial       1011              67.856000
9   Colonial       1011              79.564000
11     Ranch       1011              57.810434
13  Colonial       1012             107.820248
14  Colonial       1012              84.827050
17     Ranch       1012             109.641711

可以将^{}用于长度与df相同的means,与^{}和上次使用^{}进行比较:

means = df.groupby(['Land Code','Style'])['Assessment_per_living'].transform('mean')
print (means)
0     99.796916
1     99.796916
2     99.796916
3     70.723039
4     70.723039
5     69.458255
6     69.458255
7     64.330805
8     64.330805
9     64.330805
10    54.350872
11    54.350872
12    54.350872
13    77.051350
14    77.051350
15    77.051350
16    77.051350
17    90.889237
18    90.889237
19    90.889237
Name: Assessment_per_living, dtype: float64

mask = df['Assessment_per_living'].gt(means)
df = df[mask]
print (df)
       Style  Land Code  Assessment_per_living
2       Cape       1010             109.222000
4   Colonial       1010              77.270968
5       Cape       1011              80.856481
8   Colonial       1011              67.856000
9   Colonial       1011              79.564000
11     Ranch       1011              57.810434
13  Colonial       1012             107.820248
14  Colonial       1012              84.827050
17     Ranch       1012             109.641711

如果只需要前2列,则为唯一对添加^{}^{}

mask = df['Assessment_per_living'].gt(means)
df = df.loc[mask, ['Style','Land Code']].drop_duplicates()
print (df)
       Style  Land Code
2       Cape       1010
4   Colonial       1010
5       Cape       1011
8   Colonial       1011
11     Ranch       1011
13  Colonial       1012
17     Ranch       1012 

相关问题 更多 >