将列中的0替换为列中的groupby中位数

2024-05-19 08:58:11 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个数据框,如下所示。这是2016年12月至2018年11月两种保健产品的销售数据

product     profit      bougt_date      discount    salary
   A         50         2016-12-01      5           25
   A         50         2017-01-03      4           20
   B         200        2016-12-24      10          100
   A         50         2017-01-18      3           0
   B         200        2017-01-28      15          80
   A         50         2017-01-18      6           15
   B         200        2017-01-28      20          0
   A         50         2017-04-18      6           0
   B         200        2017-12-08      25          0
   A         50         2017-11-18      6           20
   B         200        2017-08-21      20          90
   B         200        2017-12-28      30          110
   A         50         2018-03-18      10          0
   B         300        2018-06-08      45          100
   B         300        2018-09-20      50          60
   A         50         2018-11-18      8           45
   B         300        2018-11-28      35          0

从上面我想用列乘积的groupby中位数替换0 salary

说明:

A :  15, 20, 20, 25, 45
 So the median = 20.



B :  60, 80, 90, 100, 100, 110
 So the median = 95.

预期产量

product     profit      bougt_date      discount    salary
   A         50         2016-12-01      5           25
   A         50         2017-01-03      4           20
   B         200        2016-12-24      10          100
   A         50         2017-01-18      3           20
   B         200        2017-01-28      15          80
   A         50         2017-01-18      6           15
   B         200        2017-01-28      20          95
   A         50         2017-04-18      6           20
   B         200        2017-12-08      25          95
   A         50         2017-11-18      6           20
   B         200        2017-08-21      20          90
   B         200        2017-12-28      30          110
   A         50         2018-03-18      10          20
   B         300        2018-06-08      45          100
   B         300        2018-09-20      50          60
   A         50         2018-11-18      8           45
   B         300        2018-11-28      35          95

Tags: the数据datesodiscountproductmedian乘积
2条回答

首先使用.groupby.transform列显示按中值分组的数据。最后,用.loc找到0的薪资,并将其设置为薪资中值

#NOTE - the below line of code uses `median` instead of `np.nanmedian`. These will return different results...
#To anyone reading this, please know which one to use according to your situation...
#As you can see the outputs are different between Chester's answer and mine.
df.loc[df['salary'] == 0, 'salary'] = df.groupby('product')['salary'].transform('median')
df

输出:

    product profit bougt_date discount salary
0   A   50  2016-12-01  5   25.0
1   A   50  2017-01-03  4   20.0
2   B   200 2016-12-24  10  100.0
3   A   50  2017-01-18  3   17.5
4   B   200 2017-01-28  15  80.0
5   A   50  2017-01-18  6   15.0
6   B   200 2017-01-28  20  80.0
7   A   50  2017-04-18  6   17.5
8   B   200 2017-12-08  25  80.0
9   A   50  2017-11-18  6   20.0
10  B   200 2017-08-21  20  90.0
11  B   200 2017-12-28  30  110.0
12  A   50  2018-03-18  10  17.5
13  B   300 2018-06-08  45  100.0
14  B   300 2018-09-20  50  60.0
15  A   50  2018-11-18  8   45.0
16  B   300 2018-11-28  35  80.0

您可以在这里使用^{}^{}来屏蔽0

fill_vals = df.salary.mask(df.salary.eq(0)).groupby(df['product']).transform(np.nanmedian)
df.assign(salary = df.salary.mask(df.salary.eq(0), fill_vals))

   product  profit  bougt_date  discount  salary
0        A      50  2016-12-01         5      25
1        A      50  2017-01-03         4      20
2        B     200  2016-12-24        10     100
3        A      50  2017-01-18         3      20
4        B     200  2017-01-28        15      80
5        A      50  2017-01-18         6      15
6        B     200  2017-01-28        20      95
7        A      50  2017-04-18         6      20
8        B     200  2017-12-08        25      95
9        A      50  2017-11-18         6      20
10       B     200  2017-08-21        20      90
11       B     200  2017-12-28        30     110
12       A      50  2018-03-18        10      20
13       B     300  2018-06-08        45     100
14       B     300  2018-09-20        50      60
15       A      50  2018-11-18         8      45
16       B     300  2018-11-28        35      95

使用^{}

df['salary'] = (np.where(df['salary']==0,df['salary'].replace(0,np.nan).
                groupby(df['product']).transform('median'),df['salary']))

相关问题 更多 >

    热门问题