How do I fill only some missing values, based on certain conditions?

Posted 2024-09-24 22:26:09


My dataset has the following missing values:

 print(train.shape)
 (54808, 6)

employee_id                0
name                       0
education               2409
age                        0
Salary_hike             4124
length_of_service          0

I want to fill the missing Salary_hike values with 0, but only in rows where length_of_service is 1 or less.

For example:

import numpy as np
import pandas as pd

train = pd.DataFrame({'employee_id': [103, 101, 103, 104, 105, 106, 107, 108, 109, 110],
                      'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                      'Age': [20, 30, 21, 24, 25, 22, 27, 23, 24, 21],
                      'length_of_service': [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
                      'Salary_hike': [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan]})

First, I checked how many rows have a length_of_service of 1 or less:

(train['length_of_service']<= 1).sum()
5
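The question says "less than 1", but the check above uses <= 1, and on the example data the two comparisons select different rows. A minimal sketch, assuming only the length_of_service values copied from the example DataFrame, makes the difference visible:

```python
import pandas as pd

# length_of_service values copied from the example DataFrame
service = pd.Series([1, 2, 1, 4, 5, 1, 7, 1, 2, 1], name="length_of_service")

# A comparison returns a boolean Series; sum() counts the True values
strictly_less = int((service < 1).sum())
at_most_one = int((service <= 1).sum())

print(strictly_less, at_most_one)  # 0 5
```

With a strict < 1 no row would ever be selected here, so the <= 1 form is the one that matches the stated intent.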

Next, I filtered the DataFrame on both conditions:

train[(train.length_of_service <=1) & (train['Salary_hike'].isnull())]

   employee_id Name  Age  length_of_service  Salary_hike
0          103    A   20                  1          NaN
2          103    C   21                  1          NaN
9          110    J   21                  1          NaN
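Storing the combined condition in a variable makes it easy to check how many rows it selects before using it for assignment. A short sketch on the example data; the name mask is just an illustrative variable:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "employee_id": [103, 101, 103, 104, 105, 106, 107, 108, 109, 110],
    "Name": list("ABCDEFGHIJ"),
    "Age": [20, 30, 21, 24, 25, 22, 27, 23, 24, 21],
    "length_of_service": [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
    "Salary_hike": [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan],
})

# The comparison needs its own parentheses because & binds tighter than <=
mask = (train["length_of_service"] <= 1) & train["Salary_hike"].isna()

print(mask.sum())                  # 3
print(train.index[mask].tolist())  # [0, 2, 9]
```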

Now, how do I fill the missing Salary_hike values in these filtered rows with 0, so that the result looks like this:

   employee_id Name  Age  length_of_service  Salary_hike
0          103    A   20                  1            0
2          103    C   21                  1            0
9          110    J   21                  1            0
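A tempting first attempt is to assign into the filtered frame, e.g. train[mask]["Salary_hike"] = 0. Boolean indexing returns a copy, so that assignment never reaches train; a single .loc[row_mask, column] call writes into train itself. A sketch under the example data (only two columns kept); warnings are silenced because the exact warning for chained assignment differs between pandas versions:

```python
import warnings

import numpy as np
import pandas as pd

train = pd.DataFrame({
    "length_of_service": [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
    "Salary_hike": [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan],
})
mask = (train["length_of_service"] <= 1) & train["Salary_hike"].isna()

# Chained indexing: train[mask] is a new DataFrame (a copy),
# so setting a column on it leaves train unchanged
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    train[mask]["Salary_hike"] = 0
missing_after_chained = int(train["Salary_hike"].isna().sum())

# One .loc call with (row selector, column label) modifies train directly
train.loc[mask, "Salary_hike"] = 0
missing_after_loc = int(train["Salary_hike"].isna().sum())

print(missing_after_chained, missing_after_loc)  # 3 0
```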

I used the command suggested in the comments:

train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0

But I still get missing values (3 of them) when I check with:

train.isnull().sum()
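One likely reason the suggested command changes nothing here: in the example data no row has length_of_service equal to -1, so the mask is all False and .loc silently assigns to zero rows. A small diagnostic sketch, assuming the example data (two columns kept), checks the mask before trusting the assignment:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "length_of_service": [1, 2, 1, 4, 5, 1, 7, 1, 2, 1],
    "Salary_hike": [np.nan, 5, np.nan, 6, 7, 1, 9, 1, 4, np.nan],
})

# The condition from the comment: no row has length_of_service == -1
mask = (train["length_of_service"] == -1) & train["Salary_hike"].isna()
rows_matched = int(mask.sum())

# Assigning through an all-False mask is not an error; it just changes nothing
train.loc[mask, "Salary_hike"] = 0
still_missing = int(train["Salary_hike"].isna().sum())

print(rows_matched, still_missing)  # 0 3
```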

Hi all,

Thanks for your valuable input.

It now works with the following command:

train.loc[(train.length_of_service <=1) & (train['Salary_hike'].isnull()),['Salary_hike']]=0

1 answer
User
Reply #1 · posted 2024-09-24 22:26:09

I believe you need:

import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service': [-1, 5, 4, -8, 9, -3, 0],
                      'Salary_hike': [10, np.nan, 5, np.nan, np.nan, 8, np.nan]})
train.loc[(train.length_of_service <= 1) & (train['Salary_hike'].isnull()), 'Salary_hike'] = 0

print (train)
   length_of_service  Salary_hike
0                 -1         10.0
1                  5          NaN
2                  4          5.0
3                 -8          0.0
4                  9          NaN
5                 -3          8.0
6                  0          0.0
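The same result can also be produced with Series.mask, which replaces values where a condition is True and keeps the rest. This is a swapped-in alternative to the .loc assignment above, not the answerer's method; a sketch on the answer's data:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "length_of_service": [-1, 5, 4, -8, 9, -3, 0],
    "Salary_hike": [10, np.nan, 5, np.nan, np.nan, 8, np.nan],
})

cond = (train["length_of_service"] <= 1) & train["Salary_hike"].isna()

# mask() returns a new Series with 0 wherever cond is True;
# it is assigned back to the column explicitly
train["Salary_hike"] = train["Salary_hike"].mask(cond, other=0)

print(train["Salary_hike"].tolist())  # [10.0, nan, 5.0, 0.0, nan, 8.0, 0.0]
```

Rows 1 and 4 keep their NaN because their length_of_service is above 1.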

If the value to match is exactly -1, use an equality check instead:

train = pd.DataFrame({'length_of_service':[-1,5,4,-1,9,-3,-1], 
                      'Salary_hike':[10,np.nan, 5, np.nan, np.nan, 8, np.nan]})
train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0

print (train)
   length_of_service  Salary_hike
0                 -1         10.0
1                  5          NaN
2                  4          5.0
3                 -1          0.0
4                  9          NaN
5                 -3          8.0
6                 -1          0.0
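Another option is to select rows by the service condition alone and let fillna handle the NaN part, so the isnull() condition is not needed. This is an alternative sketch on the answer's second dataset, not the answerer's code:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "length_of_service": [-1, 5, 4, -1, 9, -3, -1],
    "Salary_hike": [10, np.nan, 5, np.nan, np.nan, 8, np.nan],
})

# Rows selected by the service condition only
rows = train["length_of_service"] == -1

# fillna(0) changes only the NaNs in those rows; existing values such as 10 stay
train.loc[rows, "Salary_hike"] = train.loc[rows, "Salary_hike"].fillna(0)

print(train["Salary_hike"].tolist())  # [10.0, nan, 5.0, 0.0, nan, 8.0, 0.0]
```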
