Why doesn't "df.isin" work on my data?

Posted 2024-10-04 15:26:13


I am building a DataFrame and trying to count the number of "?" values in it. Part of the CSV:

age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,class
25, Private,226802, 11th,7, Never-married, Machine-op-inspct, Own-child, Black, Male,0,0,40, United-States, <=50K
38, Private,89814, HS-grad,9, Married-civ-spouse, Farming-fishing, Husband, White, Male,0,0,50, United-States, <=50K
28, Local-gov,336951, Assoc-acdm,12, Married-civ-spouse, Protective-serv, Husband, White, Male,0,0,40, United-States, >50K
44, Private,160323, Some-college,10, Married-civ-spouse, Machine-op-inspct, Husband, Black, Male,7688,0,40, United-States, >50K
18, ?,103497, Some-college,10, Never-married, ?, Own-child, White, Female,0,0,30, United-States, <=50K
34, Private,198693, 10th,6, Never-married, Other-service, Not-in-family, White, Male,0,0,30, United-States, <=50K
29, ?,227026, HS-grad,9, Never-married, ?, Unmarried, Black, Male,0,0,40, United-States, <=50K

I am using

df.isin(['?']).sum(axis=0)

However, it returns 0 for every column, even though the data contains "?".

How can I fix this?

Thanks


2 answers

There is an extra space before each value, so the cell contains ' ?' rather than '?':

df.isin([' ?']).sum(axis=0)

Alternatively, you can strip the whitespace from the values first ;-)

df['workclass'].str.strip().isin(['?'])
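If the file is being loaded with pd.read_csv, another option is to drop the space at parse time. The following is a minimal sketch, assuming a small inline sample (csv_text, raw and clean are illustrative names, and only three of the question's columns are used); skipinitialspace is an existing read_csv parameter that skips whitespace after the delimiter.

```python
import io
import pandas as pd

# A few columns and rows modeled on the question's CSV (illustrative sample);
# note the space after each comma.
csv_text = """age,workclass,occupation
25, Private, Machine-op-inspct
18, ?, ?
29, ?, ?
"""

# Default parsing keeps the leading space, so each cell is ' ?' and not '?'.
raw = pd.read_csv(io.StringIO(csv_text))
raw_counts = raw.isin(["?"]).sum(axis=0)

# skipinitialspace=True drops whitespace that follows the delimiter.
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
clean_counts = clean.isin(["?"]).sum(axis=0)

print(raw_counts)    # counts with the default parsing
print(clean_counts)  # counts after skipping the initial space
```

With the space removed while reading, the original df.isin(['?']) call can be used unchanged.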

The problem is that each ? in this table is preceded by an extra space. Try:

df.isin([' ?']).sum(axis=0)

In general, I would recommend cleaning up the relevant columns beforehand.

You can see the extra space by checking df.iloc[6]['occupation'].
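A rough sketch of both suggestions, assuming a small inline sample rather than the full file (csv_text and the row index 2 are illustrative; in the question's data the matching row is index 6): repr() makes the leading space visible, and a loop strips every string column before counting.

```python
import io
import pandas as pd

# Illustrative sample with the same comma-plus-space layout as the question's CSV.
csv_text = """age,workclass,occupation
25, Private, Machine-op-inspct
18, ?, ?
29, ?, ?
"""
df = pd.read_csv(io.StringIO(csv_text))

# repr() shows the quotes, so the leading space in the cell becomes visible.
cell = df.iloc[2]["occupation"]
print(repr(cell))

# Strip whitespace in every string column; numeric columns are left untouched.
for col in df.columns:
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].str.strip()

# After stripping, matching on '?' without the space works.
counts = df.isin(["?"]).sum(axis=0)
print(counts)
```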
