Why do I get an AttributeError when using pandas apply?

Posted 2024-05-19 13:32:47

How can I convert NaN values to a categorical value based on a condition? I get an error when I try to convert the NaN values.

category           gender     sub-category    title
health&beauty      NaN        makeup          lipbalm
health&beauty      women      makeup          lipstick
NaN                NaN        NaN             lipgloss

My DataFrame looks like this. My function is meant to convert the NaN values in gender to a categorical value:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

When I run the code, I get an error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Full dataset: https://github.com/lakshmipriya04/py-sample


3 answers

Since we are dealing with NaN values, fillna is one possible approach :-)

# map() leaves non-matching titles as NaN, so only "lip" rows are filled with 'women'
df['gender'] = df['gender'].fillna(df['title'].str.contains('lip').map({True: 'women'}))
df
Out[63]: 
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

Or simply use loc, as an Option 3 to go with @COLDSPEED's answer:

cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
df.loc[cond, 'gender'] = 'women'


        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
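One caveat with the loc approach: if title itself can be missing, str.contains returns NaN for those rows, and the combined condition is no longer a clean boolean mask. A minimal sketch, assuming a hypothetical fourth row with a missing title (not part of the original data), passes na=False so that such rows count as non-matching:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss", np.nan],  # last row is hypothetical
})

# na=False treats rows with a missing title as "no match"
cond = df["gender"].isnull() & df["title"].str.contains("lip", na=False)
df.loc[cond, "gender"] = "women"
print(df)
```

The row with the missing title keeps its NaN gender, because there is nothing to base the imputation on.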

A few things to note here:

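  1. If you only use two columns, calling apply on four columns is wasteful.
  2. In general, calling apply is wasteful: it is slow and gives you none of the benefits of vectorization.
  3. Inside apply you are dealing with scalars, so you cannot use the .str accessor as you would on a pd.Series. A plain Python string has no contains method either; the idiomatic check is "lip" in title.
  4. gender.isnull is simply wrong: gender is a scalar and has no isnull attribute. Use pd.isnull(gender) instead.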
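If you do want to keep a row-wise apply, the points above can be sketched as follows. This is an illustrative rewrite of the question's function, not the asker's final code; the sample frame is rebuilt from the three rows shown in the question, and the lowercase comparison is an assumption about the intended matching.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the three rows in the question
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    """Return 'women' for lip products with a missing gender, else the existing value."""
    gender = row["gender"]
    title = row["title"]
    # Scalars: use pd.isnull() and the `in` operator, not .isnull() or .str
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # always return something, otherwise the row becomes None

# Only pass the two columns the function actually needs
df["gender"] = df[["gender", "title"]].apply(impute_gender, axis=1)
print(df)
```

Note that the function returns the existing gender when the condition does not hold; the original version returned None in that case, which would have overwritten valid values.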

Option 1
np.where

import numpy as np

m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

This is not only fast but also simple. If you are worried about case sensitivity, you can make the contains check case-insensitive:

import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
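As a small aside, str.contains also accepts case=False, which avoids importing re. The sketch below, using hypothetical mixed-case titles that are not in the original data, checks that both spellings produce the same mask:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["Lipbalm", "lipstick", "LIPGLOSS"],  # mixed case is hypothetical
})

# Two equivalent ways to ignore case in the substring check
m_flags = df["gender"].isnull() & df["title"].str.contains("lip", flags=re.IGNORECASE)
m_case = df["gender"].isnull() & df["title"].str.contains("lip", case=False)

df["gender"] = np.where(m_case, "women", df["gender"])
print(df)
```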

Option 2
Another approach is to use pd.Series.mask / pd.Series.where:

df['gender'] = df.gender.mask(m, 'women')

Or

df['gender'] = df.gender.where(~m, 'women')

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

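mask replaces the values in the column with the new value wherever the supplied mask is True; where does the opposite and keeps values where its condition is True, which is why it takes the inverted mask ~m.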
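To make the relationship between the two methods concrete, here is a minimal sketch on a standalone Series. The fourth value, 'men', is a hypothetical entry added to show that unmasked values are left alone.

```python
import numpy as np
import pandas as pd

gender = pd.Series([np.nan, "women", np.nan, "men"])  # last value is hypothetical
m = pd.Series([True, False, True, False])

masked = gender.mask(m, "women")   # replace where m is True
kept = gender.where(~m, "women")   # keep where ~m is True, replace elsewhere

print(masked.tolist())
```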
