Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Unable to update DataFrame column

Posted 2024-09-24 20:34:34


I have a DataFrame df with the following values:

           activityType activity_preferance
userID      
agashi1996  joinClub    Nan
agashi1998  post        Nan
agashi1998  post        Nan
agashi1998  post        Nan
agashi1994  followuser  Nan

The userID column is the index here.

I want to fill activity_preferance with 1, 2, or 3 when activityType is joinClub, post, or followuser respectively.

I wrote this code:

for i,row in df_activity_filter.iterrows():
    if (df_activity_filter.loc[i,'activityType'] == 'joinClub'):
        df_activity_filter.loc[i,'activity_preferance'] = 1
    elif (df_activity_filter.loc[i,'activityType'] == 'post'):
        df_activity_filter.loc[i,'activity_preferance'] = 2
    elif (df_activity_filter.loc[i,'activityType'] == 'followuser'):
        df_activity_filter.loc[i,'activity_preferance'] = 3

I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The desired DataFrame/output should look like this:

           activityType activity_preferance
userID      
agashi1996  joinClub    1
agashi1998  post        2
agashi1998  post        2
agashi1998  post        2
agashi1994  followuser  3

Any help would be appreciated.


3 Answers

Loops are very slow. You should use numpy.select here:

In [1577]: import numpy as np

In [1578]: conditions = [df.activityType == 'joinClub', df.activityType == 'post', df.activityType == 'followuser']

In [1579]: choices = [1, 2, 3]

In [1580]: df['activity_preferance'] = np.select(conditions, choices)

In [1581]: df
Out[1581]: 
           activityType  activity_preferance
userID                                      
agashi1996     joinClub                    1
agashi1998         post                    2
agashi1998         post                    2
agashi1998         post                    2
agashi1994   followuser                    3
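The session above can also be written as a standalone script. The following is a minimal sketch, not the answer author's exact code: the extra row ("agashi1999" with "unknownType") and the sentinel value -1 are assumptions added only to show what happens to values that match none of the conditions. Without the default argument, np.select fills such rows with 0.

```python
import numpy as np
import pandas as pd

# Frame from the question, plus one hypothetical row with an unmapped activity type.
df = pd.DataFrame(
    {"activityType": ["joinClub", "post", "post", "post", "followuser", "unknownType"]},
    index=pd.Index(
        ["agashi1996", "agashi1998", "agashi1998", "agashi1998", "agashi1994", "agashi1999"],
        name="userID",
    ),
)

# One boolean mask per activity type, in the same order as the choices.
conditions = [
    df["activityType"] == "joinClub",
    df["activityType"] == "post",
    df["activityType"] == "followuser",
]
choices = [1, 2, 3]

# default=-1 is an assumed sentinel for rows that match no condition.
df["activity_preferance"] = np.select(conditions, choices, default=-1)
print(df)
```

Because the masks are computed column-wise, the duplicated index label agashi1998 causes no problem here.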

Performance comparison with the other solutions:

My solution:

In [1582]: %timeit np.select(conditions, choices)
45.5 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@Djib2011's solution:

In [1584]: %timeit df['activityType'].map(mapping)
401 µs ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@JenilDave's solution:

In [1590]: %timeit df.activityType.replace({'joinClub':1,'post':2,'followuser':3})
490 µs ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@yashjain's solution:

In [1585]: %timeit df['activityType'].apply(lambda x: 1 if x=='joinClub' else None)
114 µs ± 1.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
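The timings above were taken on the five-row frame, with the np.select conditions built before the timed call. To check how the approaches compare on your own data, a rough sketch like the one below may help. The frame size (10,000 rows), the random seed, and the helper names with_select, with_map, and with_apply are all assumptions made for illustration; here the conditions are built inside the timed function, so the numbers will not match the ones quoted above.

```python
import timeit

import numpy as np
import pandas as pd

mapping = {"joinClub": 1, "post": 2, "followuser": 3}

# Hypothetical larger frame with randomly chosen activity types.
rng = np.random.default_rng(0)
df = pd.DataFrame({"activityType": rng.choice(list(mapping), size=10_000)})


def with_select():
    # Building the masks is part of the work, so it is timed as well.
    conditions = [df["activityType"] == key for key in mapping]
    return np.select(conditions, list(mapping.values()))


def with_map():
    return df["activityType"].map(mapping).to_numpy()


def with_apply():
    # mapping.get is called once per element.
    return df["activityType"].apply(mapping.get).to_numpy()


for func in (with_select, with_map, with_apply):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

All three helpers return the same values, so the choice between them comes down to speed and readability on the data at hand.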

tl;dr: I think what you actually want is to map the values of "activityType" to integers. This is best done with the map method of pd.Series:

mapping = {'joinClub': 1, 'post': 2, 'followuser': 3}
df['activity_preferance'] = df['activityType'].map(mapping)
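One detail worth knowing about map with a dict: values that are missing from the dict become NaN, which also turns the result into a float column. The sketch below illustrates this; the value "unknownType" and the fallback 0 are assumptions chosen only for the demonstration.

```python
import pandas as pd

mapping = {"joinClub": 1, "post": 2, "followuser": 3}

# "unknownType" is a hypothetical value that is absent from the mapping.
activity = pd.Series(["joinClub", "post", "followuser", "unknownType"], name="activityType")

mapped = activity.map(mapping)          # unmapped values become NaN, so the dtype is float
cleaned = mapped.fillna(0).astype(int)  # 0 is an assumed fallback for unmapped values

print(mapped)
print(cleaned)
```

If every value in the column is guaranteed to be a key of the mapping, the fillna step is unnecessary.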

What the error means is this. Suppose you have a Series (ser) like the following:

>>> print(ser)

0  True
1  False
2  True
3  True
4  False
...

Its truth value is ambiguous (is it True or False?). What should Python do if I write:

if ser:
   # do something

There is no clear answer, because the boolean value of ser is ambiguous, so an error is raised.
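In the question's loop, the likely trigger is the duplicated index label: agashi1998 appears three times, so .loc with that label returns a Series rather than a single string, and the if statement then has to evaluate a boolean Series. The following sketch reproduces this on the frame from the question; the variable names unique_hit, duplicate_hit, error_message, and all_posts are illustrative only.

```python
import pandas as pd

# Rebuild the frame from the question; 'agashi1998' appears three times in the index.
df = pd.DataFrame(
    {"activityType": ["joinClub", "post", "post", "post", "followuser"]},
    index=pd.Index(
        ["agashi1996", "agashi1998", "agashi1998", "agashi1998", "agashi1994"],
        name="userID",
    ),
)

unique_hit = df.loc["agashi1996", "activityType"]     # label occurs once: a plain string
duplicate_hit = df.loc["agashi1998", "activityType"]  # label occurs three times: a Series

try:
    if duplicate_hit == "post":  # the comparison yields a boolean Series, not one bool
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# If a single bool is really what you mean, reduce the Series explicitly.
all_posts = (duplicate_hit == "post").all()
```

This is why the vectorized approaches in the answers sidestep the error: they never ask for the truth value of a whole Series.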

Mayank has given a great answer; you can also explore the pandas apply method:

# Chain the conditions in one lambda; separate assignments to the same column would overwrite each other.
df_activity_filter['activity_preferance'] = df_activity_filter['activityType'].apply(
    lambda x: 1 if x == 'joinClub' else 2 if x == 'post' else 3 if x == 'followuser' else None
)
