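Can't get the result in pandas: if two of the three columns have the same value, keep the first column's value, otherwise keep another value based on a condition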

Posted 2024-05-17 08:46:20


I have a DataFrame with values like the following:

AF_SC       TB_SC       VS_SC   
negative    negative    negative
positive    positive    positive
neutral     negative    negative
negative    negative    positive
positive    positive    neutral
negative    negative    positive
neutral     positive    neutral
negative    positive    positive
negative    positive    neutral

What I want is a 'result' column whose value is based on the following conditions:

1. if the values in columns AF_SC and TB_SC are the same, then the 'result' column will have the value of AF_SC (or TB_SC, as both are the same)

2. if the values in columns TB_SC and VS_SC are the same, then the 'result' column will have the value of TB_SC (or VS_SC, as both are the same)

3. if the values in columns AF_SC and VS_SC are the same, then the 'result' column will have the value of AF_SC (or VS_SC, as both are the same)

4. otherwise the 'result' column will have the value 'neutral'

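In other words, if two of the three columns have the same value, say 'negative', the 'result' column should be 'negative'; likewise, if two of the three columns are 'positive', the 'result' column should be 'positive'. If one column is 'positive', another is 'negative' and the third is 'neutral' (i.e. all three columns have different values), the 'result' column should be 'neutral'.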
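The four conditions amount to a majority vote over three values. A minimal sketch of that rule for a single row in plain Python, assuming string labels; the function name majority_label and its default parameter are hypothetical, introduced only for illustration.

```python
from collections import Counter


def majority_label(af: str, tb: str, vs: str, default: str = "neutral") -> str:
    """Return the value shared by at least two of the three inputs, else `default`."""
    # most_common(1) gives the most frequent value and how often it occurs
    value, count = Counter([af, tb, vs]).most_common(1)[0]
    # With three inputs, count >= 2 means at least two columns agree
    return value if count >= 2 else default


print(majority_label("neutral", "negative", "negative"))  # negative
print(majority_label("neutral", "positive", "neutral"))   # neutral
print(majority_label("negative", "positive", "neutral"))  # neutral (all three differ)
```

Applied to a DataFrame through df.apply(..., axis=1) this is easy to read but runs row by row in Python, so the vectorized answers are likely faster on large frames.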

The resulting DataFrame should look like this:

AF_SC       TB_SC       VS_SC       Result
negative    negative    negative    negative
positive    positive    positive    positive
neutral     negative    negative    negative
negative    negative    positive    negative
positive    positive    neutral     positive
negative    negative    positive    negative
neutral     positive    neutral     neutral
negative    positive    positive    positive
negative    positive    neutral     neutral

I tried to achieve this with np.where:

df['result'] = np.where((df['AF_SC'] == df['TB_SC']) or (df['AF_SC'] == df['VS_SC']), df['AF_SC'], 
                         np.where((df['TB_SC'] == df['VS_SC']), df['TB_SC'], "neutral"))

Unfortunately, it gives me an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I don't know what mistake I made.

Apart from fixing this, is there another way to get the result I want?
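The ValueError comes from Python's or: it needs a single True/False for its left operand, so it calls bool() on the whole boolean Series, and pandas refuses because a Series of several booleans has no single truth value. A small sketch with made-up series a and b (not the question's data) showing the failure and the element-wise | that works:

```python
import pandas as pd

a = pd.Series(["negative", "positive", "neutral"])
b = pd.Series(["negative", "negative", "positive"])

eq = a == b  # element-wise comparison: a boolean Series, not a single bool

error = None
try:
    # `or` calls bool() on the left operand, which a multi-element Series rejects
    eq or (a == "positive")
except ValueError as exc:
    error = exc
    print("ValueError:", exc)

# `|` combines the two boolean Series element by element instead
combined = eq | (a == "positive")
print(combined.tolist())  # [True, True, False]
```

The same applies to and (use &) and not (use ~). Keep the parentheses around each comparison: | binds more tightly than ==, so dropping them changes how the expression is parsed.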


2 Answers

Use pandas' native where():

df['result'] = 'neutral'
df['result'] = df['result'].where(
               df['AF_SC'] != df['VS_SC'], df['VS_SC']).where(
               df['TB_SC'] != df['VS_SC'], df['VS_SC']).where(
               df['TB_SC'] != df['AF_SC'], df['AF_SC'])
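Series.where(cond, other) keeps a value where cond is True and substitutes other where it is False, which is why the answer above tests with !=. The same chain can be written with Series.mask, which substitutes where cond is True, so the conditions read as plain equalities. A sketch using the sample data from the question (this mask variant is an illustration, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "AF_SC": ["negative", "positive", "neutral", "negative", "positive",
              "negative", "neutral", "negative", "negative"],
    "TB_SC": ["negative", "positive", "negative", "negative", "positive",
              "negative", "positive", "positive", "positive"],
    "VS_SC": ["negative", "positive", "negative", "positive", "neutral",
              "positive", "neutral", "positive", "neutral"],
})

# Start from the fallback, then overwrite it wherever a pair of columns agrees.
# Series.mask(cond, other) replaces values where cond is True (the inverse of where).
result = pd.Series("neutral", index=df.index)
result = (result.mask(df["AF_SC"] == df["VS_SC"], df["VS_SC"])
                .mask(df["TB_SC"] == df["VS_SC"], df["VS_SC"])
                .mask(df["TB_SC"] == df["AF_SC"], df["AF_SC"]))
df["result"] = result
print(df)
```

If any two of the pairwise equalities hold, all three values are equal, so the substitutions never conflict and their order does not matter.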

Here it is possible to use numpy.select, with | to chain the conditions as a bitwise OR:

import numpy as np

m1 = df['AF_SC'] == df['TB_SC']
m2 = df['AF_SC'] == df['VS_SC']
m3 = df['TB_SC'] == df['VS_SC']
df['result'] = np.select([m1 | m2, m3], [df['AF_SC'], df['TB_SC']], "neutral")
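A self-contained version of the np.select approach, rebuilt from the sample data in the question, so it can be run and checked directly. np.select evaluates the conditions in order and, for each row, takes the choice belonging to the first condition that is True; rows that match none get the default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AF_SC": ["negative", "positive", "neutral", "negative", "positive",
              "negative", "neutral", "negative", "negative"],
    "TB_SC": ["negative", "positive", "negative", "negative", "positive",
              "negative", "positive", "positive", "positive"],
    "VS_SC": ["negative", "positive", "negative", "positive", "neutral",
              "positive", "neutral", "positive", "neutral"],
})

m1 = df["AF_SC"] == df["TB_SC"]
m2 = df["AF_SC"] == df["VS_SC"]
m3 = df["TB_SC"] == df["VS_SC"]

# First True condition wins per row: AF_SC if it matches either other column,
# otherwise TB_SC if TB_SC matches VS_SC, otherwise the default "neutral"
df["result"] = np.select([m1 | m2, m3], [df["AF_SC"], df["TB_SC"]], default="neutral")
print(df)
```

Compared with nested np.where calls, a flat list of conditions and choices is arguably easier to extend when more rules are added.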

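In your solution, replace 'or' with the element-wise '|'. Python's or tries to convert each whole Series to a single True/False value, which is what raises the ValueError: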

df['result'] = np.where((df['AF_SC'] == df['TB_SC']) | 
                         (df['AF_SC'] == df['VS_SC']), df['AF_SC'], 
               np.where((df['TB_SC'] == df['VS_SC']), df['TB_SC'], "neutral"))
print (df)
      AF_SC     TB_SC     VS_SC    result
0  negative  negative  negative  negative
1  positive  positive  positive  positive
2   neutral  negative  negative  negative
3  negative  negative  positive  negative
4  positive  positive   neutral  positive
5  negative  negative  positive  negative
6   neutral  positive   neutral   neutral
7  negative  positive  positive  positive
8  negative  positive   neutral   neutral
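Another option, not taken from the answers above, is a row-wise majority vote with DataFrame.mode(axis=1). It returns every most-frequent value of each row (sorted, padded with NaN on the right), so a row with exactly one mode has a clear winner, while several modes indicate a tie, which with three columns means all values differ. A sketch under those assumptions, using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "AF_SC": ["negative", "positive", "neutral", "negative", "positive",
              "negative", "neutral", "negative", "negative"],
    "TB_SC": ["negative", "positive", "negative", "negative", "positive",
              "negative", "positive", "positive", "positive"],
    "VS_SC": ["negative", "positive", "negative", "positive", "neutral",
              "positive", "neutral", "positive", "neutral"],
})
cols = ["AF_SC", "TB_SC", "VS_SC"]

# One row of modes per input row; extra columns are NaN when there is no tie
modes = df[cols].mode(axis=1)
# Exactly one non-NaN mode means one value occurs more often than the others
single_mode = modes.notna().sum(axis=1) == 1
df["result"] = modes[0].where(single_mode, "neutral")
print(df)
```

This accepts any list of columns, but with four or more columns a tie no longer implies that all values differ (e.g. two pairs), so the rule would need rethinking there.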
