Pandas/Python:如何基于其他列的值创建新列,并将额外条件应用于此新列

2024-10-03 09:11:58 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个熊猫数据框,我想根据以下条件创建一个新列BB

  1. 创建一个新列BB,如果列TGR1中的值为0,则将0指定给BBelse
  2. TGR1中的值不是0,请查找与TGR1中的值对应的列('1'、'2'、'3'),将该列中的值分配给新列BB

我能够使用

df.loc[df['TGR1'] == 0, 'BB'] = 0

我还尝试使用np.where来解决这个问题,但我能找到正确的方法

df['BB'] = np.where(df.TGR1 == 0,0, df.columns == test.TGR1.value )
    
    

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   2
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   2

预期产出:

Dist    Track    EVENT_ID      Date       1      2        3   TGR1 TGR2    BB     
                            
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   1   0     34.00        
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   2   1     5.18     
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   0   2       0
311m    Cran    174331755   2020-10-19  34.00   5.18    19.10   3   1     19.10     

Tags: 数据eventiddfdatedistnptrack
3条回答

一种方法是使用numpy advanced indexing

import numpy as np
# extract columns 1,2,3 into a numpy array with a zeros column stacked on the left
vals = np.column_stack((np.zeros(len(df)), df[list('123')]))

vals
array([[ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ],
       [ 0.  , 34.  ,  5.18, 19.1 ]])

# use TGR1 values as the column index to extract corresponding values
df['BB'] = vals[np.arange(len(df)), df.TGR1.values]

df
   Dist Track   EVENT_ID        Date     1     2     3  TGR1  TGR2     BB
0  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     0  34.00
1  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     1   5.18
2  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     0     2   0.00
3  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     3     1  19.10
4  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     2     2   5.18
5  311m  Cran  174331755  2020-10-19  34.0  5.18  19.1     1     2  34.00

可以使用列表理解和if-else逻辑创建列

# Sample data
df = pd.DataFrame({'TGR1':[random.randint(0,3) for i in range(10)],
                   '1':[random.randint(0,100) for i in range(10)],
                   '2':[random.randint(101,200) for i in range(10)],
                   '3':[random.randint(201,300) for i in range(10)]})
# creating the column
df['BB'] = [0 if tgr1_val == 0 else df.loc[ind,str(tgr1_val)]
            for ind,tgr1_val in enumerate(df['TGR1'].values)]

df

#    TGR1   1    2    3   BB
# 0     0  54  107  217    0
# 1     2  71  128  277  128
# 2     1  25  103  269   25
# 3     0  80  112  279    0
# 4     2  98  167  228  167
# 5     3  26  192  285  285
# 6     0  27  107  228    0
# 7     2  13  103  298  103
# 8     3  28  196  289  289
# 9     2  72  186  251  186

在这里,您可以尝试玩一些numpy技巧,如本answer

我们首先用第1、2和3列中的值定义一个矩阵,然后用零添加第一列

import pandas as pd
import numpy as np

# we first define a matrix 
# with len(df) rows and 4 columns
mat = np.zeros((len(df), 4))

# Then we fill the last 3 columns 
# with values from df
mat[:,1:] = df[["1", "2", "3"]].values

# Then a vector with values from df["TGR1"]
v = df["TGR1"].values


# Finally we take the given index
# from each row on matrix
df["BB"] = np.take_along_axis(mat, v[:,None], axis=1)

时机

我比较了一些答案的时间。我刚拿了一个比原来大一万的

df_bk = pd.concat([df for i in range(10_000)], ignore_index=True)

在运行每个测试之前,我做df = df_bk.copy()

@wwnde的解决方案

CPU times: user 430 ms, sys: 12.1 ms, total: 442 ms
Wall time: 452 ms

@库克斯解

CPU times: user 746 ms, sys: 0 ns, total: 746 ms
Wall time: 746 ms

@拉帕奈溶液

CPU times: user 5.54 ms, sys: 0 ns, total: 5.54 ms
Wall time: 4.84 ms

@Psidom解

CPU times: user 5.93 ms, sys: 141 µs, total: 6.07 ms
Wall time: 5.61 ms

Psidom的解决方案和我的解决方案的时间基本相同。这里有一个情节 enter image description here

相关问题 更多 >