Conditional statement to generate values for multiple columns in Python

Posted 2024-10-01 04:58:02


I am trying to replace the values in the columns "Alloc1" and "Alloc2" based on a condition on the column "Number" in the data frame below.

import numpy as np
import pandas as pd

data = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [np.nan, np.nan, np.nan, np.nan], 'Alloc2': [np.nan, np.nan, np.nan, np.nan]}
# Create DataFrame.
df = pd.DataFrame(data)

My code for inserting values based on the condition is as follows:

for  numbers  in df["Number"]:
    
    if  (numbers == 99):
        df["Alloc1"] = 31
        df["Alloc2"] = 3

    else:
        df["Alloc1"] = 0
        df["Alloc2"] = numbers/2 

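The statement above seems to execute only the else branch, and it uses the last value in the "Number" column that is not 99. How can I fix this? A function would be nice. The ideal output should be: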

final = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [31, 31, 0, 0], 'Alloc2': [3, 3, 10, 20]}
# Create DataFrame.
final_df = pd.DataFrame(final)


3 Answers

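Try using vectorized operations for this: set the default values for every row first, then overwrite only the rows where "Number" is 99.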

import numpy as np
import pandas as pd

data = {'ID': ['001', '002', '003', '004'], 'Number': [99, 99, 20, 40], 'Alloc1': [np.nan, np.nan, np.nan, np.nan], 'Alloc2': [np.nan, np.nan, np.nan, np.nan]}
# Create DataFrame.
df = pd.DataFrame(data)

df['Alloc1'] = 0
df['Alloc2'] = df['Number']/2
df.loc[df['Number'] == 99,'Alloc1'] = 31
df.loc[df['Number'] == 99,'Alloc2'] = 3
df
Output:
    ID  Number  Alloc1  Alloc2
0  001      99      31     3.0
1  002      99      31     3.0
2  003      20       0    10.0
3  004      40       0    20.0
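
Since the question asked for a function, the same vectorized steps could be wrapped in one. This is only a minimal sketch: the name `allocate` and its `special` parameter are hypothetical and do not come from the original answer.

```python
import pandas as pd

def allocate(df, special=99):
    """Hypothetical helper: return a copy of df with Alloc1/Alloc2 derived from Number."""
    out = df.copy()                         # leave the caller's frame untouched
    is_special = out["Number"] == special   # boolean mask, one entry per row
    out["Alloc1"] = 0                       # defaults for every row
    out["Alloc2"] = out["Number"] / 2
    out.loc[is_special, "Alloc1"] = 31      # overwrite only the masked rows
    out.loc[is_special, "Alloc2"] = 3
    return out

df = pd.DataFrame({"ID": ["001", "002", "003", "004"],
                   "Number": [99, 99, 20, 40]})
result = allocate(df)
print(result)
```

Working on a copy is a design choice for the sketch; dropping `df.copy()` would modify the input frame in place instead.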

Assuming you can safely overwrite the entire Alloc1 and Alloc2 columns, you can use np.where, as Henry Ecker suggested:

df['Alloc1'] = np.where(df['Number'] == 99, 31, 0)
df['Alloc2'] = np.where(df['Number'] == 99, 3, df['Number'] / 2).astype(int)

print(df)
    ID  Number  Alloc1  Alloc2
0  001      99      31       3
1  002      99      31       3
2  003      20       0      10
3  004      40       0      20
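
One thing worth checking before relying on `.astype(int)`: it drops the fractional part, so an odd "Number" would lose its half. The sketch below illustrates this with a value (21) that is not in the original data and is added purely as an assumption for demonstration.

```python
import numpy as np
import pandas as pd

# 21 is a made-up value, used only to show the truncation
df = pd.DataFrame({"Number": [99, 20, 21]})

# np.where returns a float array here because Number / 2 is float
halves = np.where(df["Number"] == 99, 3, df["Number"] / 2)
print(halves)

# astype(int) truncates toward zero, so 10.5 becomes 10
as_int = halves.astype(int)
print(as_int)
```

If halves need to be preserved, leaving the column as float (as in the first answer) is probably the safer choice.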

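I think the "vectorized" solution will perform better than this one, and both that version and the where version are more "elegant". This answer just shows how to get what you want using an approach closer to the one you were following. It is not a very "pandas" way of doing things, but it may help you understand why what you tried did not work.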

import pandas as pd

data = {'ID': ['001', '002', '003', '004'],
        'Number': [99, 99, 20, 40]}
        # Don't actually need the NaN-filled 'Alloc1' and 'Alloc2' yet
        # Those columns get created when you give them values, later
df = pd.DataFrame(data)

def allocateCodes(row):
    if (row['Number'] == 99):
        row['Alloc1'] = 31
        row['Alloc2'] = 3
    else:
        row['Alloc1'] = 0
        row['Alloc2'] = row['Number'] / 2
    return row

# axis="columns" means go 'take each row' (i.e., a whole set of columns)
# at a time (can also use axis=1)
# instead of 'take each column' (axis="rows" / axis=0)      
outputDf = df.apply(allocateCodes, axis="columns")

print(outputDf)

Output:

    ID  Number  Alloc1  Alloc2
0  001      99      31     3.0
1  002      99      31     3.0
2  003      20       0    10.0
3  004      40       0    20.0
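
As for why the loop in the question did not work: an assignment such as `df["Alloc2"] = value` sets the whole column, not just the current row, so each iteration overwrites the result of the previous one. The sketch below reproduces that effect and then shows a row-wise variant that addresses one cell at a time with `df.loc`. The variable names `broadcast` and `per_row` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"Number": [99, 99, 20, 40]})

# Pattern from the question: every assignment replaces the entire column,
# so only the last iteration (Number == 40) is visible afterwards.
for number in df["Number"]:
    if number == 99:
        df["Alloc2"] = 3
    else:
        df["Alloc2"] = number / 2
broadcast = df["Alloc2"].tolist()
print(broadcast)

# Row-wise variant: write to a single cell, selected by its index label.
for idx, number in df["Number"].items():
    df.loc[idx, "Alloc2"] = 3 if number == 99 else number / 2
per_row = df["Alloc2"].tolist()
print(per_row)
```

The row-wise loop gives the intended values but is likely to be slower than the vectorized answers on large frames.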
