Creating a new column based on different columns

Posted 2024-10-02 10:27:53

I have the following DataFrame:

Col1 Col2 Col3 Col4 Col5  Col6  Col7  
1     A    T     1    AG   NBL   NH
2     A    T     1    NAG  BL    NH
3     A    M     2    NAG  NBL   HL 
4     NS   M     1    NAG  BL    NH
5     NS   T     1    NAG  NBL   HL 
6     A    M     2    NAG  NBL   HL

I want to create a new column based on the following conditions:

IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'A1'

IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'A2'

IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'B1'

IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'B2'

IF Col7 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'H1'

IF Col6 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'H2'

My original df has many more conditions than these, and I am trying to find a way to add them all. I am working in Jupyter with Python.


2 answers

I find numpy.select() a good fit for this problem: you have many conditions and need to map each one to a value.

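import numpy as np
import pandas as pd

def my_transform(row: pd.Series):
    # Conditions are checked in order; the first True one decides the result.
    col5 = row['Col5'].strip()
    conditions = [col5 == 'AG' and row['Col4'] == 1,
                  col5 == 'AG' and row['Col4'] == 0]
    results = ['A1', 'A2']
    # A string default keeps the result dtype consistent with the string results.
    # .item() turns the 0-d array returned by np.select into a plain str.
    return np.select(conditions, results, default='Default').item()

# df is the DataFrame from the question.
df['my_new_col'] = df.apply(my_transform, axis=1)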

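What select does is find the index of the first truthy condition and return the value at that index from the second argument passed to it (here, results). I have simplified your first two rules here, but you can extend this to fit your actual case.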
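np.select can also be applied to whole columns at once instead of row by row, which is how it is more commonly used. The following is a minimal sketch of that idea, not the answerer's code: the sample data is copied from the question, the `base` helper and the "No match" default are names chosen here for illustration, and the last rule keeps Col6 exactly as the question wrote it.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4, 5, 6],
    "Col2": ["A", "A", "A", "NS", "NS", "A"],
    "Col3": ["T", "T", "M", "M", "T", "M"],
    "Col4": [1, 1, 2, 1, 1, 2],
    "Col5": ["AG", "NAG", "NAG", "NAG", "NAG", "NAG"],
    "Col6": ["NBL", "BL", "NBL", "BL", "NBL", "NBL"],
    "Col7": ["NH", "NH", "HL", "NH", "HL", "HL"],
})

# Part shared by all six rules: Col2 == 'A' and Col3 == 'M'.
base = (df["Col2"] == "A") & (df["Col3"] == "M")

# One boolean Series per rule, in priority order (the first match wins).
conditions = [
    base & (df["Col5"] == "AG") & (df["Col4"] == 1),
    base & (df["Col5"] == "AG") & (df["Col4"] == 0),
    base & (df["Col6"] == "BL") & (df["Col4"] == 1),
    base & (df["Col6"] == "BL") & (df["Col4"] == 0),
    base & (df["Col7"] == "HL") & (df["Col4"] == 1),
    base & (df["Col6"] == "HL") & (df["Col4"] == 0),  # Col6, as written in the question
]
labels = ["A1", "A2", "B1", "B2", "H1", "H2"]

# "No match" is a placeholder chosen for this sketch; a string default
# keeps the result dtype consistent with the string labels.
df["NewColumn"] = np.select(conditions, labels, default="No match")
print(df)
```

Adding another rule then means appending one entry to conditions and one to labels. On the sample data every row falls through to the default, because no row has Col2 = 'A', Col3 = 'M' and Col4 equal to 0 or 1.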

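Since the logic here is fairly complex, I suggest putting the conditions in a function and using DataFrame.apply() to call that function for every row of the dataset. Here is how I would translate your example into pandas: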

import pandas as pd

df = pd.read_csv("test.csv")


def classify(row):
    Col2 = row["Col2"]
    Col3 = row["Col3"]
    Col4 = row["Col4"]
    Col5 = row["Col5"]
    Col6 = row["Col6"]
    Col7 = row["Col7"]
    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'A1'

    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'A2'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'B1'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'B2'

    if Col7 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'H1'

    if Col6 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'H2'

    # No match
    return None


df["NewCol"] = df.apply(classify, axis=1)
print(df)

Note: I tried this function on your dataset and got the following result, which may not be what you want:

   Col1 Col2 Col3  Col4 Col5 Col6 Col7 NewCol
0     1    A    T     1   AG  NBL   NH   None
1     2    A    T     1  NAG   BL   NH   None
2     3    A    M     2  NAG  NBL   HL   None
3     4   NS    M     1  NAG   BL   NH   None
4     5   NS    T     1  NAG  NBL   HL   None
5     6    A    M     2  NAG  NBL   HL   None

I checked each row and they all look correct: none of the rows satisfies any of your rules. I suggest double-checking that your rules are right.
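One way to see why nothing matches is to filter on the part that all six rules share. This is a small sketch using only the four columns from the question that matter for the check; the variable names `shared` and `candidates` are chosen here for illustration.

```python
import pandas as pd

# The four relevant columns, copied from the question's sample data.
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4, 5, 6],
    "Col2": ["A", "A", "A", "NS", "NS", "A"],
    "Col3": ["T", "T", "M", "M", "T", "M"],
    "Col4": [1, 1, 2, 1, 1, 2],
})

# Every rule requires Col2 == 'A' and Col3 == 'M'.
shared = df[(df["Col2"] == "A") & (df["Col3"] == "M")]
print(shared)

# Every rule also requires Col4 to be 0 or 1.
candidates = shared[shared["Col4"].isin([0, 1])]
print(len(candidates))  # prints 0
```

Only the rows with Col1 equal to 3 and 6 pass the first filter, and both have Col4 = 2, so no rule can apply to them whatever Col5, Col6 and Col7 contain.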
