Creating a new column based on other columns

Col1 Col2 Col3 Col4 Col5 Col6 Col7
1    A    T    1    AG   NBL  NH
2    A    T    1    NAG  BL   NH
3    A    M    2    NAG  NBL  HL
4    NS   M    1    NAG  BL   NH
5    NS   T    1    NAG  NBL  HL
6    A    M    2    NAG  NBL  HL

IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 THEN NewColumn = 'A1'
IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 THEN NewColumn = 'A2'
IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 THEN NewColumn = 'B1'
IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 THEN NewColumn = 'B2'
IF Col7 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 THEN NewColumn = 'H1'
IF Col6 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 THEN NewColumn = 'H2'
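To experiment with these rules, the sample table can be loaded straight from its text form. This is a minimal sketch that assumes the columns are separated by whitespace exactly as shown; `SAMPLE` is just an illustrative name, not something from the question.

```python
import io

import pandas as pd

# The question's sample table, whitespace-separated (assumed layout)
SAMPLE = """\
Col1 Col2 Col3 Col4 Col5 Col6 Col7
1 A T 1 AG NBL NH
2 A T 1 NAG BL NH
3 A M 2 NAG NBL HL
4 NS M 1 NAG BL NH
5 NS T 1 NAG NBL HL
6 A M 2 NAG NBL HL
"""

# sep=r"\s+" splits on any run of whitespace; Col1 and Col4 are inferred as integers
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(df)
```

Because Col4 is parsed as an integer column, comparisons such as `df["Col4"] == 1` behave as the rules expect.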

2 answers

User

Answer 1 · edited 2024-10-02 10:27:53

I find numpy.select() a very good fit for this problem: you have many conditions and need to map them to a single value.

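import numpy as np
import pandas as pd

def my_transform(row: pd.Series):
    # Conditions are checked in order; the first True one decides the result
    conditions = [row['Col5'].strip() == 'AG' and row['Col4'] == 1,
                  row['Col5'].strip() == 'AG' and row['Col4'] == 0]
    results = ['A1', 'A2']
    # On scalar conditions np.select returns a 0-d array; .item() unwraps it to a str
    return np.select(conditions, results, default='Default').item()

# df is the DataFrame from the question
df['my_new_col'] = df.apply(my_transform, axis=1)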

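What select does is find the index of the first true condition and return the value at that same index from the second argument it is given (here `results`). I have simplified your first two rules, but you can extend the lists to cover the rest.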
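np.select also works on whole columns, so all six rules can be evaluated without a per-row `apply`. This is a sketch assuming the columns hold plain strings and integers as in the sample; `classify_vectorized` is a hypothetical name, and the last rule keeps `Col6 = 'HL'` exactly as the question states it.

```python
import numpy as np
import pandas as pd


def classify_vectorized(df: pd.DataFrame) -> np.ndarray:
    # Every rule shares Col2 == 'A' and Col3 == 'M'
    base = (df["Col2"] == "A") & (df["Col3"] == "M")
    conditions = [
        base & (df["Col5"] == "AG") & (df["Col4"] == 1),
        base & (df["Col5"] == "AG") & (df["Col4"] == 0),
        base & (df["Col6"] == "BL") & (df["Col4"] == 1),
        base & (df["Col6"] == "BL") & (df["Col4"] == 0),
        base & (df["Col7"] == "HL") & (df["Col4"] == 1),
        base & (df["Col6"] == "HL") & (df["Col4"] == 0),  # as written in the question
    ]
    choices = ["A1", "A2", "B1", "B2", "H1", "H2"]
    # For each row, np.select takes the choice of the first True condition
    return np.select(conditions, choices, default="no match")


# The question's sample data
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4, 5, 6],
    "Col2": ["A", "A", "A", "NS", "NS", "A"],
    "Col3": ["T", "T", "M", "M", "T", "M"],
    "Col4": [1, 1, 2, 1, 1, 2],
    "Col5": ["AG", "NAG", "NAG", "NAG", "NAG", "NAG"],
    "Col6": ["NBL", "BL", "NBL", "BL", "NBL", "NBL"],
    "Col7": ["NH", "NH", "HL", "NH", "HL", "HL"],
})
df["NewCol"] = classify_vectorized(df)
print(df)
```

The order of `conditions` is the precedence: a row satisfying both the A1 and B1 rules gets 'A1'. Passing a string `default` keeps the result a plain string array.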

User

Answer 2 · edited 2024-10-02 10:27:53

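Since the logic here is fairly complex, I suggest putting the conditions in a function and using DataFrame.apply() to call that function on every row of the dataset. Here is how I would translate your example into pandas: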

import pandas as pd

df = pd.read_csv("test.csv")


def classify(row):
    Col2 = row["Col2"]
    Col3 = row["Col3"]
    Col4 = row["Col4"]
    Col5 = row["Col5"]
    Col6 = row["Col6"]
    Col7 = row["Col7"]
    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'A1'

    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'A2'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'B1'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'B2'

    if Col7 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'H1'

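    # The question tests Col6 here, but Col6 only holds BL/NBL in the
    # sample data; Col7 may have been intended
    if Col6 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'H2'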

    # No match
    return None


df["NewCol"] = df.apply(classify, axis=1)
print(df)

Note: I tried this function on your dataset and got the following result, which may not be what you want:

   Col1 Col2 Col3  Col4 Col5 Col6 Col7 NewCol
0     1    A    T     1   AG  NBL   NH   None
1     2    A    T     1  NAG   BL   NH   None
2     3    A    M     2  NAG  NBL   HL   None
3     4   NS    M     1  NAG   BL   NH   None
4     5   NS    T     1  NAG  NBL   HL   None
5     6    A    M     2  NAG  NBL   HL   None
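To see why every row comes out as None, filter on the condition shared by all six rules and look at Col4. A minimal sketch using the question's sample data; `candidates` is an illustrative name.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4, 5, 6],
    "Col2": ["A", "A", "A", "NS", "NS", "A"],
    "Col3": ["T", "T", "M", "M", "T", "M"],
    "Col4": [1, 1, 2, 1, 1, 2],
    "Col5": ["AG", "NAG", "NAG", "NAG", "NAG", "NAG"],
    "Col6": ["NBL", "BL", "NBL", "BL", "NBL", "NBL"],
    "Col7": ["NH", "NH", "HL", "NH", "HL", "HL"],
})

# Rows that pass the precondition every rule shares
candidates = df[(df["Col2"] == "A") & (df["Col3"] == "M")]
print(candidates[["Col1", "Col4"]])

# Every rule needs Col4 to be 0 or 1; check whether any candidate has that
print(candidates["Col4"].isin([0, 1]).any())
```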

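Checking each row, these results appear to be correct: none of the rows satisfies any of your rules. The only rows with Col2 = 'A' and Col3 = 'M' (Col1 = 3 and 6) have Col4 = 2, whereas every rule requires Col4 to be 1 or 0. I suggest double-checking that your rules are correct.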
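Because all six rules share Col2 == 'A' and Col3 == 'M', the chain of `if` statements in `classify` can also be written as a small rule table, so correcting or adding a rule is a one-line change. This is a sketch assuming the same column names; `RULES` is a hypothetical name, and the behavior (first matching rule wins, `None` otherwise) mirrors the function in this answer.

```python
import pandas as pd

# Hypothetical rule table: (column, expected value, required Col4, label).
# Every rule additionally requires Col2 == 'A' and Col3 == 'M'.
RULES = [
    ("Col5", "AG", 1, "A1"),
    ("Col5", "AG", 0, "A2"),
    ("Col6", "BL", 1, "B1"),
    ("Col6", "BL", 0, "B2"),
    ("Col7", "HL", 1, "H1"),
    ("Col6", "HL", 0, "H2"),  # Col6 as written in the question
]


def classify(row: pd.Series):
    # Shared precondition of all six rules
    if row["Col2"] != "A" or row["Col3"] != "M":
        return None
    for col, value, col4, label in RULES:
        if row[col] == value and row["Col4"] == col4:
            return label  # first matching rule wins, like the chain of ifs
    return None  # no match


# The question's sample data
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4, 5, 6],
    "Col2": ["A", "A", "A", "NS", "NS", "A"],
    "Col3": ["T", "T", "M", "M", "T", "M"],
    "Col4": [1, 1, 2, 1, 1, 2],
    "Col5": ["AG", "NAG", "NAG", "NAG", "NAG", "NAG"],
    "Col6": ["NBL", "BL", "NBL", "BL", "NBL", "NBL"],
    "Col7": ["NH", "NH", "HL", "NH", "HL", "HL"],
})
df["NewCol"] = df.apply(classify, axis=1)
print(df)
```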
