How to apply an if condition to a DataFrame

2 answers

User

#1 · edited 2024-10-02 18:18:07

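Instead of the approach in the question, process the DataFrame row by row with df.apply(..., axis=1), where x is the current row. For one-element lists, wrap x['bool1'], x['bool2'] and x['bool3'] in []: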

import ast

def new_func(x):
    d1 = ast.literal_eval(x['bool1_res'])
    d2 = ast.literal_eval(x['bool2_res'])
    d3 = ast.literal_eval(x['bool3_res'])

    # first branch: bool1 is document 1 and bool3 was detected
    if d1['is_doc1'] == d3['detected'] == True:
        resp = {
            "task_id": "uid",
            "group_id": "uid",
            "data": {
                "document1": [x['bool1']],
                "document2": [x['bool3']]
            }
        }
    # second branch: bool2 is document 1 and bool3 was detected
    elif d2['is_doc1'] == d3['detected'] == True:
        resp = {
            "task_id": "user_uid",
            "group_id": "uid",
            "data": {
                "document1": [x['bool2']],
                "document2": [x['bool3']]
            }
        }
    # everything else, including d3['detected'] == False
    else:
        resp = 'Not valid'
    return resp
df['new'] = df.apply(new_func, axis = 1)
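
The reason for ast.literal_eval above is that the *_res columns hold dicts stored as strings, so they have to be parsed before a key such as 'is_doc1' can be looked up. A minimal sketch of that step on its own, assuming a sample string shaped like the values in those columns (the variable names are illustrative only):

```python
import ast

# Assumed sample value: a dict stored as a string, as in the bool*_res columns.
raw = "{'is_doc1': True, 'is_doc2': False}"

# literal_eval only accepts Python literals (dicts, lists, strings, numbers,
# booleans, None), so it does not run arbitrary code the way eval would.
parsed = ast.literal_eval(raw)
print(type(parsed).__name__, parsed['is_doc1'])

# An expression that is not a plain literal is rejected with an exception.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("non-literal rejected:", rejected)
```

Indexing the raw string directly (raw['is_doc1']) would fail with a TypeError, which is why the parse has to come first.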


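User

#2 · edited 2024-10-02 18:18:07

I assume this is what your data looks like once the code lines are expanded (also, it would help if you could add some whitespace ... ^^):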

import pandas as pd

df = pd.DataFrame(
    [
        [1001, "27452.webp", "981.webp", "d92e.webp",
            "{'is_doc1': False, 'is_doc2': True}",
            "{'is_doc1': True, 'is_doc2': True}",
            "{'detected': True, 'count': 1}"
        ],
        [1002, "27452.webp", "981.webp", "d92e.webp",
            "{'is_doc1': True, 'is_doc2': True}",
            "{'is_doc1': False, 'is_doc2': True}",
            "{'detected': True, 'count': 1}"
        ],
        [1003, "27452.webp", "981.webp", "d92e.webp",
            "{'is_doc1': True, 'is_doc2': True}",
            "{'is_doc1': False, 'is_doc2': True}",
            "{'detected': False, 'count': 1}"
        ],
    ],
    columns=['user_uid', 'bool1', 'bool2', 'bool3', 'bool1_res', 'bool2_res',
             'bool3_res'
    ]
)

My answer

The implementation has two parts: (1) parsing the strings; (2) processing the rows to generate the "new" column values.


Part 1: Parsing the dict strings

This function is applied to every element of the dataframe via pd.DataFrame.applymap and uses ast.literal_eval, as @jezrael rightly suggested.

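import ast
from typing import Any

def str2dict(x: Any):
    """(Step 1) Parses argument using ast.literal_eval"""
    # non-string values (e.g. integer ids) are returned unchanged
    if not isinstance(x, str):
        return x
    try:
        return ast.literal_eval(x.strip())
    # if x is not parsable, return x as-is
    except (ValueError, SyntaxError):
        return x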
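
To make the "parse if possible, otherwise leave unchanged" behavior concrete, here is a small standalone sketch. The helper name parse_or_keep is hypothetical, and the sample values are assumed to resemble the cells in the dataframe above:

```python
import ast
from typing import Any

def parse_or_keep(x: Any) -> Any:
    """Hypothetical helper: parse x as a Python literal, else return it unchanged."""
    if not isinstance(x, str):
        return x  # e.g. integer user_uid values
    try:
        return ast.literal_eval(x.strip())
    except (ValueError, SyntaxError):
        return x  # e.g. file names, which are not valid literals

# Assumed sample cells: a dict string, two file names, and an integer id.
samples = ["{'detected': True, 'count': 1}", "27452.webp", "d92e.webp", 1001]
results = [parse_or_keep(s) for s in samples]
for s, r in zip(samples, results):
    print(repr(s), "->", repr(r))
```

One thing to keep in mind with this design: any string cell that happens to look like a literal (for example "123") would also be converted, which is a reason to restrict parsing to the columns that actually need it.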

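Part 2: Processing the data (i.e. building the "new" column)

This function is applied to every row of the dataframe (via pd.DataFrame.agg):

Following the logic in the function you posted, I:

- Check whether bool3_res['detected'] is False (both of your first two conditions require it to be True); if so, raise a ValueError
- Check whether is_doc1 is True for bool1 and, if not, for bool2
- If neither is True, raise a ValueError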

def make_newcol_entry(x: pd.Series):
    """(Step 2) constructs "new" column value for pandas group"""
    try:
        if x.bool3_res['detected'] is False:
            raise ValueError
        # check is_doc1 properties
        elif x.bool1_res['is_doc1'] is True:
            document1 = x.bool1
        elif x.bool2_res['is_doc1'] is True:
            document1 = x.bool2
        else:
            raise ValueError
    except ValueError:
        entry = "not valid"
    # if there is `is_doc1` that is True, construct your entry.
    else:
        entry = {
            "task_id": "uid",
            "group_id": "uid",
            "data": {"document1": document1, "document2": x.bool3}
        }

    return entry
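
The branching above can also be checked without pandas by isolating the decision in a plain function. This is only a sketch: pick_document1 is a hypothetical helper, and the three cases are assumed to mirror the three sample rows (1001, 1002, 1003):

```python
from typing import Optional

def pick_document1(bool1_res: dict, bool2_res: dict, bool3_res: dict) -> Optional[str]:
    """Hypothetical helper: name of the column that supplies document1, or None."""
    if not bool3_res['detected']:
        return None        # nothing detected: the row is not valid
    if bool1_res['is_doc1']:
        return 'bool1'     # bool1 takes priority when it is document 1
    if bool2_res['is_doc1']:
        return 'bool2'
    return None            # neither candidate is document 1

# Parsed *_res values assumed to match sample rows 1001, 1002 and 1003.
cases = [
    ({'is_doc1': False, 'is_doc2': True}, {'is_doc1': True, 'is_doc2': True},
     {'detected': True, 'count': 1}),
    ({'is_doc1': True, 'is_doc2': True}, {'is_doc1': False, 'is_doc2': True},
     {'detected': True, 'count': 1}),
    ({'is_doc1': True, 'is_doc2': True}, {'is_doc1': False, 'is_doc2': True},
     {'detected': False, 'count': 1}),
]
picks = [pick_document1(*case) for case in cases]
print(picks)
```

Returning None instead of raising keeps the control flow linear; the row-level function can then map None to "not valid".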

To execute, run:

df = df.assign(new=lambda x: x.applymap(str2dict) \
                              .agg(make_newcol_entry, axis=1))

Note that this parses every element in the dataframe.

To parse only the bool#_res columns, you can do it in two steps:

# select and parse only res cols ('bool#_res'), then apply
df.update(df.filter(regex=r'_res$', axis=1).applymap(str2dict))
df = df.assign(new=lambda x: x.agg(make_newcol_entry, axis=1))
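
For reference, here is a self-contained sketch of the "parse only the _res columns, then build the new column" flow. It swaps in Series.map and DataFrame.apply for the applymap/update/agg calls above, purely as an assumption for this sketch; build_entry is a hypothetical stand-in for the row function, and the two-row frame is a cut-down version of the sample data:

```python
import ast
import pandas as pd

# Cut-down sample data (assumed to mirror rows 1001 and 1003 above).
df = pd.DataFrame({
    'user_uid': [1001, 1003],
    'bool1': ['27452.webp', '27452.webp'],
    'bool2': ['981.webp', '981.webp'],
    'bool3': ['d92e.webp', 'd92e.webp'],
    'bool1_res': ["{'is_doc1': False, 'is_doc2': True}",
                  "{'is_doc1': True, 'is_doc2': True}"],
    'bool2_res': ["{'is_doc1': True, 'is_doc2': True}",
                  "{'is_doc1': False, 'is_doc2': True}"],
    'bool3_res': ["{'detected': True, 'count': 1}",
                  "{'detected': False, 'count': 1}"],
})

# Step 1: parse only the columns whose names end in "_res".
for col in df.filter(regex=r'_res$').columns:
    df[col] = df[col].map(ast.literal_eval)

def build_entry(row: pd.Series):
    """Hypothetical row function following the branching described above."""
    if not row['bool3_res']['detected']:
        return 'not valid'
    if row['bool1_res']['is_doc1']:
        document1 = row['bool1']
    elif row['bool2_res']['is_doc1']:
        document1 = row['bool2']
    else:
        return 'not valid'
    return {'task_id': 'uid', 'group_id': 'uid',
            'data': {'document1': document1, 'document2': row['bool3']}}

# Step 2: build the "new" column one row at a time.
df['new'] = df.apply(build_entry, axis=1)
print(df['new'].tolist())
```

Restricting the parse to the _res columns leaves the ids and file names untouched, so nothing outside those columns can be converted by accident.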

Result

$ df
    user_uid    bool1   bool2   bool3   bool1_res   bool2_res   bool3_res   new
0   1001    27452.webp  981.webp    d92e.webp   {'is_doc1': False, 'is_doc2': True} {'is_doc1': True, 'is_doc2': True}  {'detected': True, 'count': 1}  {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '981.webp', 'document2': 'd92e.webp'}}
1   1002    27452.webp  981.webp    d92e.webp   {'is_doc1': True, 'is_doc2': True}  {'is_doc1': False, 'is_doc2': True} {'detected': True, 'count': 1}  {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '27452.webp', 'document2': 'd92e.webp'}}
2   1003    27452.webp  981.webp    d92e.webp   {'is_doc1': True, 'is_doc2': True}  {'is_doc1': False, 'is_doc2': True} {'detected': False, 'count': 1} not valid

$ df['new']
0   {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '981.webp', 'document2': 'd92e.webp'}}
1   {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '27452.webp', 'document2': 'd92e.webp'}}
2   not valid
Name: new, dtype: object
