Python json_normalize and joining with DataFrame fields

Posted 2024-09-30 16:21:17


I have a DataFrame:

import pandas as pd

df = pd.DataFrame([
    [1, '{"issues": [{"issue_name": "fixed.issue.cpeUnreachable", "issue_id": "52446*", "actions": [], "issueFixed": "true"}, {"issue_name": "fixed.issue.internet.cgnat.statusActive", "issueFixed": "false", "issue_id": "8834*4", "actions": [            {"action_name": "cableCheck", "success": "false"},             {"action_name": "otherCheck", "success": "true"}]}, {"issue_name": "fixed.issue.rf.ds.quality", "issue_id": "3642*", "actions": [            {"action_name": "akcija 1", "success": "false"},             {"action_name": "akcija 2", "success": "false"},             {"action_name": "akcija 3", "success": "false"},             {"action_name": "akcija 4", "success": "false"},             {"action_name": "akcija 5", "success": "false"}], "issueFixed": "true"}, {"issue_name": "fixed.issue.rf.us.quality", "issueFixed": "false", "issue_id": "8834*3", "actions": []},    {"issue_name" : "rebootBeforeTicket",        "actions" : [{"action_name": "rebootCpeDevice", "success" : "false"},            {"action_name": "rebootStbDevice", "success" : "true"}]} ]}'],
    [2, '{"issues": [{"issue_name": "fixed.issue.cpeUnreachable", "issue_id": "52446*", "actions": [], "issueFixed": "true"}, {"issue_name": "fixed.issue.internet.cgnat.statusActive", "issueFixed": "false", "issue_id": "8834*4", "actions": [            {"action_name": "cableCheck", "success": "false"},             {"action_name": "otherCheck", "success": "true"}]}, {"issue_name": "fixed.issue.rf.ds.quality", "issue_id": "3642*", "actions": [            {"action_name": "akcija 1", "success": "false"},             {"action_name": "akcija 2", "success": "false"},             {"action_name": "akcija 3", "success": "false"},             {"action_name": "akcija 4", "success": "false"},             {"action_name": "akcija 5", "success": "false"}], "issueFixed": "true"}, {"issue_name": "fixed.issue.rf.us.quality", "issueFixed": "false", "issue_id": "8834*3", "actions": []},    {"issue_name" : "rebootBeforeTicket",        "actions" : [{"action_name": "rebootCpeDevice", "success" : "false"},            {"action_name": "rebootStbDevice", "success" : "true"}]} ]}']], 
    columns=['session_id', 'json_text'])
df


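I want to convert this DataFrame into a flat table with one row per action and the columns session_id, issue_id, issue_name, issueFixed, action_name and success.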

So far I have tried the following:

import json

df1 = pd.DataFrame()

for idx, row in df.iterrows():
    # Parse this row's JSON text (the column is named json_text in df).
    json_contents = json.loads(row.json_text)
    # One output row per action; the issue-level fields are repeated as metadata.
    df_json = pd.json_normalize(json_contents['issues'], record_path=['actions'], meta=['issue_id', 'issue_name', 'issueFixed'], errors='ignore')
    df_json.insert(0, 'session_id', row.session_id)
    df1 = pd.concat([df1, df_json])


df1 = df1[['session_id', 'issue_id', 'issue_name', 'issueFixed', 'action_name', 'success']]

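It works, but I am not happy with the for loop. I had to join the newly unpacked df_json DataFrame (built from the df.json_text field) with the df.session_id field, and because I could not find another way to do that, I used a for loop.

Is there a better way to join df_json with its df.session_id field (and possibly some other df fields) without using a for loop?

Regards.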

Edit 1, a solution using JSON injection:

# Splice "session_id" into each JSON string right after the opening brace, then parse it.
json_ser = df.apply(lambda row: json.loads(row.json_text[:1] + f'"session_id":{row.session_id}, ' + row.json_text[1:]), axis=1)
json_ser.head()


df1 = (
    pd.json_normalize(json_ser,
        record_path=['issues', 'actions'],
        meta=['session_id', ['issues', 'issue_id'], ['issues', 'issue_name'], ['issues', 'issueFixed']],
        sep='_',
        errors='ignore')
    .rename(columns={'issues_issue_id' : 'issue_id', 'issues_issue_name' : 'issue_name', 'issues_issueFixed' : 'issueFixed'})
    [['session_id', 'issue_id', 'issue_name', 'issueFixed', 'action_name', 'success']]
)

apply injects the session_id field into the JSON, so json_normalize then has all the information it needs to parse everything in one call.
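The same injection idea can also be done after parsing, by adding session_id to the parsed dict rather than splicing it into the JSON string. The following is only a minimal sketch under assumptions: the sample data is made up, and `flatten_sessions` is a hypothetical helper name, not something from the question.

```python
import json
import pandas as pd

# Made-up sample data for this sketch only; the second issue deliberately
# lacks issue_id and issueFixed to show what errors='ignore' does.
sample = pd.DataFrame(
    [
        [1, '{"issues": [{"issue_name": "a", "issue_id": "1*", "issueFixed": "true", '
            '"actions": [{"action_name": "x", "success": "false"}]}]}'],
        [2, '{"issues": [{"issue_name": "b", '
            '"actions": [{"action_name": "y", "success": "true"}, '
            '{"action_name": "z", "success": "false"}]}]}'],
    ],
    columns=["session_id", "json_text"],
)


def flatten_sessions(frame):
    """Hypothetical helper: one output row per action, tagged with session_id."""
    records = []
    for session_id, text in zip(frame["session_id"], frame["json_text"]):
        parsed = json.loads(text)
        parsed["session_id"] = session_id  # inject into the dict, not the string
        records.append(parsed)
    flat = pd.json_normalize(
        records,
        record_path=["issues", "actions"],
        meta=["session_id", ["issues", "issue_id"],
              ["issues", "issue_name"], ["issues", "issueFixed"]],
        sep="_",
        errors="ignore",  # missing meta keys become NaN instead of raising
    )
    flat = flat.rename(columns={
        "issues_issue_id": "issue_id",
        "issues_issue_name": "issue_name",
        "issues_issueFixed": "issueFixed",
    })
    return flat[["session_id", "issue_id", "issue_name",
                 "issueFixed", "action_name", "success"]]


result = flatten_sessions(sample)
print(result)
```

This still iterates in Python to parse each string, but json_normalize is called only once, and it does not depend on the JSON text starting with an opening brace.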

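I created a 2048-row test DataFrame to measure performance. On my laptop the for loop took 8.92 s, while apply + json_normalize took 387 + 167 ms. Injecting into the JSON appears to be much faster.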


1 answer

Answer 1 · posted 2024-09-30 16:21:17

The best I can do is this. It still uses a loop, but it should definitely save you some time.

def joiner(row):
    # Flatten one row's JSON text into one line per action.
    a = pd.json_normalize(json.loads(row['json_text'])['issues'], record_path=['actions'],
        meta=['issue_id', 'issue_name', 'issueFixed'], errors='ignore')
    # Prepend a Session column that repeats the row's session_id.
    return pd.concat([pd.DataFrame({'Session': [row['session_id']] * a.shape[0]}), a], axis=1)

pd.concat([joiner(row) for _, row in df.iterrows()])
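A variation on this loop, sketched here under assumptions rather than taken from the answer, lets pd.concat attach the session id through its keys argument instead of building a helper DataFrame per row. The sample data is made up and `normalize_actions` is a hypothetical helper name.

```python
import json
import pandas as pd

# Made-up sample data for this sketch only.
sample = pd.DataFrame(
    [
        [1, '{"issues": [{"issue_name": "a", "issue_id": "1*", "issueFixed": "true", '
            '"actions": [{"action_name": "x", "success": "false"}]}]}'],
        [2, '{"issues": [{"issue_name": "b", "issue_id": "2*", "issueFixed": "false", '
            '"actions": [{"action_name": "y", "success": "true"}, '
            '{"action_name": "z", "success": "false"}]}]}'],
    ],
    columns=["session_id", "json_text"],
)


def normalize_actions(text):
    """Hypothetical helper: flatten one JSON string into one row per action."""
    return pd.json_normalize(
        json.loads(text)["issues"],
        record_path=["actions"],
        meta=["issue_id", "issue_name", "issueFixed"],
        errors="ignore",
    )


frames = [normalize_actions(text) for text in sample["json_text"]]

# keys= labels each frame with its session_id as an outer index level;
# reset_index then turns that level into an ordinary column.
combined = (
    pd.concat(frames, keys=sample["session_id"].tolist(), names=["session_id"])
    .reset_index(level="session_id")
    .reset_index(drop=True)
)
print(combined)
```

The frames are concatenated once at the end, which avoids growing a DataFrame inside the loop.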
