I have a DataFrame:
import pandas as pd

df = pd.DataFrame([
[1, '{"issues": [{"issue_name": "fixed.issue.cpeUnreachable", "issue_id": "52446*", "actions": [], "issueFixed": "true"}, {"issue_name": "fixed.issue.internet.cgnat.statusActive", "issueFixed": "false", "issue_id": "8834*4", "actions": [ {"action_name": "cableCheck", "success": "false"}, {"action_name": "otherCheck", "success": "true"}]}, {"issue_name": "fixed.issue.rf.ds.quality", "issue_id": "3642*", "actions": [ {"action_name": "akcija 1", "success": "false"}, {"action_name": "akcija 2", "success": "false"}, {"action_name": "akcija 3", "success": "false"}, {"action_name": "akcija 4", "success": "false"}, {"action_name": "akcija 5", "success": "false"}], "issueFixed": "true"}, {"issue_name": "fixed.issue.rf.us.quality", "issueFixed": "false", "issue_id": "8834*3", "actions": []}, {"issue_name" : "rebootBeforeTicket", "actions" : [{"action_name": "rebootCpeDevice", "success" : "false"}, {"action_name": "rebootStbDevice", "success" : "true"}]} ]}'],
[2, '{"issues": [{"issue_name": "fixed.issue.cpeUnreachable", "issue_id": "52446*", "actions": [], "issueFixed": "true"}, {"issue_name": "fixed.issue.internet.cgnat.statusActive", "issueFixed": "false", "issue_id": "8834*4", "actions": [ {"action_name": "cableCheck", "success": "false"}, {"action_name": "otherCheck", "success": "true"}]}, {"issue_name": "fixed.issue.rf.ds.quality", "issue_id": "3642*", "actions": [ {"action_name": "akcija 1", "success": "false"}, {"action_name": "akcija 2", "success": "false"}, {"action_name": "akcija 3", "success": "false"}, {"action_name": "akcija 4", "success": "false"}, {"action_name": "akcija 5", "success": "false"}], "issueFixed": "true"}, {"issue_name": "fixed.issue.rf.us.quality", "issueFixed": "false", "issue_id": "8834*3", "actions": []}, {"issue_name" : "rebootBeforeTicket", "actions" : [{"action_name": "rebootCpeDevice", "success" : "false"}, {"action_name": "rebootStbDevice", "success" : "true"}]} ]}']],
columns=['session_id', 'json_text'])
df
I want to transform this DataFrame into:
So far I have tried the following:
import json
from pandas import json_normalize

df1 = pd.DataFrame()
for idx, row in df.iterrows():
    # Parse this row's JSON string and flatten it to one row per action
    json_contents = json.loads(row.json_text)
    df_json = json_normalize(json_contents['issues'], record_path=['actions'], meta=['issue_id', 'issue_name', 'issueFixed'], errors='ignore')
    # Attach the session_id of the source row to every flattened action row
    df_json.insert(0, 'session_id', row.session_id)
    df1 = pd.concat([df1, df_json])
df1 = df1[['session_id', 'issue_id', 'issue_name', 'issueFixed', 'action_name', 'success']]
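It works, but I am not happy with the for loop. I had to join the newly unpacked df_json (built from the df.json_text field) with the df.session_id field, and because I could not find another way, I used a for loop.
Is there a better way to join df_json with its df.session_id (and possibly some other df fields) without using a for loop?
Regards.
Edit 1, a solution using JSON injection: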
# Splice "session_id" into each JSON string right after the opening brace, then parse it
json_ser = df.apply(lambda row: json.loads(row.json_text[:1] + f'"session_id":{row.session_id}, ' + row.json_text[1:]), axis=1)
json_ser.head()
df1 = json_normalize(json_ser, \
    record_path=['issues', 'actions'], \
    meta=['session_id', ['issues', 'issue_id'], ['issues', 'issue_name'], ['issues', 'issueFixed']], \
    sep='_', \
    errors='ignore') \
    .rename(columns={'issues_issue_id' : 'issue_id', 'issues_issue_name' : 'issue_name', 'issues_issueFixed' : 'issueFixed'}) \
    [['session_id', 'issue_id', 'issue_name', 'issueFixed', 'action_name', 'success']]
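apply injects the session_id field into the JSON, so that json_normalize then has all the information it needs for parsing.
I created a test DataFrame with 2048 rows to measure performance. On my laptop, the for loop took 8.92 s, while apply + json_normalize took 387 + 167 ms. Injecting into the JSON looks much faster.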
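The same idea can probably be expressed without editing the raw JSON string, by adding session_id as a key of the parsed dict. The following is a minimal sketch under that assumption, not the approach measured above. The two-row df here is a shortened, hypothetical stand-in for the data in the question, and the `records` and `rename_map` names are illustrative only.

```python
import json

import pandas as pd

# Shortened, hypothetical stand-in for the question's DataFrame.
text_1 = ('{"issues": [{"issue_name": "fixed.issue.cpeUnreachable", '
          '"issue_id": "52446*", "issueFixed": "true", '
          '"actions": [{"action_name": "cableCheck", "success": "false"}]}]}')
text_2 = ('{"issues": [{"issue_name": "rebootBeforeTicket", '
          '"actions": [{"action_name": "rebootCpeDevice", "success": "true"}]}]}')
df = pd.DataFrame([[1, text_1], [2, text_2]], columns=["session_id", "json_text"])

# Parse every JSON string and add session_id to the parsed dict,
# instead of splicing it into the string before parsing.
records = [
    {"session_id": sid, **json.loads(text)}
    for sid, text in zip(df["session_id"], df["json_text"])
]

rename_map = {
    "issues_issue_id": "issue_id",
    "issues_issue_name": "issue_name",
    "issues_issueFixed": "issueFixed",
}

# One json_normalize call over all sessions; missing meta keys become NaN.
df1 = pd.json_normalize(
    records,
    record_path=["issues", "actions"],
    meta=["session_id", ["issues", "issue_id"], ["issues", "issue_name"],
          ["issues", "issueFixed"]],
    sep="_",
    errors="ignore",
).rename(columns=rename_map)

df1 = df1[["session_id", "issue_id", "issue_name", "issueFixed",
           "action_name", "success"]]
```

Working on the parsed dict avoids depending on the exact shape of the JSON text (for example, leading whitespace before the opening brace), at the cost of building an intermediate list of dicts.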
The best I could come up with is this. It still uses a loop, but it should definitely save you some time.
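One loop-based variant that fits this description is sketched below. It assumes the saving comes from collecting the per-row frames in a list and calling pd.concat once, rather than growing df1 on every iteration. This is an illustration under that assumption, not necessarily the answerer's code. The small df and the `frames` and `part` names are hypothetical.

```python
import json

import pandas as pd

# Shortened, hypothetical stand-in for the question's DataFrame.
text_1 = ('{"issues": [{"issue_name": "fixed.issue.rf.ds.quality", '
          '"issue_id": "3642*", "issueFixed": "true", "actions": ['
          '{"action_name": "akcija 1", "success": "false"}, '
          '{"action_name": "akcija 2", "success": "false"}]}]}')
text_2 = ('{"issues": [{"issue_name": "rebootBeforeTicket", '
          '"actions": [{"action_name": "rebootStbDevice", "success": "true"}]}]}')
df = pd.DataFrame([[1, text_1], [2, text_2]], columns=["session_id", "json_text"])

frames = []
for row in df.itertuples(index=False):
    issues = json.loads(row.json_text)["issues"]
    # One row per action; issue-level keys that are missing become NaN.
    part = pd.json_normalize(
        issues,
        record_path=["actions"],
        meta=["issue_id", "issue_name", "issueFixed"],
        errors="ignore",
    )
    part.insert(0, "session_id", row.session_id)
    frames.append(part)

# Concatenate once at the end instead of once per iteration.
df1 = pd.concat(frames, ignore_index=True)
df1 = df1[["session_id", "issue_id", "issue_name", "issueFixed",
           "action_name", "success"]]
```

Repeated pd.concat inside the loop copies the accumulated frame on every pass, so a single concat at the end usually scales better as the number of rows grows.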