提取嵌套数据

2024-09-27 00:22:11 发布

您现在位置:Python中文网/ 问答频道 /正文

我有以下数据结构

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 EMP_ID     ¦ ATTR1_OLD_VAL ¦                                                                                        ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL ¦ ATTR3_NEW_VAL   ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+-------------------------------------------------------------------------------------------¦
¦      E001 ¦ xyz           ¦ [{"codeTs":"12345567 ","goodsAttrName":"test1"},{"codeTs":"6402910000","goodsAttrName":"test2"}]      ¦ mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
       E002 ¦ 1234          ¦                                                                                                       ¦ 123           ¦ jjj           ¦ iii           ¦ bb             ¦
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

我想把它处理成以下数据

+---------------------------------------------------------------------------------------------------------------------------------------------+
 EMP_ID     ¦ ATTR1_OLD_VAL ¦ codeTs     ¦  goodsAttrName  ¦ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL    ¦ ATTR3_NEW_VAL  ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+------------------------------------¦
¦      E001 ¦ xyz           ¦ 12345567   ¦  test1          ¦               ¦   mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
¦      E001 ¦ xyz           ¦ 6402910000 ¦  test2          ¦               ¦   mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
¦      E002 ¦ 1234          ¦            ¦                 ¦               ¦   123           ¦  jjj          ¦ iii           ¦ bb             ¦                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------+

我尝试了以下代码,但无效

import pymongo as pm
import pandas as pd
import numpy as np
import datetime as dt
from bson import json_util
from pandas.io.json import json_normalize
import json

client = pm.MongoClient('mongodb://user1:user1@127.0.0.1:27017')
db = client['my_db']
mongo_data = list(db['ATTR1_NEW_VAL'].find({}))
sanitized = json.loads(json_util.dumps(mongo_data))

normalized = json_normalize(sanitized)
df = pd.DataFrame(normalized)
print(df)

Tags: importjsonnewasvaloldmmmxyz
1条回答
网友
1楼 · 发布于 2024-09-27 00:22:11

试试这个:

首先使用df.explode将dict列表分解为行,然后通过apply(pd.Series)将dict值转换为列

df = df.explode('ATTR1_NEW_VAL')
print(pd.concat([df.drop(['ATTR1_NEW_VAL'], axis=1), df['ATTR1_NEW_VAL'].apply(pd.Series)], axis=1))

相关问题 更多 >

    热门问题