Pandas: How to normalize a JSON file with multiple nested JSON lists?

Posted 2024-09-27 19:27:13


I request data from an API and then try to normalize the returned JSON, which has this structure:

[{'la_id': '33',
  'store': '1405fdsa6001209',
  'sell': '110aa346',
  'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
   {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
   {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
   {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
  'payment': {'paymentid': '10a836',
   'value': '259.6000',
   'number': '4',
   'finalid': '4',
   'finalname': 'Cartao de credito',
   'docs': '849763',
   'flag': None},
  'pagamentos': [{'pagamento_id': '107795',
   'valor': '854.9900',
   'numero_parcelas': '10',
   'finalizador_id': '4',
   'finalizador_nome': 'Cartao de credito',
   'documento': '500003',
   'bandeira': 'MASTERCARD'}]}]

When I apply pd.json_normalize to convert it to a DataFrame, I get the following result:


As you can see, the last two columns are not extracted correctly: their values are lists of dictionaries. How can I fix this?
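To make the problem concrete, here is a minimal sketch on a trimmed, hypothetical record with the same shape as the question's data. pd.json_normalize flattens the nested payment dict into dotted columns, but without a record_path it leaves the products and pagamentos lists untouched, so each of those cells still holds a Python list of dicts.

```python
import pandas as pd

# Hypothetical, trimmed record with the same shape as the question's data:
# one nested dict ("payment") and two lists of dicts.
records = [{
    "la_id": "33",
    "payment": {"paymentid": "10a836", "flag": None},
    "products": [{"codigo": "176690"}, {"codigo": "sd4907"}],
    "pagamentos": [{"pagamento_id": "107795"}],
}]

df = pd.json_normalize(records)

# The nested dict becomes payment.paymentid / payment.flag columns...
print(df.columns.tolist())
# ...but the list fields are kept as-is, one list per cell.
print(type(df.loc[0, "products"]))
```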


2 answers

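You can use pd.json_normalize for each of the following:

  1. Extract the main fields (including the key la_id)

  2. Extract the products details + the key la_id

  3. Extract the pagamentos details + the key la_id

Then use DataFrame.merge to merge the 3 resulting DataFrames on the common key la_id, as follows: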

import pandas as pd

j_lst = [{'la_id': '33',
          'store': '1405fdsa6001209',
          'sell': '110aa346',
          'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
                       {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
                       {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
                       {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
          'payment': {'paymentid': '10a836',
                      'value': '259.6000',
                      'number': '4',
                      'finalid': '4',
                      'finalname': 'Cartao de credito',
                      'docs': '849763',
                      'flag': None},
          'pagamentos': [{'pagamento_id': '107795',
                          'valor': '854.9900',
                          'numero_parcelas': '10',
                          'finalizador_id': '4',
                          'finalizador_nome': 'Cartao de credito',
                          'documento': '500003',
                          'bandeira': 'MASTERCARD'}]}]


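# 1. Main fields; the nested "payment" dict becomes payment.* columns
df_main = pd.json_normalize(j_lst)

# 2. One row per product, carrying la_id along as the join key
df_products = pd.json_normalize(j_lst, record_path=['products'], record_prefix='products.', meta=['la_id'])

# 3. One row per entry in "pagamentos", also keyed by la_id
df_pagamentos = pd.json_normalize(j_lst, record_path=['pagamentos'], record_prefix='pagamentos.', meta=['la_id'])

# Join the three frames on la_id and drop the raw list columns
df_out = (df_main.merge(df_products, on='la_id')
                 .merge(df_pagamentos, on='la_id')
                 .drop(['products', 'pagamentos'], axis=1)
         )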

Result:

print(df_out)

  la_id            store      sell payment.paymentid payment.value payment.number payment.finalid  payment.finalname payment.docs payment.flag products.codigo products.lacre products.valor pagamentos.pagamento_id pagamentos.valor pagamentos.numero_parcelas pagamentos.finalizador_id pagamentos.finalizador_nome pagamentos.documento pagamentos.bandeira
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          176690    15980fd2293          49.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
1    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          sd4907     1598a12385          19.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
2    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          aa4907      1598a2384          19.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
3    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          1fd307     1598a20401         169.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
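One caveat about merging on la_id: df_products and df_pagamentos can each have several rows per la_id, so the second merge pairs every product with every payment. The question's data has a single payment, so the output above has exactly one row per product. A minimal sketch with a hypothetical record holding two products and two payments shows the row count multiplying:

```python
import pandas as pd

# Hypothetical record: two products and two payments under one la_id
records = [{
    "la_id": "33",
    "products": [{"codigo": "A"}, {"codigo": "B"}],
    "pagamentos": [{"pagamento_id": "1"}, {"pagamento_id": "2"}],
}]

df_products = pd.json_normalize(records, record_path=["products"],
                                record_prefix="products.", meta=["la_id"])
df_pagamentos = pd.json_normalize(records, record_path=["pagamentos"],
                                  record_prefix="pagamentos.", meta=["la_id"])

# Both sides have 2 rows for la_id "33", so the merge yields 2 x 2 rows
merged = df_products.merge(df_pagamentos, on="la_id")
print(len(merged))  # 4
```

If that duplication is unwanted, keeping df_products and df_pagamentos as separate tables linked by la_id avoids it.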

Try:

import pandas as pd

lst = [
    {
        "la_id": "33",
        "store": "1405fdsa6001209",
        "sell": "110aa346",
        "products": [
            {"codigo": "176690", "lacre": "15980fd2293", "valor": "49.90"},
            {"codigo": "sd4907", "lacre": "1598a12385", "valor": "19.90"},
            {"codigo": "aa4907", "lacre": "1598a2384", "valor": "19.90"},
            {"codigo": "1fd307", "lacre": "1598a20401", "valor": "169.90"},
        ],
        "payment": {
            "paymentid": "10a836",
            "value": "259.6000",
            "number": "4",
            "finalid": "4",
            "finalname": "Cartao de credito",
            "docs": "849763",
            "flag": None,
        },
    }
]

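# Flatten the nested "payment" dict, then give each product its own row
df = pd.json_normalize(lst).explode("products")
# Turn each product dict into columns (codigo, lacre, valor)
df = pd.concat([df, df.pop("products").apply(pd.Series)], axis=1)
print(df)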

Prints:

  la_id            store      sell payment.paymentid payment.value payment.number payment.finalid  payment.finalname payment.docs payment.flag  codigo        lacre   valor
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  176690  15980fd2293   49.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  sd4907   1598a12385   19.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  aa4907    1598a2384   19.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  1fd307   1598a20401  169.90
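apply(pd.Series) builds one Series per row, which tends to be slow on large frames. A commonly used alternative (a swapped-in technique, not the answer's own code) is to pass the exploded column to pd.json_normalize in a single call and join the result back on the index. This is a minimal sketch on a trimmed copy of the question's record; it assumes every record has at least one product.

```python
import pandas as pd

# Trimmed copy of the question's record
lst = [{
    "la_id": "33",
    "store": "1405fdsa6001209",
    "products": [
        {"codigo": "176690", "lacre": "15980fd2293", "valor": "49.90"},
        {"codigo": "sd4907", "lacre": "1598a12385", "valor": "19.90"},
    ],
}]

# One row per product; reset_index gives every row a unique 0..n-1 label
df = pd.json_normalize(lst).explode("products").reset_index(drop=True)

# Expand all product dicts at once; the new frame also has a 0..n-1 index,
# so join lines each product up with its parent row
products = pd.json_normalize(df.pop("products").tolist()).add_prefix("products.")
df = df.join(products)
print(df)
```

The reset_index call matters here: without it, explode leaves duplicate labels (0, 0, ...) that would not line up with the fresh index of the expanded frame.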

Edit: with the updated input (not reproduced here), which uses the keys payments and product:

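# Expand each "payments" entry into one column per key
df = pd.concat([df, df.pop("payments").apply(pd.Series)], axis=1)
# One row per product, then expand each product dict into columns
df = df.explode("product")
df = pd.concat([df, df.pop("product").apply(pd.Series)], axis=1)
print(df)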

Prints:

   id            store      sell payment_id    valor number finalid   finalizador_nome    docs        flag  codigo        lacre   valor
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  176690  15980fd2293   49.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  sd4907   1598a12385   19.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  aa4907    1598a2384   19.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  1fd307   1598a20401  169.90
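The same explode-then-expand pattern can also be applied to the original question's pagamentos list, so that both lists end up as columns. This is a minimal sketch on a trimmed copy of the question's record; the add_prefix call is an addition (not in the answer) that keeps the two valor keys from producing duplicate column names, and the loop assumes every list is non-empty.

```python
import pandas as pd

# Trimmed copy of the question's record
lst = [{
    "la_id": "33",
    "store": "1405fdsa6001209",
    "products": [
        {"codigo": "176690", "valor": "49.90"},
        {"codigo": "sd4907", "valor": "19.90"},
    ],
    "pagamentos": [
        {"pagamento_id": "107795", "valor": "854.9900", "bandeira": "MASTERCARD"},
    ],
}]

df = pd.json_normalize(lst)
for col in ["products", "pagamentos"]:
    # One row per list element; reset the index so labels stay unique
    df = df.explode(col).reset_index(drop=True)
    # Expand each dict into columns, prefixed with the source column name
    expanded = df.pop(col).apply(pd.Series).add_prefix(f"{col}.")
    df = pd.concat([df, expanded], axis=1)

print(df)
```

If a record has several payments, each product row is repeated once per payment.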
