如何展平嵌套json数组?

2024-06-28 18:48:32 发布

您现在位置:Python中文网/ 问答频道 /正文

我需要用Python中不同级别的嵌套JSON数组将JSON展平

我的JSON的一部分看起来像:

{
  "data": {
    "workbooks": [
      {
        "projectName": "TestProject",
        "name": "wkb1",
        "site": {
          "name": "site1"
        },
        "description": "",
        "createdAt": "2020-12-13T15:38:58Z",
        "updatedAt": "2020-12-13T15:38:59Z",
        "owner": {
          "name": "user1",
          "username": "John"
        },
        "embeddedDatasources": [
          {
            "name": "DS1",
            "hasExtracts": false,
            "upstreamDatasources": [
              {
                "projectName": "Data Sources",
                "name": "DS1",
                "hasExtracts": false,
                "owner": {
                  "username": "user2"
                }
              }
            ],
            "upstreamTables": [
              {
                "name": "table_1",
                "schema": "schema_1",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              },
              {
                "name": "table_2",
                "schema": "schema_2",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              },
              {
                "name": "table_3",
                "schema": "schema_3",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              }
            ]
          },
          {
            "name": "DS2",
            "hasExtracts": false,
            "upstreamDatasources": [
              {
                "projectName": "Data Sources",
                "name": "DS2",
                "hasExtracts": false,
                "owner": {
                  "username": "user3"
                }
              }
            ],
            "upstreamTables": [
              {
                "name": "table_4",
                "schema": "schema_1",
                "database": {
                  "name": "testdb",
                  "connectionType": "redshift"
                }
              }
            ]
          }
        ]
      }
    ]
  }
}

输出应该是这样的

sample output

尝试使用json_normalize,但无法使其工作。当前通过使用循环读取嵌套数组和使用键读取值来解析它。正在寻找规范JSON的更好方法


Tags: namejsonfalseredshiftschematableusername数组
1条回答
网友
1楼 · 发布于 2024-06-28 18:48:32

这里有一个部分解决方案:

首先将数据保存在与脚本相同的目录中,作为名为data.jsonJSON file

import json
import pandas as pd
from pandas.io.json import json_normalize

with open('data.json') as json_file:
    json_data = json.load(json_file)

new_data = json_data['data']['workbooks']

result = json_normalize(new_data, ['embeddedDatasources', 'upstreamTables'], ['projectName', 'name', 'createdAt', 'updatedAt', 'owner', 'site'], record_prefix='_')

result 

输出:

^{tb1}$

下一步是什么?

我认为如果您提前一点重新构造数据(例如展平'database': {'name': 'testdb', 'connectionType': 'redshift'}),您将能够向meta参数添加更多的fields

正如您在json_normalize的documentation中所看到的,这里使用的四个参数是:

  1. 数据:dict or list of dicts

    • 未序列化的JSON对象
  2. 记录路径:str or list of str:默认无

    • 每个对象中指向记录列表的路径。如果未传递,则假定数据是一个记录数组
  3. meta:list of paths (str or list of str):默认无

    • 要用作结果表中每个记录的元数据的字段
  4. 记录前缀:str:默认无

    • 如果为True,则为记录加上带点(?)路径的前缀,例如,如果记录的路径为['foo','bar'],则为foo.bar.field

相关问题 更多 >