如何展平嵌套json数组？

{ "data": { "workbooks": [ { "projectName": "TestProject", "name": "wkb1", "site": { "name": "site1" }, "description": "", "createdAt": "2020-12-13T15:38:58Z", "updatedAt": "2020-12-13T15:38:59Z", "owner": { "name": "user1", "username": "John" }, "embeddedDatasources": [ { "name": "DS1", "hasExtracts": false, "upstreamDatasources": [ { "projectName": "Data Sources", "name": "DS1", "hasExtracts": false, "owner": { "username": "user2" } } ], "upstreamTables": [ { "name": "table_1", "schema": "schema_1", "database": { "name": "testdb", "connectionType": "redshift" } }, { "name": "table_2", "schema": "schema_2", "database": { "name": "testdb", "connectionType": "redshift" } }, { "name": "table_3", "schema": "schema_3", "database": { "name": "testdb", "connectionType": "redshift" } } ] }, { "name": "DS2", "hasExtracts": false, "upstreamDatasources": [ { "projectName": "Data Sources", "name": "DS2", "hasExtracts": false, "owner": { "username": "user3" } } ], "upstreamTables": [ { "name": "table_4", "schema": "schema_1", "database": { "name": "testdb", "connectionType": "redshift" } } ] } ] } ] } }

1条回答

网友

1楼 · 发布于 2024-06-28 18:48:32

这里有一个部分解决方案：

首先将数据保存在与脚本相同的目录中，作为名为data.json的JSON file

import json
import pandas as pd
from pandas.io.json import json_normalize

with open('data.json') as json_file:
    json_data = json.load(json_file)

new_data = json_data['data']['workbooks']

result = json_normalize(new_data, ['embeddedDatasources', 'upstreamTables'], ['projectName', 'name', 'createdAt', 'updatedAt', 'owner', 'site'], record_prefix='_')

result

输出：

^{tb1}$

下一步是什么？

我认为如果您提前一点重新构造数据（例如展平'database': {'name': 'testdb', 'connectionType': 'redshift'}），您将能够向meta参数添加更多的fields

正如您在json_normalize的documentation中所看到的，这里使用的四个参数是：

数据：dict or list of dicts：
- 未序列化的JSON对象
记录路径：str or list of str：默认无
- 每个对象中指向记录列表的路径。如果未传递，则假定数据是一个记录数组
meta:list of paths (str or list of str)：默认无
- 要用作结果表中每个记录的元数据的字段
记录前缀：str：默认无
- 如果为True，则为记录加上带点（？）路径的前缀，例如，如果记录的路径为['foo'，'bar']，则为foo.bar.field

相关问题更多 >

编程相关推荐

热门问题

热门文章