How do I create separate dataframes or CSVs from nested JSON with unrelated branches?

Posted 2024-06-28 19:57:08

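I have nested JSON files with several independent branches. The branches are related only through the information at the top of the tree. I don't want to cross-join rows from different branches. Inside a branch there can be lists and dictionaries, and these can in turn contain further lists and dictionaries.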
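Here is why the branches should not be joined. They share only header-level values such as `OrderHeaderKey`, so merging two branches on that key pairs every row of one branch with every row of the other. The sketch below uses two rows per branch, with values abbreviated from the sample file. The variable names are illustrative only.

```python
import pandas as pd

# Two rows per branch; the only shared column is the header-level OrderHeaderKey
order_dates = pd.DataFrame({
    "OrderHeaderKey": ["416734325", "416734325"],
    "DateTypeId": ["PROMISE_DATE", "SHIPPED_OR_CANCELLED"],
})
ship_dates = pd.DataFrame({
    "OrderHeaderKey": ["416734325", "416734325"],
    "ActualDate": ["2019-08-01", "2020-03-22"],
})

# Merging on the shared key pairs every OrderDate row with every ShipDate row
joined = order_dates.merge(ship_dates, on="OrderHeaderKey")
print(len(joined))  # 4 rows (2 x 2): a cross join, not a meaningful pairing
```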

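Below is a sample JSON file. I have 35 files with different structures. I want to create a separate flat file for each branch, and store these files in separate folders. The data in these files will be processed and queried later.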

"Shipment": {
    "ActualShipmentDate": "2020-03-22",
    "EnterpriseCode": "US001",
    "EventType": "CONFIRM_SHIPMENT",
    "ShipmentNo": "1001816",
    "Status": "1444",
    "OrderDates": {
        "OrderDate": [{
                "ActualDate": "2019-08-01",
                "DateTypeId": "PROMISE_DATE",
                "OrderHeaderKey": "416734325",
                "OrderLineKey": "123416734326",
                "OrderReleaseKey": "",
                "Extn": {
                    "Ext": [{
                            "a": 1,
                            "b": 2,
                            "c": 3
                        }, {
                            "a": 8,
                            "b": 9
                        }
                    ]
                }
            }, {
                "ActualDate": "2020-03-22",
                "CommittedDate": "2020-03-22",
                "DateTypeId": "SHIPPED_OR_CANCELLED",
                "OrderHeaderKey": "416734325",
                "OrderLineKey": "123416734326",
                "OrderReleaseKey": " ",
                "RequestedDate": "2020-03-22"
            }
        ]
    },
    "ShipDates": {
        "ShipDate": [{
                "ActualDate": "2019-08-01",
                "DateTypeId": "PROMISE_DATE",
                "OrderHeaderKey": "416734325",
                "OrderLineKey": "123416734326",
                "Entn": {
                    "Ext": [{
                            "p": 1,
                            "q": 2,
                        }, {
                            "p": 9,
                        }
                    ]
                }
            }, {
                "ActualDate": "2020-03-22",
                "CommittedDate": "2020-03-22",
                "DateTypeId": "SHIPPED_OR_CANCELLED",
                "OrderHeaderKey": "416734325",
                "OrderLineKey": "123416734326",
            }
        ]
    }
}

Tree structure of the sample JSON file above: [image]
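The image is not part of this page. As a rough textual substitute, the sketch below lists every key path in a parsed JSON document and marks list levels with `[]`. It could also help compare the structures of the 35 files. The function name `key_paths` is hypothetical.

```python
import json


def key_paths(node, prefix=""):
    """Yield every key path in parsed JSON; list levels are marked with []."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        # Walk every element so keys that appear only in some elements are included
        for item in node:
            yield from key_paths(item, prefix + "[]")


doc = json.loads('{"Shipment": {"Status": "1444", "ShipDates": {"ShipDate": [{"Entn": {"Ext": [{"p": 1}]}}]}}}')
# dict.fromkeys removes duplicate paths while keeping first-seen order
for path in dict.fromkeys(key_paths(doc)):
    print(path)
```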

How can I get separate structures like the ones shown in this image, using Python? [image]

I am trying to do this in an AWS Lambda function or a Glue job.
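The question asks for one flat file per branch, each in its own folder. Below is a minimal sketch of that output step. It assumes the branches are already available as a dict that maps each branch name to a DataFrame. `write_branches` and its `stem` parameter (a name for the source file) are hypothetical. In AWS Lambda only `/tmp` is writable on the local filesystem, so `out_dir` would point there. The files would then be uploaded to S3, for example with boto3's `upload_file` (not shown).

```python
from pathlib import Path

import pandas as pd


def write_branches(frames: dict[str, pd.DataFrame], out_dir: str, stem: str) -> list[Path]:
    """Write each branch DataFrame to <out_dir>/<branch>/<stem>.csv and return the paths."""
    written = []
    for branch, df in frames.items():
        folder = Path(out_dir) / branch  # one folder per branch
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{stem}.csv"
        df.to_csv(path, index=False)
        written.append(path)
    return written
```

Call it once per input file with a distinct `stem`. Every file's rows for a given branch then end up together in that branch's folder.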

Thanks a lot for your help.


1 answer

Answer 1 · posted 2024-06-28 19:57:08

You can try the following (rename the final columns as needed):

import pandas as pd

data = {
    "Shipment": {
        "ActualShipmentDate": "2020-03-22",
        "EnterpriseCode": "US001",
        "EventType": "CONFIRM_SHIPMENT",
        "ShipmentNo": "1001816",
        "Status": "1444",
        "OrderDates": {
            "OrderDate": [
                {
                    "ActualDate": "2019-08-01",
                    "DateTypeId": "PROMISE_DATE",
                    "OrderHeaderKey": "416734325",
                    "OrderLineKey": "123416734326",
                    "OrderReleaseKey": "",
                    "Extn": {
                        "Ext": [{"a": 1, "b": 2, "c": 3}, {"a": 8, "b": 9}]
                    },
                },
                {
                    "ActualDate": "2020-03-22",
                    "CommittedDate": "2020-03-22",
                    "DateTypeId": "SHIPPED_OR_CANCELLED",
                    "OrderHeaderKey": "416734325",
                    "OrderLineKey": "123416734326",
                    "OrderReleaseKey": " ",
                    "RequestedDate": "2020-03-22",
                },
            ]
        },
        "ShipDates": {
            "ShipDate": [
                {
                    "ActualDate": "2019-08-01",
                    "DateTypeId": "PROMISE_DATE",
                    "OrderHeaderKey": "416734325",
                    "OrderLineKey": "123416734326",
                    "Entn": {
                        "Ext": [
                            {
                                "p": 1,
                                "q": 2,
                            },
                            {
                                "p": 9,
                            },
                        ]
                    },
                },
                {
                    "ActualDate": "2020-03-22",
                    "CommittedDate": "2020-03-22",
                    "DateTypeId": "SHIPPED_OR_CANCELLED",
                    "OrderHeaderKey": "416734325",
                    "OrderLineKey": "123416734326",
                },
            ]
        },
    }
}

# Flatten each branch on its own, so rows from different branches are never combined
df1 = pd.json_normalize(data["Shipment"]["OrderDates"]["OrderDate"])
df2 = pd.json_normalize(data["Shipment"]["ShipDates"]["ShipDate"])

# Copy the shared top-level fields into both frames as the leading columns
for i, col in enumerate(
    [
        "ActualShipmentDate",
        "EnterpriseCode",
        "EventType",
        "ShipmentNo",
        "Status",
    ]
):
    df1.insert(i, col, data["Shipment"][col])
    df2.insert(i, col, data["Shipment"][col])


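df1 = df1.explode("Extn.Ext").reset_index(drop=True)
# Drop rows without "Ext" entries but keep their index, so the expanded
# columns line up with the right rows of df1 when concatenated
ext = df1.pop("Extn.Ext").dropna()
df1 = pd.concat([df1, pd.DataFrame(ext.tolist(), index=ext.index)], axis=1)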

df2 = df2.explode("Entn.Ext").reset_index(drop=True)
# Same as above: keep the original index so the "p"/"q" columns stay aligned
ext = df2.pop("Entn.Ext").dropna()
df2 = pd.concat([df2, pd.DataFrame(ext.tolist(), index=ext.index)], axis=1)


print(df1)
print(df2)

Prints:

  ActualShipmentDate EnterpriseCode         EventType ShipmentNo Status  ActualDate            DateTypeId OrderHeaderKey  OrderLineKey OrderReleaseKey CommittedDate RequestedDate    a    b    c
0         2020-03-22          US001  CONFIRM_SHIPMENT    1001816   1444  2019-08-01          PROMISE_DATE      416734325  123416734326                           NaN           NaN  1.0  2.0  3.0
1         2020-03-22          US001  CONFIRM_SHIPMENT    1001816   1444  2019-08-01          PROMISE_DATE      416734325  123416734326                           NaN           NaN  8.0  9.0  NaN
2         2020-03-22          US001  CONFIRM_SHIPMENT    1001816   1444  2020-03-22  SHIPPED_OR_CANCELLED      416734325  123416734326                    2020-03-22    2020-03-22  NaN  NaN  NaN

  ActualShipmentDate EnterpriseCode         EventType ShipmentNo Status  ActualDate            DateTypeId OrderHeaderKey  OrderLineKey CommittedDate    p    q
0         2020-03-22          US001  CONFIRM_SHIPMENT    1001816   1444  2019-08-01          PROMISE_DATE      416734325  123416734326           NaN  1.0  2.0
1         2020-03-22          US001  CONFIRM_SHIPMENT    1001816   1444  2019-08-01          PROMISE_DATE      416734325  123416734326           NaN  9.0  NaN
2         2020-03-22          US001  CONFIRM_SHIPMENT    1001816   1444  2020-03-22  SHIPPED_OR_CANCELLED      416734325  123416734326    2020-03-22  NaN  NaN
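With 35 files of differing structure, hardcoding each branch path and list column does not scale. The sketch below generalizes the same `json_normalize` + `explode` approach. It rests on these assumptions:

- The document has a single top-level key, like `"Shipment"`.
- A branch is any non-scalar top-level value that contains a list of dicts.
- Every scalar top-level field is copied into each branch.
- Lists inside a branch hold dicts.
- Header field names do not clash with branch column names.

The names `find_records`, `flatten_branch` and `split_branches` are hypothetical. One caveat: if a single record holds two sibling lists, exploding both pairs their elements (a cross join inside that branch).

```python
import pandas as pd


def find_records(node):
    """Return the first list of dicts found under node (depth-first), or None."""
    if isinstance(node, list) and node and all(isinstance(x, dict) for x in node):
        return node
    if isinstance(node, dict):
        for value in node.values():
            found = find_records(value)
            if found is not None:
                return found
    return None


def flatten_branch(records, header):
    """Flatten one branch: normalize dicts, explode nested lists, prepend header fields."""
    df = pd.json_normalize(records)
    while True:
        # Columns that still hold Python lists (e.g. "Extn.Ext")
        list_cols = [c for c in df.columns
                     if df[c].map(lambda v: isinstance(v, list)).any()]
        if not list_cols:
            break
        col = list_cols[0]
        df = df.explode(col).reset_index(drop=True)
        inner = df.pop(col).dropna()  # keep the index so rows stay aligned
        expanded = pd.json_normalize(inner.tolist()).add_prefix(f"{col}.")
        expanded.index = inner.index
        df = df.join(expanded)
    for i, (key, value) in enumerate(header.items()):
        df.insert(i, key, value)  # repeat top-level scalars on every row
    return df


def split_branches(doc):
    """Return {branch_name: DataFrame} for a document with a single top-level key."""
    root = next(iter(doc.values()))  # e.g. doc["Shipment"]
    header = {k: v for k, v in root.items() if not isinstance(v, (dict, list))}
    frames = {}
    for key, value in root.items():
        if key in header:
            continue
        records = find_records(value)
        if records is not None:
            frames[key] = flatten_branch(records, header)
    return frames
```

For a file on disk this would be `split_branches(json.load(f))`. Each returned frame can then be written out with `DataFrame.to_csv`. The nested columns keep their dotted names (for example `Extn.Ext.a`) and can be renamed as needed.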
