Exporting a DataFrame to nested JSON (multiple nesting levels)

Posted 2024-10-06 13:49:20


I want to export a pandas DataFrame to nested JSON so that it can be ingested into MongoDB.

Here is a sample of the data:

import pandas as pd

data = {
    'product_id': ['a001', 'a001', 'a001'],
    'product': ['aluminium', 'aluminium', 'aluminium'],
    'production_id': ['b001', 'b002', 'b002'],
    'production_name': ['metallurgical', 'recycle', 'recycle'],
    'geo_name': ['US', 'EU', 'RoW'],
    'value': [100, 200, 200]
}
df = pd.DataFrame(data=data)

The final JSON should look like this:

{
    "name_id": "a001",
    "name": "aluminium",
    "activities": [
        {
            "product_id": "b001",
            "product_name": "metallurgical",
            "regions": [
                {
                    "geo_name": "US",
                    "value": 100
                }
            ]
        },
        {
            "product_id": "b002",
            "product_name": "recycle",
            "regions": [
                {
                    "geo_name": "EU",
                    "value": 200
                },
                {
                    "geo_name": "RoW",
                    "value": 200
                }
            ]
        }
    ]
}

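There are a few questions close to mine, but they are either several years old and rely on older pandas versions in which the solutions no longer work, or they do not group and nest the JSON the way I want (for example, this one covers only a single level: How to create a nested JSON from pandas DataFrame?).

Any help would be much appreciated.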


1 answer
User
Answer 1 · posted 2024-10-06 13:49:20

The simplest solution I found works for any number of nesting levels (two in this example):

json_extract = (
    df
    # Innermost level: one list of {geo_name, value} records per production.
    .groupby(['product_id', 'product', 'production_id', 'production_name'])
    .apply(lambda x: x[['geo_name', 'value']].to_dict('records'))
    .reset_index(name='geos')
    # Outer level: one list of production records per product.
    .groupby(['product_id', 'product'])
    .apply(lambda x: x[['production_id', 'production_name', 'geos']].to_dict('records'))
    .reset_index(name='production')
    .to_json(orient='records')
)

The resulting JSON, pretty-printed:
[
    {
        "product_id": "a001",
        "product": "aluminium",
        "production": [
            {
                "production_id": "b001",
                "production_name": "metallurgical",
                "geos": [
                    {
                        "geo_name": "US",
                        "value": 100
                    }
                ]
            },
            {
                "production_id": "b002",
                "production_name": "recycle",
                "geos": [
                    {
                        "geo_name": "EU",
                        "value": 200
                    },
                    {
                        "geo_name": "RoW",
                        "value": 200
                    }
                ]
            }
        ]
    }
]
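
Since the answer says the pattern extends to any number of levels, the repeated groupby / apply / reset_index step can be wrapped in a loop. The following is only a sketch under stated assumptions: the helper name `nest_frame` and the `levels` format (a list of `(key_columns, child_field)` pairs, outermost first) are made up for illustration and are not part of pandas or of the answer above.

```python
import json
import pandas as pd


def nest_frame(df, levels, leaf_cols):
    """Collapse df to one row per outermost group, nesting inner rows as lists.

    levels: list of (key_columns, child_field) pairs, outermost level first
            (a hypothetical format chosen for this sketch).
    leaf_cols: columns kept in the innermost records.
    """
    out = df
    keep = list(leaf_cols)
    # Work from the innermost level outwards, as in the chained version.
    for depth in range(len(levels), 0, -1):
        # Group by the key columns of this level and of every level above it.
        group_cols = [c for cols, _ in levels[:depth] for c in cols]
        own_cols, child_field = levels[depth - 1]
        out = (
            out.groupby(group_cols, sort=False)[keep]
            .apply(lambda g: g.to_dict("records"))
            .reset_index(name=child_field)
        )
        # The next (outer) level nests this level's keys plus its child list.
        keep = list(own_cols) + [child_field]
    return out


data = {
    "product_id": ["a001", "a001", "a001"],
    "product": ["aluminium", "aluminium", "aluminium"],
    "production_id": ["b001", "b002", "b002"],
    "production_name": ["metallurgical", "recycle", "recycle"],
    "geo_name": ["US", "EU", "RoW"],
    "value": [100, 200, 200],
}
df = pd.DataFrame(data)

nested = nest_frame(
    df,
    levels=[
        (["product_id", "product"], "production"),
        (["production_id", "production_name"], "geos"),
    ],
    leaf_cols=["geo_name", "value"],
)

# Round-trip through JSON to get plain Python dicts and lists.
docs = json.loads(nested.to_json(orient="records"))
print(json.dumps(docs, indent=4))
```

Here `docs` is a list of plain dictionaries, which is the shape that pymongo's `insert_many` accepts, so it should be usable for the MongoDB ingestion mentioned in the question. A third nesting level would only need one more pair at the front of `levels`.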
