How to convert a DataFrame into multi-level JSON with headers?

    import pandas as pd

    # Create the input DataFrame
    data = {
        'col6': ['A', 'A', 'A', 'B', 'B', 'B'],
        'col7': [1, 1, 2, 1, 2, 2],
        'col8': ['A', 'A', 'A', 'B', 'B', 'B'],
        'col10': ['A', 'A', 'A', 'B', 'B', 'B'],
        'col14': [1, 1, 1, 1, 1, 2],
        'col15': [1, 2, 1, 1, 1, 1],
        'col16': [9, 10, 26, 9, 12, 4],
        'col18': [1, 1, 2, 1, 2, 3],
        'col1': ['xxxx', 'xxxx', 'xxxx', 'xxxx', 'xxxx', 'xxxx'],
        'col2': [2.02011E+13, 2.02011E+13, 2.02011E+13, 2.02011E+13, 2.02011E+13, 2.02011E+13],
        'col3': ['xxxx20201107023012', 'xxxx20201107023012', 'xxxx20201107023012',
                 'xxxx20201107023012', 'xxxx20201107023012', 'xxxx20201107023012'],
        'col4': ['yyyy', 'yyyy', 'yyyy', 'yyyy', 'yyyy', 'yyyy'],
        'col5': [0, 0, 0, 0, 0, 0],
        'col9': ['A', 'A', 'A', 'B', 'B', 'B'],
        'col11': [0, 0, 0, 0, 0, 0],
        'col12': [0, 0, 0, 0, 0, 0],
        'col13': [0, 0, 0, 0, 0, 0],
        'col17': [51, 63, 47, 59, 53, 56],
    }
    df = pd.DataFrame(data)

Expected output:

    {
      "header1": {
        "col1": "xxxx",
        "col2": "20201107023012",
        "col3": "xxxx20201107023012",
        "col4": "yyyy",
        "col5": "0"
      },
      "header2": {
        "header3": [
          {
            "col6": "A",
            "col7": 1,
            "header4": [
              {
                "col8": "A",
                "col9": 1,
                "col10": "A",
                "col11": 0,
                "col12": 0,
                "col13": 0,
                "header5": [
                  {"col14": "1", "col15": 1, "col16": 1, "col17": 51, "col18": 1},
                  {"col14": "1", "col15": 1, "col16": 2, "col17": 63, "col18": 2}
                ]
              },
              {
                "col8": "A",
                "col9": 1,
                "col10": "A",
                "col11": 0,
                "col12": 0,
                "col13": 0,
                "header5": [
                  {"col14": "1", "col15": 1, "col16": 1, "col17": 51, "col18": 1},
                  {"col14": "1", "col15": 1, "col16": 2, "col17": 63, "col18": 2}
                ]
              }
            ]
          }
        ]
      }
    }

1 answer

User

#1 · Posted 2024-10-01 02:21:29

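Maybe this will get you started. I don't know of any existing Python module that does exactly what you need, but this is the base I would start from, making assumptions based on what you provided.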

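Because each successive level of nesting is based on certain conditions, you need to filter the DataFrame in a loop. Depending on the size of the DataFrame, using groupby may be a better option than what I present here, but the idea is the same. You will also have to build the key/value pairs correctly; this only creates the skeleton of the data you are building.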
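To illustrate the groupby alternative mentioned above, here is a minimal sketch (not part of the original answer) showing that one `groupby` pass yields the same row subsets as filtering the DataFrame once per unique (col6, col7) pair. It uses only col6 to col8 of the question's data, and the function names `groups_by_mask` and `groups_by_groupby` are hypothetical.

```python
import pandas as pd

# Only the grouping columns from the question's sample data
df = pd.DataFrame({
    'col6': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col7': [1, 1, 2, 1, 2, 2],
    'col8': ['A', 'A', 'A', 'B', 'B', 'B'],
})


def groups_by_mask(df):
    """Filter df once per unique (col6, col7) pair with a boolean mask."""
    out = {}
    for col6, col7 in df[['col6', 'col7']].drop_duplicates().itertuples(index=False):
        mask = (df['col6'] == col6) & (df['col7'] == col7)
        out[(col6, col7)] = list(df.index[mask])
    return out


def groups_by_groupby(df):
    """Let pandas partition the rows in a single pass."""
    # sort=False keeps the groups in order of first appearance
    return {key: list(rows.index) for key, rows in df.groupby(['col6', 'col7'], sort=False)}


print(groups_by_mask(df) == groups_by_groupby(df))  # True
```

The mask version scans the whole DataFrame once per group, while `groupby` partitions the rows in one pass, which is why it tends to scale better on larger frames.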

    # df is the DataFrame built from `data` in the question.
    # Assume header1 is constant, so take the first row; selecting the columns
    # from that row (a Series) and calling dict() gives the header1 dictionary
    header1 = dict(df.iloc[0][['col1', 'col2', 'col3', 'col4', 'col5']])
    print('header1', header1)

    # For header3 you need the unique (col6, col7) combinations, so build a
    # de-duplicated DataFrame and iterate over it to get all the header3 dictionaries
    header3_dicts = []
    dfh3 = df[['col6', 'col7']].drop_duplicates().reset_index(drop=True)
    for i in range(dfh3.shape[0]):
        header3_dicts.append(dict(dfh3.iloc[i][['col6', 'col7']]))
    print('header3', header3_dicts)

    # Iterate over the header3 combinations to get header4
    for i in range(dfh3.shape[0]):
        # Rows of df that match the i-th (col6, col7) combination
        dfh4 = df.loc[(df['col6'] == dfh3.iat[i, 0]) & (df['col7'] == dfh3.iat[i, 1])]
        header4_dicts = []
        for j in range(dfh4.shape[0]):
            # Read from dfh4 (the filtered rows), not df, so each header4 list
            # only contains rows that belong to this header3 entry
            header4_dicts.append(dict(dfh4.iloc[j][['col8', 'col9', 'col10', 'col11', 'col12', 'col13']]))
        print('header4', header4_dicts)
        # Next level (header5): repeat the same filtering pattern as above
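The answer stops at header4. Below is a minimal sketch of the full nesting, swapping the mask loops for the groupby approach the answer mentions. It assumes (reading the expected output, not confirmed by the asker) that each header4 entry is a unique col8 to col13 combination within a header3 group and that header5 holds the col14 to col18 values of every row in that header4 group, so the duplicated header4 entry in the example output is not reproduced. The names `build_nested`, `to_native`, and the `*_COLS` constants are hypothetical.

```python
import json

import numpy as np
import pandas as pd

# Sample input from the question (same values, written compactly)
data = {
    'col6': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col7': [1, 1, 2, 1, 2, 2],
    'col8': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col10': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col14': [1, 1, 1, 1, 1, 2],
    'col15': [1, 2, 1, 1, 1, 1],
    'col16': [9, 10, 26, 9, 12, 4],
    'col18': [1, 1, 2, 1, 2, 3],
    'col1': ['xxxx'] * 6,
    'col2': [2.02011E+13] * 6,
    'col3': ['xxxx20201107023012'] * 6,
    'col4': ['yyyy'] * 6,
    'col5': [0] * 6,
    'col9': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col11': [0] * 6,
    'col12': [0] * 6,
    'col13': [0] * 6,
    'col17': [51, 63, 47, 59, 53, 56],
}
df = pd.DataFrame(data)

# Hypothetical column groups for each nesting level, read off the expected output
HEADER1_COLS = ['col1', 'col2', 'col3', 'col4', 'col5']
HEADER3_COLS = ['col6', 'col7']
HEADER4_COLS = ['col8', 'col9', 'col10', 'col11', 'col12', 'col13']
HEADER5_COLS = ['col14', 'col15', 'col16', 'col17', 'col18']


def to_native(value):
    """Convert numpy scalars (np.int64, np.float64, ...) to plain Python so json can serialise them."""
    return value.item() if isinstance(value, np.generic) else value


def build_nested(df):
    # header1: assumed constant, so read it from the first row
    header1 = {c: to_native(df.iloc[0][c]) for c in HEADER1_COLS}

    header3 = []
    # sort=False keeps groups in order of first appearance
    for h3_key, h3_rows in df.groupby(HEADER3_COLS, sort=False):
        h3 = {c: to_native(v) for c, v in zip(HEADER3_COLS, h3_key)}
        h3['header4'] = []
        # Within each header3 group, one header4 entry per unique col8..col13 combination
        for h4_key, h4_rows in h3_rows.groupby(HEADER4_COLS, sort=False):
            h4 = {c: to_native(v) for c, v in zip(HEADER4_COLS, h4_key)}
            # header5: one dictionary per row of this header4 group
            h4['header5'] = [
                {c: to_native(v) for c, v in record.items()}
                for record in h4_rows[HEADER5_COLS].to_dict(orient='records')
            ]
            h3['header4'].append(h4)
        header3.append(h3)

    return {'header1': header1, 'header2': {'header3': header3}}


print(json.dumps(build_nested(df), indent=2))
```

Values keep the DataFrame's types (col2 stays a float, for example), so any string conversions shown in the expected output still have to be applied per key, as the answer notes.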
