How do I create hierarchical JSON from rows in pandas?

Posted 2024-09-29 22:30:48

I have a DataFrame like this:

import pandas as pd

values = [
    [
        1, 1, 'Standard Room', 'adult', 'John Doe'
    ],
    [
        1, 1, 'Standard Room', 'adult', 'John Noakes'
    ],
    [
        1, 2, 'Standard Room', 'adult', 'Richard Miles'
    ],
    [
        2, 1, 'Standard Room', 'adult', 'John Smith'
    ],
    [
        2, 1, 'Standard Room', 'adult', 'Joe Public'
    ]
]

columns = [
    'booking_id',
    'room_id',
    'room_name',
    'guest_type',
    'guest_name'
]

df = pd.DataFrame.from_records(values, columns=columns)
print(df)

   booking_id  room_id      room_name guest_type     guest_name
0           1        1  Standard Room      adult       John Doe
1           1        1  Standard Room      adult    John Noakes
2           1        2  Standard Room      adult  Richard Miles
3           2        1  Standard Room      adult     John Smith
4           2        1  Standard Room      adult     Joe Public

I want to create a single JSON object for each booking_id. The JSON should contain the room_id, guest_type, and guest_name fields. I have been trying something like this:

df.groupby(['booking_id', 'room_id', 'room_name']).apply(lambda x: x.to_dict(orient='records'))

But in this case some values are duplicated. For example, the same room_id appears twice in the resulting list of dictionaries:

[{'booking_id': 1,
  'room_id': 1,
  'room_name': 'Standard Room',
  'guest_type': 'adult',
  'guest_name': 'John Doe'},
 {'booking_id': 1,
  'room_id': 1,
  'room_name': 'Standard Room',
  'guest_type': 'adult',
  'guest_name': 'John Noakes'}]
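
The repetition arises because to_dict(orient='records') serializes every column of each group, including the grouping keys. A minimal sketch of one way around that, assuming the sample frame from the question (rebuilt here so the snippet runs on its own); the name guests_per_room is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 1, 'Standard Room', 'adult', 'John Doe'],
        [1, 1, 'Standard Room', 'adult', 'John Noakes'],
        [1, 2, 'Standard Room', 'adult', 'Richard Miles'],
        [2, 1, 'Standard Room', 'adult', 'John Smith'],
        [2, 1, 'Standard Room', 'adult', 'Joe Public'],
    ],
    columns=['booking_id', 'room_id', 'room_name', 'guest_type', 'guest_name'],
)

# Keep the grouping keys as the dict key and serialize only the guest columns,
# so booking_id / room_id / room_name are not repeated inside every guest record.
guests_per_room = {
    (int(booking_id), int(room_id), room_name):
        group[['guest_type', 'guest_name']].to_dict(orient='records')
    for (booking_id, room_id, room_name), group
    in df.groupby(['booking_id', 'room_id', 'room_name'])
}
```

This still yields a flat mapping per room; nesting the rooms under each booking needs a second grouping step.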

However, as I said above, I want to create a hierarchical JSON structure for each booking_id. This is the expected output:

output_values = [
    [1, {
    'rooms': [
                {
                    'room_id': 1,
                    'room_name': 'Standard Room',
                    'guests': [
                        {
                            'guest_type': 'adult',
                            'guest_name': 'John Doe'
                        },
                        {
                            'guest_type': 'adult',
                            'guest_name': 'John Noakes'
                        }
                    ]
                },
                {
                    'room_id': 2,
                    'room_name': 'Standard Room',
                    'guests': [
                        {
                            'guest_type': 'adult',
                            'guest_name': 'Richard Miles'
                        }
                    ]
                }
            ]
        }
    ],
    [2, {
            'rooms': [
                {
                    'room_id': 1,
                    'room_name': 'Standard Room',
                    'guests': [
                        {
                            'guest_type': 'adult',
                            'guest_name': 'John Smith'
                        },
                        {
                            'guest_type': 'adult',
                            'guest_name': 'Joe Public'
                        }
                    ]
                }
            ]
        }
    ]
]

print(pd.DataFrame.from_records(output_values, columns=['booking_id', 'rooms']))

   booking_id                                              rooms
0           1  {'rooms': [{'room_id': 1, 'room_name': 'Standa...
1           2  {'rooms': [{'room_id': 1, 'room_name': 'Standa...

Any ideas? Thanks in advance.


2 answers

This takes a fair amount of reshaping, involving several groupby, apply, and zip calls:

# Step 1: one row per (booking, room) holding the list of guest dicts.
s = (df.groupby(["booking_id", "room_id", "room_name"])
       .apply(lambda d: pd.Series({"guests": [dict(zip(["guest_type", "guest_name"], i))
                                              for i in d[["guest_type", "guest_name"]].values]}))
       .reset_index(["room_id", "room_name"]) )

# Step 2: group again by booking_id (the remaining index level) and collect the rooms.
res = [[g, {'rooms': i.to_dict("records")}] for g, i in s.groupby(level=0)]

import pprint

pprint.pprint(res)

[[1,
  {'rooms': [{'guests': [{'guest_name': 'John Doe', 
                          'guest_type': 'adult'},
                         {'guest_name': 'John Noakes',
                          'guest_type': 'adult'}],
              'room_id': 1,
              'room_name': 'Standard Room'},
             {'guests': [{'guest_name': 'Richard Miles',
                          'guest_type': 'adult'}],
              'room_id': 2,
              'room_name': 'Standard Room'}]}],
 [2,
  {'rooms': [{'guests': [{'guest_name': 'John Smith', 
                          'guest_type': 'adult'},
                         {'guest_name': 'Joe Public',
                          'guest_type': 'adult'}],
              'room_id': 1,
              'room_name': 'Standard Room'}]}]]
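
The same two-level grouping can also be written as plain nested loops over groupby, which some readers may find easier to follow. This is only a sketch under the question's sample data; the helper rooms_for is a hypothetical name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 1, 'Standard Room', 'adult', 'John Doe'],
        [1, 1, 'Standard Room', 'adult', 'John Noakes'],
        [1, 2, 'Standard Room', 'adult', 'Richard Miles'],
        [2, 1, 'Standard Room', 'adult', 'John Smith'],
        [2, 1, 'Standard Room', 'adult', 'Joe Public'],
    ],
    columns=['booking_id', 'room_id', 'room_name', 'guest_type', 'guest_name'],
)

def rooms_for(booking):
    """Build the {'rooms': [...]} payload from the rows of a single booking."""
    return {
        'rooms': [
            {
                'room_id': int(room_id),
                'room_name': room_name,
                # only the guest columns go into the innermost records
                'guests': room[['guest_type', 'guest_name']].to_dict('records'),
            }
            for (room_id, room_name), room in booking.groupby(['room_id', 'room_name'])
        ]
    }

# Outer level: one [booking_id, payload] pair per booking.
output_values = [[int(booking_id), rooms_for(booking)]
                 for booking_id, booking in df.groupby('booking_id')]

out = pd.DataFrame.from_records(output_values, columns=['booking_id', 'rooms'])
```

Here output_values has the same shape as the expected output in the question, with the keys in the question's order (room_id, room_name, guests).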

Another, less elegant, iterative solution:

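import json

# For each row i, (re)build the entry for that row's booking; rows of the
# same booking map to the same key, so the dict ends up with one entry per booking.
d = [{ int(df.loc[i]['booking_id']): {
    'rooms': [
      {
        'room_id' : int(room_id),
        # take the name from a row of this room rather than from row i
        'room_name' : df.loc[(df['booking_id'] == df.loc[i]['booking_id']) & (df['room_id'] == room_id), 'room_name'].iloc[0],
        'guests' : [
          {
            'guest_type': df.loc[g]['guest_type'],
            'guest_name': df.loc[g]['guest_name']
          }
          for g in df[(df['booking_id'] == df.loc[i]['booking_id']) & (df['room_id'] == room_id)].index]
      } for room_id in sorted(set(df[df['booking_id'] == df.loc[i]['booking_id']]['room_id'].values)) ]
    } for i in df.index }]

# json.dumps writes the integer booking_id keys as strings
result = json.dumps(d, indent=4)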

print(result)
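
If one JSON string per booking is wanted rather than a single combined document, each payload can be dumped separately. The sketch below uses a hand-written payload for booking 2 from the question; the names payloads and json_per_booking are illustrative. It also shows that json.dumps turns integer dictionary keys into strings:

```python
import json

# Payload for booking 2, copied from the expected output in the question.
payloads = {
    2: {
        'rooms': [
            {
                'room_id': 1,
                'room_name': 'Standard Room',
                'guests': [
                    {'guest_type': 'adult', 'guest_name': 'John Smith'},
                    {'guest_type': 'adult', 'guest_name': 'Joe Public'},
                ],
            }
        ]
    }
}

# One JSON string per booking_id.
json_per_booking = {booking_id: json.dumps(payload)
                    for booking_id, payload in payloads.items()}

# A single combined document: the integer key 2 is written as the string "2".
combined = json.dumps(payloads)
```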
