按组将数据帧输出格式化为JSON记录

2024-06-14 03:55:05 发布

您现在位置:Python中文网/ 问答频道 /正文

我的数据帧看起来像这样:

       count_arena_users  count_users                    event   timestamp
0                   4458        12499                 football  2017-04-30
1                   2706         4605                  cricket  2015-06-30
2                    592         4176                   tennis  2016-06-30
3                   3427        10126                badminton  2017-05-31
4                    717         2313                 football  2016-03-31
5                    101          155                   hockey  2016-01-31
6                  45923       191180                   tennis  2015-12-31
7                   1208         2824                badminton  2017-01-31
8                   5577         8906                  cricket  2016-02-29
9                    111          205                 football  2016-03-31
10                     4            8                   hockey  2017-09-30

数据是从psql数据库获取的,现在我想以json格式生成"select * from tbl_arena"的输出。但所需的json格式必须如下所示:

[
  {
    "event": "football",
    "data_to_plot": [
      {
        "count_arena_users": 717,
        "count_users": 2313,
        "timestamp": "2016-03-31"
      },
      {
        "count_arena_users": 111,
        "count_users": 205,
        "timestamp": "2016-03-31"
      },
      {
        "count_arena_users": 4458,
        "count_users": 12499,
        "timestamp": "2017-04-30"
      }
    ]
  },
  {
    "event": "cricket",
    "data_to_plot": [
      {
        "count_arena_users": 2706,
        "count_users": 4605,
        "timestamp": "2015-06-30"
      },
      {
        "count_arena_users": 5577,
        "count_users": 8906,
        "timestamp": "2016-02-29"
      }
    ]
  }
.
.
.
.
]

所有列的值都根据event列进行分组,随后子字典的出现顺序根据timestamp列确定,即较早的日期最先出现,较新的/最新的日期出现在其下方

我使用python3.x和json.dumps将数据格式化为json样式


Tags: to数据eventjsondata格式countusers
1条回答
网友
1楼 · 发布于 2024-06-14 03:55:05

一个高层次的过程如下-

  1. 聚合有关events的所有数据。我们需要一个groupby+apply
  2. 将结果转换为一系列记录,每个事件一条记录和相关数据。使用to_json,和orient=records

df.groupby('event', sort=False)\
  .apply(lambda x: x.drop('event', 1).sort_values('timestamp').to_dict('r'))\
  .reset_index(name='data_to_plot')\
  .to_json(orient='records')

[
  {
    "event": "football",
    "data_to_plot": [
      {
        "count_arena_users": 717,
        "timestamp": "2016-03-31",
        "count_users": 2313
      },
      {
        "count_arena_users": 111,
        "timestamp": "2016-03-31",
        "count_users": 205
      },
      {
        "count_arena_users": 4458,
        "timestamp": "2017-04-30",
        "count_users": 12499
      }
    ]
  },
  ...
]

相关问题 更多 >