Creating a DataFrame from several JSON files in Python

Posted 2024-07-02 13:56:23


I have to create a DataFrame from a series of JSON files. Here is some background on what I have so far:

# Standard library
import sys
import json

# Third-party
import requests

# Project helper (my own module)
from helpers.helper_functions import execute_bigquery

#get data from bigquery
authors_df = execute_bigquery(f"""
    SELECT author
    FROM `XXX`
    LIMIT 1000
    """)

#for each row
for index, row in authors_df.iterrows():
    #get the author
    author = row['author']

Basically, the author column holds the 1000 IDs I want to collect data for (for example 1232456093273, and so on).

The information I want for these authors is available from a URL that changes depending on the author:

    #build the url
    url = f'http://keystone-db.default.svc.cluster.local:5000/keystonedb/profiles/resonance/categorization?profileId={author}&regionId=1'    

    #get the json value
    json_value = requests.get(url).json()

    #display it
    print(json.dumps(json_value['resonanceCategorizations']['1']['fullData'], indent=2))
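
The chained lookup above raises a KeyError as soon as one author has no 'resonanceCategorizations' entry or no region '1'. A minimal sketch of a more forgiving lookup follows; the helper name extract_full_data and the choice to return an empty list are my own assumptions, not part of the service's API:

```python
def extract_full_data(payload, region_key="1"):
    """Return payload['resonanceCategorizations'][region_key]['fullData'].

    Hypothetical helper: falls back to an empty list when any level
    of the nested structure is missing or has an unexpected type.
    """
    if not isinstance(payload, dict):
        return []
    categorizations = payload.get("resonanceCategorizations") or {}
    region = categorizations.get(region_key) or {}
    full_data = region.get("fullData")
    return full_data if isinstance(full_data, list) else []


# Tiny hand-made payloads, only to show the behaviour.
complete = {"resonanceCategorizations": {"1": {"fullData": [{"seed": 24868793}]}}}
missing_region = {"resonanceCategorizations": {"2": {}}}

found = extract_full_data(complete)
not_found = extract_full_data(missing_region)
```

Inside the loop this would be called as extract_full_data(json_value), so a single author with incomplete data does not abort the whole run.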

Here is part of the output for the first two authors, "45866207" and "54502344":

45866207
[
  {
    "seed": 24868793,
    "globalSegmentId": 26895,
    "globalSegmentName": "Luxury Accessories & Jewellery",
    "regionId": 15,
    "resonance": 0.8028571009635925,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 76611584,
    "globalSegmentId": 17899,
    "globalSegmentName": "Jewellery",
    "regionId": 15,
    "resonance": 0.8028001189231873,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 40893487,
    "globalSegmentId": 17899,
    "globalSegmentName": "Jewellery",
    "regionId": 15,
    "resonance": 0.7982199192047119,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 74701069,
    "globalSegmentId": 17912,
    "globalSegmentName": "Heritage Designer Brands",
    "regionId": 15,
    "resonance": 0.6809910535812378,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 936905156,
    "globalSegmentId": 17899,
    "globalSegmentName": "Jewellery",
    "regionId": 15,
    "resonance": 0.6566575169563293,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 14831515,
    "globalSegmentId": 17801,
    "globalSegmentName": "Mining & Resources",
    "regionId": 1,
    "resonance": 0.6080579161643982,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 36544806,
    "globalSegmentId": 18392,
    "globalSegmentName": "Rugby",
    "regionId": 12,
    "resonance": 0.5898635983467102,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 26494583,
    "globalSegmentId": 26895,
    "globalSegmentName": "Luxury Accessories & Jewellery",
    "regionId": 15,
    "resonance": 0.5888025760650635,
    "isGlobal": true,
    "globalRegion": 1
  }
]
54502344
[
  {
    "seed": 255420441,
    "globalSegmentId": 18187,
    "globalSegmentName": "Luxury Cars",
    "regionId": 18,
    "resonance": 0.9264420866966248,
    "isGlobal": true,
    "globalRegion": 1
  },
  {
    "seed": 2650413864,
    "globalSegmentId": 18187,
    "globalSegmentName": "Luxury Cars",
    "regionId": 18,
    "resonance": 0.9237868189811707,
    "isGlobal": true,
    "globalRegion": 1
  },
  ...

And so on for the other authors in the list.

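What I am looking for is a way to extract, for each author, all the variables in the first element of the JSON list, all the variables in the second element, and all the variables in the third element, and to put them into a dataset with 1000 rows (one per author).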

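This is the output I want (1000 rows for the 1000 authors, and 21 variables: the 7 variables, or "keys", of each of the first 3 elements of the list):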

     Author     seed_1     GlobalSegmentId_1 ... seed_2     GlobalSegmentId_2 .... seed_3 ... globalregion_3      
     45866207  24868793    26895                 76611584     17899    .....
     54502344  255420441    ....   .....
      ....    ....
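
One possible way to get that wide layout, offered as a sketch rather than a definitive answer: flatten the first three dictionaries of each author's list into a single row, suffixing every key with its position, then build the DataFrame once from the list of rows. The names flatten_top_segments and to_dataframe are hypothetical, and the sample is a truncated copy of the output shown above (the second author only has the two elements that were displayed). Column names reuse the JSON keys verbatim, for example globalSegmentId_1 rather than GlobalSegmentId_1.

```python
def flatten_top_segments(author, full_data, n=3):
    """Flatten the first n dicts of full_data into one wide row.

    Every key is suffixed with its 1-based position: seed_1, seed_2, ...
    """
    row = {"author": author}
    for position, segment in enumerate(full_data[:n], start=1):
        for key, value in segment.items():
            row[f"{key}_{position}"] = value
    return row


def to_dataframe(rows):
    """One DataFrame row per author; keys missing from a row become NaN."""
    import pandas as pd  # imported lazily so the flattening works without pandas
    return pd.DataFrame(rows)


# Truncated sample copied from the output above (author -> fullData list).
sample = {
    "45866207": [
        {"seed": 24868793, "globalSegmentId": 26895,
         "globalSegmentName": "Luxury Accessories & Jewellery", "regionId": 15,
         "resonance": 0.8028571009635925, "isGlobal": True, "globalRegion": 1},
        {"seed": 76611584, "globalSegmentId": 17899,
         "globalSegmentName": "Jewellery", "regionId": 15,
         "resonance": 0.8028001189231873, "isGlobal": True, "globalRegion": 1},
        {"seed": 40893487, "globalSegmentId": 17899,
         "globalSegmentName": "Jewellery", "regionId": 15,
         "resonance": 0.7982199192047119, "isGlobal": True, "globalRegion": 1},
    ],
    "54502344": [
        {"seed": 255420441, "globalSegmentId": 18187,
         "globalSegmentName": "Luxury Cars", "regionId": 18,
         "resonance": 0.9264420866966248, "isGlobal": True, "globalRegion": 1},
        {"seed": 2650413864, "globalSegmentId": 18187,
         "globalSegmentName": "Luxury Cars", "regionId": 18,
         "resonance": 0.9237868189811707, "isGlobal": True, "globalRegion": 1},
    ],
}

rows = [flatten_top_segments(author, full_data)
        for author, full_data in sample.items()]
```

In the loop from the question, this would mean appending flatten_top_segments(author, json_value['resonanceCategorizations']['1']['fullData']) to a list on each iteration and calling to_dataframe(rows) once after the loop. Authors with fewer than three elements simply end up with empty cells in the later columns.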
