I need to build a DataFrame from a series of JSON files. Here is some background on what I have so far:
#Importing helper libraries
import sys
import json
from helpers.helper_functions import execute_bigquery
# Import third-party libraries
import requests
#get data from bigquery
authors_df = execute_bigquery(f"""
SELECT author
FROM `XXX`
LIMIT 1000
""")
# For each row
for index, row in authors_df.iterrows():
    # Get the author id
    author = row['author']
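Basically, author holds a list of 1000 ids that I want to collect data for (e.g. 1232, 456093, 273, and so on).
The information I want for these authors is available from a link that changes depending on the author: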
    # Build the URL (still inside the loop over authors_df)
    url = f'http://keystone-db.default.svc.cluster.local:5000/keystonedb/profiles/resonance/categorization?profileId={author}&regionId=1'
    # Get the JSON value
    json_value = requests.get(url).json()
    # Display it
    print(json.dumps(json_value['resonanceCategorizations']['1']['fullData'], indent=2))
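As a side note on the request step, here is a minimal sketch of building the URL and reading the nested list defensively. The helper names build_url and extract_full_data are hypothetical, and the idea that some responses might lack the 'resonanceCategorizations', '1' or 'fullData' keys is an assumption, not something observed in the output.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; the query string is appended separately.
BASE_URL = ("http://keystone-db.default.svc.cluster.local:5000"
            "/keystonedb/profiles/resonance/categorization")


def build_url(author, region_id=1):
    """Build the per-author URL (hypothetical helper)."""
    # urlencode joins the parameters with an explicit '&' separator.
    query = urlencode({"profileId": author, "regionId": region_id})
    return f"{BASE_URL}?{query}"


def extract_full_data(json_value, region_key="1"):
    """Return the fullData list, or [] if any level is missing (assumed case)."""
    categorizations = json_value.get("resonanceCategorizations") or {}
    region = categorizations.get(region_key) or {}
    return region.get("fullData") or []
```

Inside the loop this could be used as extract_full_data(requests.get(build_url(author)).json()), so that an author with no data yields an empty list instead of a KeyError.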
Here is part of the output for the first two authors, "45866207" and "54502344":
45866207
[
{
"seed": 24868793,
"globalSegmentId": 26895,
"globalSegmentName": "Luxury Accessories & Jewellery",
"regionId": 15,
"resonance": 0.8028571009635925,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 76611584,
"globalSegmentId": 17899,
"globalSegmentName": "Jewellery",
"regionId": 15,
"resonance": 0.8028001189231873,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 40893487,
"globalSegmentId": 17899,
"globalSegmentName": "Jewellery",
"regionId": 15,
"resonance": 0.7982199192047119,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 74701069,
"globalSegmentId": 17912,
"globalSegmentName": "Heritage Designer Brands",
"regionId": 15,
"resonance": 0.6809910535812378,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 936905156,
"globalSegmentId": 17899,
"globalSegmentName": "Jewellery",
"regionId": 15,
"resonance": 0.6566575169563293,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 14831515,
"globalSegmentId": 17801,
"globalSegmentName": "Mining & Resources",
"regionId": 1,
"resonance": 0.6080579161643982,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 36544806,
"globalSegmentId": 18392,
"globalSegmentName": "Rugby",
"regionId": 12,
"resonance": 0.5898635983467102,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 26494583,
"globalSegmentId": 26895,
"globalSegmentName": "Luxury Accessories & Jewellery",
"regionId": 15,
"resonance": 0.5888025760650635,
"isGlobal": true,
"globalRegion": 1
}
]
54502344
[
{
"seed": 255420441,
"globalSegmentId": 18187,
"globalSegmentName": "Luxury Cars",
"regionId": 18,
"resonance": 0.9264420866966248,
"isGlobal": true,
"globalRegion": 1
},
{
"seed": 2650413864,
"globalSegmentId": 18187,
"globalSegmentName": "Luxury Cars",
"regionId": 18,
"resonance": 0.9237868189811707,
"isGlobal": true,
"globalRegion": 1
},
...
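And likewise for the other authors in the list.
What I would like is a way to extract, for each author, all the variables from the first element of the JSON list, all the variables from the second element, and all the variables from the third element, and put them into a dataset with 1000 rows (one per author).
This is the output I want (1000 rows for the 1000 authors, and 21 variables: the 7 variables, or "keys", of each of the first 3 elements of the list):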
Author    seed_1     GlobalSegmentId_1  ...  seed_2    GlobalSegmentId_2  ....  seed_3  ...  globalregion_3
45866207  24868793   26895                   76611584  17899              .....
54502344  255420441  ....                    .....
....      ....
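One possible way to get that wide layout is sketched below, under a few assumptions: the helper name flatten_top_n is hypothetical, column names reuse the JSON keys exactly (so globalSegmentId_1 rather than GlobalSegmentId_1), and an author with fewer than three elements simply ends up with fewer columns filled. The sample input is copied from the first two records of the output above.

```python
def flatten_top_n(author, full_data, n=3):
    """Flatten the first n records into one row, suffixing keys with _1.._n."""
    row = {"Author": author}
    for position, record in enumerate(full_data[:n], start=1):
        for key, value in record.items():
            row[f"{key}_{position}"] = value
    return row


# First two records for author 45866207, taken from the output above.
sample = [
    {"seed": 24868793, "globalSegmentId": 26895,
     "globalSegmentName": "Luxury Accessories & Jewellery", "regionId": 15,
     "resonance": 0.8028571009635925, "isGlobal": True, "globalRegion": 1},
    {"seed": 76611584, "globalSegmentId": 17899,
     "globalSegmentName": "Jewellery", "regionId": 15,
     "resonance": 0.8028001189231873, "isGlobal": True, "globalRegion": 1},
]

rows = [flatten_top_n(45866207, sample)]
# In the real loop: rows.append(flatten_top_n(author, full_data)) per author,
# then pd.DataFrame(rows) after the loop (assuming pandas is imported as pd).
```

Collecting plain dicts in a list and building the DataFrame once at the end avoids growing a DataFrame row by row; missing keys would show up as NaN in the resulting frame.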