Flatten a nested JSON file into new columns and build a Pandas DataFrame

data = {'publications': [{'title': 'The effect of land‐use changes on the hydrological behaviour of Histic Andosols in south Ecuador', 'author_affiliations': [[{'first_name': 'W.', 'last_name': 'Buytaert', 'researcher_id': 'ur.01136506420.02', 'affiliations': [{'id': 'grid.442123.2', 'name': 'University of Cuenca', 'org_types': ['Education'], 'city': 'Cuenca', 'city_id': 3658666, 'country': 'Ecuador', 'country_code': 'EC', 'state': None, 'state_code': None}, {'id': 'grid.5596.f', 'name': 'KU Leuven', 'org_types': ['Education'], 'city': 'Leuven', 'city_id': 2792482, 'country': 'Belgium', 'country_code': 'BE', 'state': None, 'state_code': None}]}, {'first_name': 'G.', 'last_name': 'Wyseure', 'researcher_id': 'ur.012246446667.91', 'affiliations': [{'id': 'grid.5596.f', 'name': 'KU Leuven', 'org_types': ['Education'], 'city': 'Leuven', 'city_id': 2792482, 'country': 'Belgium', 'country_code': 'BE', 'state': None, 'state_code': None}]}, {'first_name': 'B.', 'last_name': 'De Bièvre', 'researcher_id': 'ur.013305075217.11', 'affiliations': [{'id': 'grid.442123.2', 'name': 'University of Cuenca', 'org_types': ['Education'], 'city': 'Cuenca', 'city_id': 3658666, 'country': 'Ecuador', 'country_code': 'EC', 'state': None, 'state_code': None}]}, {'first_name': 'J.', 'last_name': 'Deckers', 'researcher_id': 'ur.0761456127.40', 'affiliations': [{'id': 'grid.5596.f', 'name': 'KU Leuven', 'org_types': ['Education'], 'city': 'Leuven', 'city_id': 2792482, 'country': 'Belgium', 'country_code': 'BE', 'state': None, 'state_code': None}]}]], 'FOR': [{'id': '2539', 'name': '0406 Physical Geography and Environmental Geoscience'}], 'issn': ['0885-6087', '1099-1085'], 'journal': {'id': 'jour.1043737', 'title': 'Hydrological Processes'}, 'type': 'article', 'research_org_country_names': ['Belgium', 'Ecuador'], 'doi': '10.1002/hyp.5867', 'year': 2005, 'times_cited': 72}], '_stats': {'total_count': 957, 'limit': 1, 'offset': 0}}

1 answer

Anonymous user

Answer #1 · Posted 2024-06-01 07:45:24

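Flatten the records (the original answer used nested_to_record; json_normalize does the same job in current pandas), turn them into a pandas DataFrame, and then tidy up the list-valued columns manually: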

import pandas as pd

records = data['publications']  # list of publication dicts from `data` above
# flattens nested dicts (journal -> journal.id/journal.title); lists stay as lists
df = pd.json_normalize(records)

def format_authors(groups):
    # "First Last (Org, Country; Org, Country)" for every author, every affiliation
    return ', '.join(f"{a['first_name']} {a['last_name']} (" + '; '.join(
        f"{o['name']}, {o['country']}" for o in a['affiliations']) + ')'
        for group in groups for a in group)

df['FOR'] = df['FOR'].apply(lambda fs: ', '.join(f['name'] for f in fs))
df['author_affiliations'] = df['author_affiliations'].apply(format_authors)
df['issn'] = df['issn'].apply(','.join)
df['research_org_country_names'] = df['research_org_country_names'].apply(','.join)

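Now the resulting DataFrame is (shown as a screenshot of the Jupyter notebook output, because it is too large to display in my IDLE):

[Screenshot: Jupyter notebook output of the resulting DataFrame]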
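If you would rather keep one row per author and affiliation instead of joining all authors into a single string, one possible sketch is to flatten the outer author list by hand and let json_normalize's record_path/meta options handle the rest. It assumes the same structure as the question's data, trimmed here to one publication, two authors and only the fields that are used; the names `authors` and `long_df` are illustrative.

```python
import pandas as pd

# Trimmed copy of the question's data: one publication, only the fields used here
data = {'publications': [{
    'doi': '10.1002/hyp.5867',
    'author_affiliations': [[
        {'first_name': 'W.', 'last_name': 'Buytaert', 'affiliations': [
            {'name': 'University of Cuenca', 'country': 'Ecuador'},
            {'name': 'KU Leuven', 'country': 'Belgium'}]},
        {'first_name': 'G.', 'last_name': 'Wyseure', 'affiliations': [
            {'name': 'KU Leuven', 'country': 'Belgium'}]},
    ]],
}]}

# author_affiliations is a list of author lists, so flatten that level by hand
# and copy the publication's doi onto each author dict
authors = [
    dict(author, doi=pub['doi'])
    for pub in data['publications']
    for group in pub['author_affiliations']
    for author in group
]

# record_path walks into each author's affiliations (one row per affiliation);
# meta copies the author-level fields onto every one of those rows
long_df = pd.json_normalize(
    authors, record_path='affiliations', meta=['doi', 'first_name', 'last_name']
)
print(long_df)
```

With the full data from the question this gives one row for each of the five author/affiliation pairs, which is usually easier to filter or group than a comma-joined string.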

Note: in newer pandas versions json.nested_to_record raises an error; use json_normalize instead (available as pd.json_normalize since pandas 1.0).
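If the same script has to run on both older and newer pandas, a minimal sketch is to try the top-level import first and fall back to the old module location. This assumes pandas is installed; the fallback path only applies to releases before 1.0.

```python
try:
    from pandas import json_normalize  # top-level function since pandas 1.0
except ImportError:
    from pandas.io.json import json_normalize  # location in older pandas releases

# nested dict keys are joined with '.' by default
flat = json_normalize([{'journal': {'id': 'jour.1043737', 'title': 'Hydrological Processes'}}])
print(list(flat.columns))  # ['journal.id', 'journal.title']
```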
