将json行数据划分为多列并存储为数据帧

2024-09-29 04:24:41 发布

您现在位置:Python中文网/ 问答频道 /正文

将数据从JSON读取到pandas时,将读取一个多条件值列,如下所示。 使用基础数据创建时间序列图,其中包含您认为合适/有价值的任何维度

首先,我使用此代码从示例URL导入数据,因此给定示例输入数据:

Code 

    import requests
    import json
    import pandas as pd
    from urllib.request import urlopen
    from pandas.io.json import json_normalize
    response = requests.get('http://test1.com').json

Input:
JSON Data:
{
  "odata.metadata":"http://test1.com,"value":[
    {
      "Data":"28.00000000","Date_Code":"20200401","Date_ItemName_ENG":"1 Apr 2020","Date_SortOrder":"10","Date_ItemNotes_ENG":"","LocalHealthBoard_Code":"7A1","LocalHealthBoard_ItemName_ENG":"Betsi Cadwaladr University Local Health Board","LocalHealthBoard_SortOrder":"2","LocalHealthBoard_Hierarchy":"W92000004","LocalHealthBoard_ItemNotes_ENG":"","LocalHealthBoard_AltCode1":"W11000023","Hospitaltype_Code":"NHS","Hospitaltype_ItemName_ENG":"All NHS hospitals","Hospitaltype_SortOrder":"1","Hospitaltype_Hierarchy":"AllHosp","Hospitaltype_ItemNotes_ENG":"Includes data from acute hospitals only until 19 April 2020. Field hospitals data were added from 20 April 2020, community hospitals data were added from 23 April 2020, and mental health hospitals data were added from 10 July 2020.","Indicator_Code":"CO_Admissions_C19","Indicator_ItemName_ENG":"COVID-19 admissions (suspected and confirmed)","Indicator_SortOrder":"102","Indicator_Hierarchy":"Misc_Admissions_All","Indicator_ItemNotes_ENG":"Patients admitted as suspected or confirmed with COVID-19","RowKey":"0000000000000000","PartitionKey":""
    },{
      "Data":"28.00000000","Date_Code":"20200401","Date_ItemName_ENG":"1 Apr 2020","Date_SortOrder":"10","Date_ItemNotes_ENG":"","LocalHealthBoard_Code":"7A1","LocalHealthBoard_ItemName_ENG":"Betsi Cadwaladr University Local Health Board","LocalHealthBoard_SortOrder":"2","LocalHealthBoard_Hierarchy":"W92000004","LocalHealthBoard_ItemNotes_ENG":"","LocalHealthBoard_AltCode1":"W11000023","Hospitaltype_Code":"TotAcute","Hospitaltype_ItemName_ENG":"All Acute hospitals","Hospitaltype_SortOrder":"2","Hospitaltype_Hierarchy":"NHS","Hospitaltype_ItemNotes_ENG":"Prior to 10 July 2020, data may include small numbers of mental health unit beds.","Indicator_Code":"CO_Admissions_C19","Indicator_ItemName_ENG":"COVID-19 admissions (suspected and confirmed)","Indicator_SortOrder":"102","Indicator_Hierarchy":"Misc_Admissions_All","Indicator_ItemNotes_ENG":"Patients admitted as suspected or confirmed with COVID-19","RowKey":"0000000000000001","PartitionKey":""
    }


Needed output:
Data     Date_Code      Date_ItemName_ENG      Date_SortOrder    ................................

Solution Tried::::::
#####Tried Method 1#####################

    split = response['value'].values.tolist()
    rate = pd.DataFrame(split,columns =['Data', 'Date_Code','Date_ItemName_ENG','Date_SortOrder'])

Error: TypeError: 'method' object is not subscriptable

#####Tried Method 2#########################

    data = json.loads(response)
    
    #print(json.dumps(data,indent=2))
    
    final_data = []
    for item in data['value']:
        my_dict = {}
        my_dict['Data'] = item['Data']
        my_dict['Date_Code'] = item['Date_Code']
        my_dict['Date_ItemName_ENG'] = item['Date_ItemName_ENG']
        my_dict['Date_SortOrder'] = item['Date_SortOrder']   
        print(my_dict)
        final_data.append(my_dict)
    back_json=json.dumps(final_data)

Note : Getting output in dict format not in different columns 
{'Data': '285.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '4.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '4.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '14.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '14.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '7.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}
{'Data': '7.00000000', 'Date_Code': '20200408', 'Date_ItemName_ENG': '8 Apr 2020', 'Date_SortOrder': '17'}

######Method 3#################################

    back_json=json.dumps(final_data)
    #df2 = pd.json_normalize(back_json)
    parsed_days = json_normalize(back_json)
    json_struct = json.loads(response.to_json(orient="records"))    
    df_flat = pd.io.json.json_normalize(json_struct)

Error: AttributeError: 'str' object has no attribute 'values'

Please help tried all solutions but no way out

Tags: jsondatadatemycodeengdictapr
1条回答
网友
1楼 · 发布于 2024-09-29 04:24:41

如果我理解正确,您可以使用最终数据直接获得所需的输出 使用json.dumps没有用,因为您已经有了一个包含一系列字典的列表

final_data = []
for item in data['value']:
    my_dict = {}
    my_dict['Data'] = item['Data']
    my_dict['Date_Code'] = item['Date_Code']
    my_dict['Date_ItemName_ENG'] = item['Date_ItemName_ENG']
    my_dict['Date_SortOrder'] = item['Date_SortOrder']   
    print(my_dict)
    final_data.append(my_dict)

df= pd.DataFrame(final_data)

相关问题 更多 >