Pandas is not reading data from a JSON API correctly

Posted 2024-09-30 18:23:17


I am trying to load data from a JSON API into a DataFrame. However, pandas is not reading the data correctly. Here is my code and output:

import pandas as pd
import requests
r = requests.get('https://api.covid19india.org/raw_data5.json')
j = r.json()
df = pd.DataFrame.from_dict(j)

However, the output I get is incorrect:

raw_data
0   {'agebracket': '', 'contractedfromwhichpatient...
1   {'agebracket': '', 'contractedfromwhichpatient...
2   {'agebracket': '', 'contractedfromwhichpatient...
3   {'agebracket': '', 'contractedfromwhichpatient...
4   {'agebracket': '', 'contractedfromwhichpatient...

When I run df.info(), I get:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 1 columns):
raw_data    20409 non-null object
dtypes: object(1)
memory usage: 159.5+ KB

Can anyone help me solve this?


2 Answers

Use j = r.json()['raw_data'] to select the raw_data key from the JSON, then build the DataFrame from j as before:

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 20 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   agebracket                           20409 non-null  object
 1   contractedfromwhichpatientsuspected  20409 non-null  object
 2   currentstatus                        20409 non-null  object
 3   dateannounced                        20409 non-null  object
 4   detectedcity                         20409 non-null  object
 5   detecteddistrict                     20409 non-null  object
 6   detectedstate                        20409 non-null  object
 7   entryid                              20409 non-null  object
 8   gender                               20409 non-null  object
 9   nationality                          20409 non-null  object
 10  notes                                20409 non-null  object
 11  numcases                             20409 non-null  object
 12  patientnumber                        20409 non-null  object
 13  source1                              20409 non-null  object
 14  source2                              20409 non-null  object
 15  source3                              20409 non-null  object
 16  statecode                            20409 non-null  object
 17  statepatientnumber                   20409 non-null  object
 18  statuschangedate                     20409 non-null  object
 19  typeoftransmission                   20409 non-null  object
dtypes: object(20)
memory usage: 3.1+ MB
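
To see why selecting the key helps, here is a minimal sketch that uses a small made-up payload in place of the live API response. The sample records and their values are assumptions, and only three of the twenty fields are shown. The response is a dict with a single top-level key, raw_data, whose value is a list of records. Passing the whole dict gives one column of dicts, while passing the list gives one column per field.

```python
import pandas as pd

# Hypothetical stand-in for r.json(); assumed to have the same shape as the
# real response: one top-level "raw_data" key holding a list of record dicts.
payload = {
    "raw_data": [
        {"agebracket": "", "detectedstate": "Kerala", "numcases": "1"},
        {"agebracket": "", "detectedstate": "Delhi", "numcases": "2"},
    ]
}

# Passing the whole dict: the only key becomes the only column,
# and each cell holds an entire record dict.
df_wrong = pd.DataFrame.from_dict(payload)

# Passing the list of records: each record key becomes its own column.
df = pd.DataFrame(payload["raw_data"])

print(df_wrong.shape)
print(df.shape)
print(list(df.columns))
```

Note that every column still comes back as object, because the API returns all values as strings; numeric or date columns would need an explicit conversion afterwards.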

Try this:

df = df['raw_data'].apply(pd.Series)
df.info()

Output:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 20409 entries, 0 to 20408
    Data columns (total 20 columns):
    agebracket                             20409 non-null object
    contractedfromwhichpatientsuspected    20409 non-null object
    currentstatus                          20409 non-null object
    dateannounced                          20409 non-null object
    detectedcity                           20409 non-null object
    detecteddistrict                       20409 non-null object
    detectedstate                          20409 non-null object
    entryid                                20409 non-null object
    gender                                 20409 non-null object
    nationality                            20409 non-null object
    notes                                  20409 non-null object
    numcases                               20409 non-null object
    patientnumber                          20409 non-null object
    source1                                20409 non-null object
    source2                                20409 non-null object
    source3                                20409 non-null object
    statecode                              20409 non-null object
    statepatientnumber                     20409 non-null object
    statuschangedate                       20409 non-null object
    typeoftransmission                     20409 non-null object
    dtypes: object(20)
    memory usage: 3.1+ MB
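
A sketch of this approach on a made-up two-record frame (the sample values are assumptions). apply(pd.Series) builds one Series per row, which can be slow on a frame of this size. Building the frame from the column's list of dicts, or using pd.json_normalize (available in pandas 1.0 and later), should give the same columns here.

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question:
# a single "raw_data" column whose cells are record dicts.
df = pd.DataFrame({"raw_data": [
    {"agebracket": "", "detectedstate": "Kerala", "numcases": "1"},
    {"agebracket": "", "detectedstate": "Delhi", "numcases": "2"},
]})

# The approach from this answer: expand each dict into a row of columns.
expanded = df["raw_data"].apply(pd.Series)

# Two alternatives that avoid constructing a Series for every row.
from_list = pd.DataFrame(df["raw_data"].tolist())
normalized = pd.json_normalize(df["raw_data"].tolist())

print(list(expanded.columns))
print(expanded.shape)
```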
