How to join JSON with Pandas

Posted 2024-10-01 19:24:19

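The goal here is to count the number of patients for each diagnosis type. In the patient record the visit id is unique, but in the diagnosis record a single visit can have several diagnoses, so the same visit id can appear with multiple diagnosis ids.

To do that, I think the 2 data frames need to be linked on the Visit id field. Can anyone show how to link the 2 JSON files with Pandas and count the number of patients for each diagnosis? Many thanks.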

Patient record

JSON [Patient record]

[
 {
   "Doctor id":"AU1254",
   "Patient":[
      {
         "Patient id":"BK1221",
         "Patient name":"Tim"
      }
   ],  
   "Visit id":"B0001"       
},
 {
   "Doctor id":"AU8766",
   "Patient":[
      {
         "Patient id":"BK1209",
         "Patient name":"Sue"
      }
   ],  
   "Visit id":"B0002"  
},
 {
   "Doctor id":"AU1254",
   "Patient":[
      {
         "Patient id":"BK1323",
         "Patient name":"Sary"
      }
   ],  
   "Visit id":"B0003"  
  }
]

Diagnosis record

JSON [Diagnosis record]

[
   {
      "Visit id":"B0001",
      "Diagnosis":[
         {
            "diagnosis id":"D1001",
            "diagnosis name":"fever"           
         },
         {
            "diagnosis id":"D1987",
            "diagnosis name":"cough"
         },
         {
             "diagnosis id":"D1265",
            "diagnosis name":"running nose"
         }
      ]
   }, 
      {
      "Visit id":"B0002",
      "Diagnosis":[
         {
            "diagnosis id":"D1987",
            "diagnosis name":"cough"           
         },
         {
            "diagnosis id":"D1453",
            "diagnosis name":"stomach ache"
         }
      ]
   } 
]

3 answers

Try the following for the patient record:

import pandas as pd

patients_df = pd.read_json('patients.json')

patient_id = []
patient_name = []

# Get attributes from the nested list of dicts in the 'Patient' column
for patient in patients_df['Patient']:
    patient_id.append(patient[0]['Patient id'])
    patient_name.append(patient[0]['Patient name'])

# Add to the pandas dataframe
patients_df['Patient name'] = patient_name
patients_df['Patient id'] = patient_id

# Drop the 'Patient' column
patients_df = patients_df.drop(columns='Patient')
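As a rough check of the flattening step above, here is a minimal sketch that runs without a file on disk. It assumes the question's patient records are already loaded as a Python list (the variable name `patients` is only for illustration) and uses list comprehensions in place of the explicit loop.

```python
import pandas as pd

# Patient records copied from the question, already parsed into Python objects
patients = [
    {"Doctor id": "AU1254",
     "Patient": [{"Patient id": "BK1221", "Patient name": "Tim"}],
     "Visit id": "B0001"},
    {"Doctor id": "AU8766",
     "Patient": [{"Patient id": "BK1209", "Patient name": "Sue"}],
     "Visit id": "B0002"},
    {"Doctor id": "AU1254",
     "Patient": [{"Patient id": "BK1323", "Patient name": "Sary"}],
     "Visit id": "B0003"},
]

patients_df = pd.DataFrame(patients)

# Each 'Patient' cell is a one-element list holding a dict, so take element 0
patients_df["Patient id"] = [p[0]["Patient id"] for p in patients_df["Patient"]]
patients_df["Patient name"] = [p[0]["Patient name"] for p in patients_df["Patient"]]

# The nested column is no longer needed once its fields are flattened
patients_df = patients_df.drop(columns="Patient")
print(patients_df)
```

This only covers the patient side; the diagnosis record still has to be flattened and merged on Visit id before anything can be counted.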

Try this (x -> JSON [Patient record], y -> JSON [Diagnosis record]):

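import pandas as pd

# x and y are the parsed JSON lists (e.g. from json.load)
df = pd.DataFrame(x)
df = pd.concat([df.pop('Patient').apply(lambda p: pd.Series(p[0])), df], axis=1)

# One row per diagnosis; ignore_index keeps the index unique for concat
df1 = pd.DataFrame(y).explode('Diagnosis', ignore_index=True)
df1 = pd.concat([df1.pop('Diagnosis').apply(pd.Series), df1], axis=1)

df_merge = pd.merge(df, df1, on='Visit id', how='right')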

df_merge:

    Patient id  Patient name  Doctor id Visit id  diagnosis id diagnosis name
0   BK1221      Tim           AU1254    B0001     D1001        fever
1   BK1221      Tim           AU1254    B0001     D1987        cough
2   BK1221      Tim           AU1254    B0001     D1265        running nose
3   BK1209      Sue           AU8766    B0002     D1987        cough
4   BK1209      Sue           AU8766    B0002     D1453        stomach ache

Count:

df_merge.groupby('diagnosis name')['Patient id'].count()
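One caveat: `count()` counts rows, so a patient who receives the same diagnosis on two visits would be counted twice. If the aim is the number of distinct patients per diagnosis, `nunique()` may be closer to it. The sketch below illustrates the difference. The extra visit B0004 is a hypothetical row added only for this illustration and is not part of the question's data.

```python
import pandas as pd

# Merged rows as in the output above, plus one HYPOTHETICAL repeat visit
# (B0004) in which Tim is diagnosed with cough a second time.
df_merge = pd.DataFrame({
    "Patient id": ["BK1221", "BK1221", "BK1221", "BK1209", "BK1209", "BK1221"],
    "Visit id": ["B0001", "B0001", "B0001", "B0002", "B0002", "B0004"],
    "diagnosis name": ["fever", "cough", "running nose",
                       "cough", "stomach ache", "cough"],
})

# count() tallies rows per diagnosis (one row per visit/diagnosis pair)
rows_per_diagnosis = df_merge.groupby("diagnosis name")["Patient id"].count()

# nunique() tallies distinct patient ids per diagnosis
patients_per_diagnosis = df_merge.groupby("diagnosis name")["Patient id"].nunique()

print(rows_per_diagnosis["cough"], patients_per_diagnosis["cough"])
```

On the question's own sample data the two give the same numbers, because no patient repeats a diagnosis.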

Edit:

Try:

df_merge.groupby('diagnosis name').agg({'Patient name': [list, 'count']}).reset_index()

diagnosis name  Patient name
                list        count
        cough   [Tim, Sue]  2
        fever   [Tim]       1
running nose    [Tim]       1
stomach ache    [Sue]       1

You can use a left merge() on Visit id:

>  import json
>  import pandas as pd
>  json1 = <your first json here>
>  json2 = <your second json here>
>  df1 = pd.json_normalize(data=json.loads(json1), record_path='Patient', meta=['Doctor id','Visit id'])
>  df2 = pd.json_normalize(data=json.loads(json2), record_path='Diagnosis', meta=['Visit id'])

>  df3 = df1.merge(df2, on='Visit id', how='left').dropna()
>  print(df3)
  Patient id Patient name Doctor id Visit id diagnosis id diagnosis name
0     BK1221          Tim    AU1254    B0001        D1001          fever
1     BK1221          Tim    AU1254    B0001        D1987          cough
2     BK1221          Tim    AU1254    B0001        D1265   running nose
3     BK1209          Sue    AU8766    B0002        D1987          cough
4     BK1209          Sue    AU8766    B0002        D1453   stomach ache
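In the sample data, visit B0003 (Sary) has no diagnosis record, which is why the left merge needs `dropna()` to produce the table above. A minimal sketch of what happens there, assuming the JSON has already been parsed into Python lists (the names `patients` and `diagnoses` are illustrative): the `indicator=True` option of `merge` marks the unmatched rows, and an inner merge gives the same five rows for this data without `dropna()`.

```python
import pandas as pd

# Sample records from the question, already parsed into Python objects
patients = [
    {"Doctor id": "AU1254", "Visit id": "B0001",
     "Patient": [{"Patient id": "BK1221", "Patient name": "Tim"}]},
    {"Doctor id": "AU8766", "Visit id": "B0002",
     "Patient": [{"Patient id": "BK1209", "Patient name": "Sue"}]},
    {"Doctor id": "AU1254", "Visit id": "B0003",
     "Patient": [{"Patient id": "BK1323", "Patient name": "Sary"}]},
]
diagnoses = [
    {"Visit id": "B0001", "Diagnosis": [
        {"diagnosis id": "D1001", "diagnosis name": "fever"},
        {"diagnosis id": "D1987", "diagnosis name": "cough"},
        {"diagnosis id": "D1265", "diagnosis name": "running nose"}]},
    {"Visit id": "B0002", "Diagnosis": [
        {"diagnosis id": "D1987", "diagnosis name": "cough"},
        {"diagnosis id": "D1453", "diagnosis name": "stomach ache"}]},
]

df1 = pd.json_normalize(patients, record_path="Patient", meta=["Doctor id", "Visit id"])
df2 = pd.json_normalize(diagnoses, record_path="Diagnosis", meta=["Visit id"])

# indicator=True adds a '_merge' column showing where each row came from
left = df1.merge(df2, on="Visit id", how="left", indicator=True)
unmatched = left.loc[left["_merge"] == "left_only", "Patient name"].tolist()

# An inner merge keeps only visits present on both sides
inner = df1.merge(df2, on="Visit id", how="inner")

print(unmatched)
print(len(left), len(inner))
```

Note that `dropna()` after a left merge would also drop rows that have missing values for other reasons, so the two approaches only agree when the remaining columns are complete, as they are here.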

You can also do some fancy grouping/printing:

pd.pivot_table(df3, index=['Patient id','Patient name','Doctor id','Visit id'], values=['diagnosis id','diagnosis name'], aggfunc=list)
                                                     diagnosis id                diagnosis name
Patient id Patient name Doctor id Visit id
BK1209     Sue          AU8766    B0002            [D1987, D1453]         [cough, stomach ache]
BK1221     Tim          AU1254    B0001     [D1001, D1987, D1265]  [fever, cough, running nose]

And the counts per diagnosis / per patient:

df3.groupby(['diagnosis id', 'diagnosis name']).agg({'Patient name': [list, 'count']})
                            Patient name
                                    list count
diagnosis id diagnosis name
D1001        fever                 [Tim]     1
D1265        running nose          [Tim]     1
D1453        stomach ache          [Sue]     1
D1987        cough            [Tim, Sue]     2
