json_normalize raises a KeyError when trying to extract certain attributes

Published 2024-09-27 09:30:29

Below is a subset of my JSON file:

d = {'data': {'questions': [{'id': 6574,
                             'text': 'Question #1',
                             'instructionalText': '',
                             'minimumResponses': 0,
                             'maximumResponses': None,
                             'sortOrder': 1,
                             'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
                                         {'id': 362950, 'text': 'Answer #2', 'parentId': None},
                                         {'id': 362951, 'text': 'Answer #3', 'parentId': None},
                                         {'id': 362952, 'text': 'Answer #4', 'parentId': None}]}]}}

I want to load it into a DataFrame with one row for each question and answer.

Python code:

from pandas import json_normalize
import json

fields = ['text','answers.text']

with open(R'response.json') as f:
    d = json.load(f)

data = json_normalize(d['data'],['questions'],errors='ignore')
data = data[fields]

print(data)

This produces a KeyError:

KeyError: "['answers.text'] not in index"

I have been stuck on this for hours and just cannot figure it out. I feel like it should be simple, but it never is.


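2 Answers
  • Use record_prefix together with record_path and meta, so that d can be normalized in a single call
    • When key names overlap between the records under record_path and the meta fields, json_normalize raises a ValueError; here both 'id' and 'text' appear in both
    • ValueError: Conflicting metadata name id, need distinguishing prefix occurs when record_prefix is not used
  • The KeyError occurs because 'answers.text' is not a key in d and is not a column of the normalized frame either: .json_normalize() builds dotted column names only for nested dicts, while 'answers' is a list and stays in a single 'answers' column
  • If any top-level keys are not needed in df, remove them from meta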
import pandas as pd

# normalize d
df = pd.json_normalize(data=d['data']['questions'],
                       record_path= ['answers'],
                       meta=['id', 'text', 'instructionalText', 'minimumResponses', 'maximumResponses', 'sortOrder'],
                       record_prefix='answers_')

# display(df)
   answers_id answers_text answers_parentId    id         text     instructionalText minimumResponses maximumResponses sortOrder
0      362949    Answer #1             None  6574  Question #1                                      0             None         1
1      362950    Answer #2             None  6574  Question #1                                      0             None         1
2      362951    Answer #3             None  6574  Question #1                                      0             None         1
3      362952    Answer #4             None  6574  Question #1                                      0             None         1
4      262949    Answer #1             None  4756  Question #2  No cheating, cheater                0             None         1
5      262950    Answer #2             None  4756  Question #2  No cheating, cheater                0             None         1
6      262951    Answer #3             None  4756  Question #2  No cheating, cheater                0             None         1
7      262952    Answer #4             None  4756  Question #2  No cheating, cheater                0             None         1
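To get back to the two fields the question asked for, the prefixed column names can be selected directly. The following is a minimal sketch on a trimmed-down copy of the data (two answers, only the 'id' and 'text' meta fields); the variable name result is introduced here purely for illustration.

```python
import pandas as pd

# Trimmed sample in the same shape as the question's data (assumed for illustration).
d = {'data': {'questions': [{'id': 6574,
                             'text': 'Question #1',
                             'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
                                         {'id': 362950, 'text': 'Answer #2', 'parentId': None}]}]}}

df = pd.json_normalize(data=d['data']['questions'],
                       record_path=['answers'],
                       meta=['id', 'text'],
                       record_prefix='answers_')

# 'text' is the question text (from meta); 'answers_text' is the answer text.
# The question's 'answers.text' becomes 'answers_text' because of record_prefix.
fields = ['text', 'answers_text']
result = df[fields]
print(result)
```

Passing record_prefix='answers.' instead should yield the dotted name 'answers.text' used in the question, if that exact spelling matters.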

Extended test data

d = {'data': {'questions': [{'id': 6574,
                             'text': 'Question #1',
                             'instructionalText': '',
                             'minimumResponses': 0,
                             'maximumResponses': None,
                             'sortOrder': 1,
                             'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
                                         {'id': 362950, 'text': 'Answer #2', 'parentId': None},
                                         {'id': 362951, 'text': 'Answer #3', 'parentId': None},
                                         {'id': 362952, 'text': 'Answer #4', 'parentId': None}]},
                            {'id': 4756,
                             'text': 'Question #2',
                             'instructionalText': 'No cheating, cheater',
                             'minimumResponses': 0,
                             'maximumResponses': None,
                             'sortOrder': 1,
                             'answers': [{'id': 262949, 'text': 'Answer #1', 'parentId': None},
                                         {'id': 262950, 'text': 'Answer #2', 'parentId': None},
                                         {'id': 262951, 'text': 'Answer #3', 'parentId': None},
                                         {'id': 262952, 'text': 'Answer #4', 'parentId': None}]}]}}
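Both errors mentioned in this answer can be reproduced on a small sample, which may make the cause easier to see. This is only a sketch: the trimmed data and the names flat, has_dotted and conflict are assumptions made for the example, not part of the original question.

```python
import pandas as pd

# Trimmed sample in the same shape as the question's data (assumed for illustration).
d = {'data': {'questions': [{'id': 6574,
                             'text': 'Question #1',
                             'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
                                         {'id': 362950, 'text': 'Answer #2', 'parentId': None}]}]}}

# The question's call: record_path='questions' yields one row per question.
# Nested lists are not flattened, so 'answers' stays a single column of lists
# and no 'answers.text' column is ever created.
flat = pd.json_normalize(d['data'], ['questions'])
print(list(flat.columns))
has_dotted = 'answers.text' in flat.columns

# record_path plus meta without record_prefix: 'id' and 'text' exist in both
# the answer records and the meta fields, so the names collide.
try:
    pd.json_normalize(d['data']['questions'], record_path=['answers'], meta=['id', 'text'])
    conflict = None
except ValueError as exc:
    conflict = str(exc)
print(conflict)
```

Selecting 'answers.text' from flat is what raises the KeyError in the question, since that label is not among its columns.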

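This is the technique I normally use:

  1. json_normalize() the top-level list
  2. explode() the list column, then reset_index() to prepare for step #3
  3. apply(pd.Series) to expand the dicts from the sub-list into columns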
import pandas as pd

d = {'data': {'questions': [{'id': 6574,
    'text': 'Question #1',
    'instructionalText': '',
    'minimumResponses': 0,
    'maximumResponses': None,
    'sortOrder': 1,
    'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
     {'id': 362950, 'text': 'Answer #2', 'parentId': None},
     {'id': 362951, 'text': 'Answer #3', 'parentId': None},
     {'id': 362952, 'text': 'Answer #4', 'parentId': None}]}]}}

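# Steps 1 and 2: flatten the questions, then give each answer dict its own row.
# reset_index(drop=True) makes the index unique again so the join below aligns row by row.
df = pd.json_normalize(d["data"]["questions"]).explode("answers").reset_index(drop=True)
# Step 3: expand each answer dict into columns; rsuffix="_ans" renames the answer's overlapping 'id' and 'text'.
df = df.join(df["answers"].apply(pd.Series), rsuffix="_ans").drop(columns="answers")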
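As a variation on step 3, the exploded dicts can also be expanded with a second json_normalize() call instead of apply(pd.Series). This is a sketch of a swapped-in technique, not the answer's own method, and it assumes every question has at least one answer, so that each exploded cell is a dict. The names questions, exploded and answers are made up for the example.

```python
import pandas as pd

# Trimmed sample with two questions (assumed for illustration).
d = {'data': {'questions': [{'id': 6574,
                             'text': 'Question #1',
                             'answers': [{'id': 362949, 'text': 'Answer #1', 'parentId': None},
                                         {'id': 362950, 'text': 'Answer #2', 'parentId': None}]},
                            {'id': 4756,
                             'text': 'Question #2',
                             'answers': [{'id': 262949, 'text': 'Answer #1', 'parentId': None}]}]}}

# Steps 1 and 2 as in the answer: one row per (question, answer) pair.
questions = pd.json_normalize(d['data']['questions'])
exploded = questions.explode('answers').reset_index(drop=True)

# Step 3 variant: build the answer columns from the list of dicts in one call.
# add_suffix renames every answer column, not only the overlapping ones.
answers = pd.json_normalize(exploded['answers'].tolist()).add_suffix('_ans')

# Both frames share the same 0..n-1 index, so the join aligns row by row.
df = exploded.drop(columns='answers').join(answers)
print(df[['text', 'text_ans']])
```

Unlike rsuffix="_ans", add_suffix also renames 'parentId' to 'parentId_ans', which is a deliberate difference in this sketch.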
