I have a nested JSON document, and so far I am doing the following:
r = session.get(search_url, auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL), verify=False)
json_data = json.loads(r.content)
flattened_data = json_normalize(json_data['documents'])
print(list(flattened_data))
This outputs the following:
['affected_users', 'aggregatedLabels', 'aliases', 'assignedFolder', 'assigneeIdentity', 'attachments', 'authorizations', 'autoUpgrade.workingHours', 'conversation', 'createDate', 'dedupes', 'deleted', 'description', 'descriptionContentType', 'editCount', 'engagementList', 'extensions.backlog.priority', 'extensions.effort.effortEstimatedLocal.effort', 'extensions.effort.effortEstimatedLocal.unit', 'extensions.effort.effortEstimatedRecursiveSum.effort', 'extensions.effort.effortEstimatedRecursiveSum.unit', 'extensions.effort.effortRemainingLocalSum.effort', 'extensions.effort.effortRemainingLocalSum.unit', 'extensions.effort.effortRemainingRecursiveSum.effort', 'extensions.effort.effortRemainingRecursiveSum.unit', 'extensions.effort.effortSpentLocalSum.effort', 'extensions.effort.effortSpentLocalSum.unit', 'extensions.effort.effortSpentRecursiveSum.effort', 'extensions.effort.effortSpentRecursiveSum.unit', 'extensions.tt.assignedGroup', 'extensions.tt.building', 'extensions.tt.caseType', 'extensions.tt.category', 'extensions.tt.city', 'extensions.tt.endCode', 'extensions.tt.ecd', 'extensions.tt.impact', 'extensions.tt.item', 'extensions.tt.justification', 'extensions.tt.migrationStatus', 'extensions.tt.minImpact', 'extensions.tt.resolution', 'extensions.tt.rootCause', 'extensions.tt.rootCauseDetails', 'extensions.tt.status', 'extensions.tt.type', 'frames', 'id', 'identityTimestamped', 'inheritedLabels', 'isTicket', 'labels', 'lastAssignedDate', 'lastResolvedByIdentity', 'lastResolvedDate', 'lastUpdatedActualDate', 'lastUpdatedConversationDate', 'lastUpdatedDate', 'lastUpdatedIdentity', 'next_step.action', 'next_step.exceptions', 'next_step.owner', 'parentTasks', 'requesterIdentity', 'rootCauses', 'rulesReceipt', 'schedule.estimatedCompletionDate', 'schedule.estimatedStartDate', 'schedule.needByDate', 'schema', 'slaReceipts', 'status', 'stickyThreadId', 'submitterIdentity', 'subtasks', 'tags', 'threads', 'title', 'watchers']
From this list, I am trying to put only certain keys and their values into a DataFrame:
print(flattened_data['assigneeIdentity',
# 'createDate',
# 'description',
# 'extensions.tt.assignedGroup',
# 'extensions.tt.category',
# 'extensions.tt.endCode',
# 'extensions.tt.ecd',
# 'extensions.tt.impact',
# 'extensions.tt.item',
# 'extensions.tt.justification',
# 'extensions.tt.resolution',
# 'extensions.tt.rootCause',
# 'extensions.tt.rootCauseDetails',
# 'extensions.tt.status',
# 'extensions.tt.type',
# 'id',
# 'labels',
# 'lastAssignedDate',
# 'lastResolvedByIdentity',
# 'lastResolvedDate',
# 'lastUpdatedActualDate',
# 'lastUpdatedConversationDate',
# 'lastUpdatedDate',
# 'lastUpdatedIdentity',
# 'requesterIdentity',
# 'submitterIdentity',
# 'title',
# 'watchers'])
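When I do this, I get a KeyError. For the fields listed above, and the nesting level of each, the basic JSON looks like the following; each "item" under the documents element is an integer, with further nested elements beneath it: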
documents:
  0:
    extensions:
      tt:
        category:
        type:
        item:
        assignedGroup:
        impact:
        justification:
        endCode:
        rootCause:
        rootCauseDetails:
        status:
    id:
    title:
    lastAssignedDate:
    createDate:
    lastUpdatedActualDate:
    lastResolvedDate:
    lastResolvedByIdentity:
    lastUpdatedIdentity:
    assigneeIdentity:
    submitterIdentity:
    requesterIdentity:
    identityTimestamped:
    lastUpdatedConversationDate:
    lastUpdatedDate:
  1:
    extensions:
      tt:
        category:
        type:
        item:
        assignedGroup:
        impact:
        justification:
        endCode:
        rootCause:
        rootCauseDetails:
        status:
    id:
    title:
    lastAssignedDate:
    createDate:
    lastUpdatedActualDate:
    lastResolvedDate:
    lastResolvedByIdentity:
    lastUpdatedIdentity:
    assigneeIdentity:
    submitterIdentity:
    requesterIdentity:
    identityTimestamped:
    lastUpdatedConversationDate:
    lastUpdatedDate:
How can I get these keys and their values into a DataFrame?
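flattened_data should already be a valid DataFrame. The error seems to come from printing flattened_data["key1", "key2", ...], which looks in flattened_data for a single column whose name is the tuple ("key1", "key2", ...). Essentially, you are telling the DataFrame "get the one column whose name is this whole sequence of keys". To get several columns from a DataFrame, try flattened_data[["key1", "key2", ...]] instead, which means "get every column whose name is in this list".

It is also possible that you have a DataFrame with columns ["0.id", "0.title", ..., "1.id", "1.title", ...] and only one row: the value assigned to each path in the JSON object. However, pandas.io.json.json_normalize() (pandas.json_normalize() in pandas 1.0 and later) can take a list of dictionaries as its argument, so passing the list of sub-dictionaries in json_data['documents'] (for example, list(json_data['documents'].values())) should return the correct DataFrame, rather than flattened_data = json_normalize(json_data['documents']). You can then retrieve the columns you need.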
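A minimal sketch of both points, run against made-up sample data. The json_data dictionary below is hypothetical and only mimics the shape described in the question. It assumes that documents is a dictionary keyed by "0", "1", and so on; if documents is already a list, the .values() step is unnecessary and only the double-bracket selection matters. The sketch also assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's response; values are made up.
json_data = {
    "documents": {
        "0": {"id": "a1", "title": "first",
              "extensions": {"tt": {"status": "Open", "impact": 3}}},
        "1": {"id": "b2", "title": "second",
              "extensions": {"tt": {"status": "Closed", "impact": 5}}},
    }
}

# Passing the outer dict yields a single row with columns like "0.id" and "1.id".
wide = pd.json_normalize(json_data["documents"])

# Passing a list of the per-document dicts yields one row per document.
flattened_data = pd.json_normalize(list(json_data["documents"].values()))

# A list inside the brackets selects several columns at once.
wanted = ["id", "title", "extensions.tt.status", "extensions.tt.impact"]
subset = flattened_data[wanted]

# Without the inner list, pandas looks up one column keyed by a tuple and fails.
key_error = None
try:
    flattened_data["id", "title"]
except KeyError as exc:
    key_error = exc

print(subset)
```

Building the wanted list separately keeps the selection readable when there are many columns, as in the question.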
Referencing something from a fantastic response I just commented on today. Maybe this will help: