Python: Parsing a JSON string with multiple nested dicts on a single line

Posted 2024-10-01 13:38:48


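I have a lot of JSON files to parse, each between 1 and 2 MB in size. Normally, loading the data from a JSON file as a dict with json.load(json_file) works without any problem. In this case, however, the JSON is a string of multiple nested dictionaries, all on a single line.

The dictionaries are not separated by "," as they would be in a list. Each file is just one very long string of nested dictionaries. For example, in the snippet below there are two nested dictionaries, each with a single key at the outer level ("GGGGHH" for the first dictionary and "GGGHGH" for the second):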

{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["229.0934"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["293.1353"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "a4": {"spectrum_89": ["202.1087"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["309.1312"], "spectrum_107": ["309.1314"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["213.0985"], "spectrum_107": ["213.0985"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}

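I have seen examples of parsing multiple JSON objects, but only when they are inside an array.

Can anyone help? I have no control over the format of the JSON files, so I cannot regenerate the data in a simpler format. Apologies if this has been answered before; I could not find any answer that applies to this particular case.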


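2 Answers

Your string is not valid JSON, but it looks like it is just a series of valid JSON dictionaries concatenated back to back, with no commas between them.

Simply add commas between the dictionaries by replacing every occurrence of "}{" with "}, {", then wrap the result in "[" and "]" so that it becomes valid JSON for a list of dictionaries, and pass it to json.loads: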

s = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["229.0934"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["293.1353"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "a4": {"spectrum_89": ["202.1087"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["309.1312"], "spectrum_107": ["309.1314"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["213.0985"], "spectrum_107": ["213.0985"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}'
import json  # needed for json.loads

json.loads("[" + s.replace("}{", "}, {") + "]")

Output:

[{'GGGGHH': {'b2': {'spectrum_89': ['115.0502']},
   'b3': {'spectrum_89': ['172.0716'], 'spectrum_107': ['172.0717']},
   'b4': {'spectrum_89': ['229.0934']},
   'b5': {'spectrum_89': ['366.1527'], 'spectrum_107': ['366.1537']},
   'y1': {'spectrum_89': ['156.0769'], 'spectrum_107': ['156.0769']},
   'y2': {'spectrum_89': ['293.1353']},
   'y3': {'spectrum_89': ['372.1407'], 'spectrum_107': ['350.1563']},
   'a4': {'spectrum_89': ['202.1087']},
   'ImH': {'spectrum_89': ['110.0715'], 'spectrum_107': ['110.0715']}}},
 {'GGGHGH': {'b2': {'spectrum_89': ['115.0502']},
   'b3': {'spectrum_89': ['172.0716'], 'spectrum_107': ['172.0717']},
   'b4': {'spectrum_89': ['309.1312'], 'spectrum_107': ['309.1314']},
   'b5': {'spectrum_89': ['366.1527'], 'spectrum_107': ['366.1537']},
   'y1': {'spectrum_89': ['156.0769'], 'spectrum_107': ['156.0769']},
   'y2': {'spectrum_89': ['213.0985'], 'spectrum_107': ['213.0985']},
   'y3': {'spectrum_89': ['372.1407'], 'spectrum_107': ['350.1563']},
   'ImH': {'spectrum_89': ['110.0715'], 'spectrum_107': ['110.0715']}}}]

For the more general case (for example, if there may be whitespace between two dictionaries), use a regular-expression substitution:

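import re  # needed for re.sub

json.loads("[" + re.sub(r"\}\s*\{", "}, {", s) + "]")

Here the regular expression "\}\s*\{" matches a }, followed by zero or more whitespace characters, followed by a {.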
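Both replacement approaches above assume that "}{" never occurs inside a string value; if it did, the substitution would alter the data. As a rough sketch of a different technique that avoids text substitution, the standard library's json.JSONDecoder.raw_decode can parse one value at a time and report where it stopped. The function name parse_concatenated and the "note" value in the sample are hypothetical, added only for illustration:

```python
import json

def parse_concatenated(text):
    """Parse back-to-back JSON values such as '{...}{...}' into a list."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it manually
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        results.append(obj)
    return results

# Shortened sample; the "note" value is made up to show that "}{" inside
# a string is left untouched.
sample = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}}} {"GGGHGH": {"note": "a}{b"}}'
parsed = parse_concatenated(sample)
print([list(d) for d in parsed])  # [['GGGGHH'], ['GGGHGH']]
```

Because the decoder tracks string boundaries itself, this should also tolerate any amount of whitespace between the objects.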

This looks a lot like malformed ndjson. You can replace }{ with }\n{ and then use the third-party ndjson package:

import ndjson
with open('spam.json') as f:
    source = f.read()
    source = source.replace('}{', '}\n{')
    data = ndjson.loads(source)

print(data)
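If installing ndjson is not an option, the same idea can be approximated with the standard library alone: put each top-level object on its own line, then parse line by line. This is a minimal sketch, assuming "}{" only occurs between top-level objects; the function name load_split_lines is hypothetical and the sample is a shortened version of the data in the question:

```python
import json

def load_split_lines(text):
    # Put each top-level object on its own line, then parse each line.
    # Assumes "}{" never appears inside a string value.
    lines = text.replace("}{", "}\n{").splitlines()
    return [json.loads(line) for line in lines if line.strip()]

records = load_split_lines(
    '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}}}'
    '{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}}}'
)
print(len(records))  # 2
```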
