Unable to flatten a JSON file with multiple values

Posted 2024-05-19 12:03:52


I have a JSON file that I am trying to flatten. The function works fine if the file contains only one message, but with multiple messages I get the following error:

    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 39 column 1 (char 952)

Sample JSON file

{
    "number": "Abc",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {
            "depTimeDiffMin": "0",
            "name": "Spital am Pyhrn Bahnhof",
            "arrTime": "",
            "depTime": "06:32",
            "platform": "2",
            "stationIdx": "0",
            "arrTimeDiffMin": "",
            "track": "R 3932"
        },
        {
            "depTimeDiffMin": "0",
            "name": "Windischgarsten Bahnhof",
            "arrTime": "06:37",
            "depTime": "06:40",
            "platform": "2",
            "stationIdx": "1",
            "arrTimeDiffMin": "1",
            "track": ""
        },
        {
            "depTimeDiffMin": "",
            "name": "Linz/Donau Hbf",
            "arrTime": "08:24",
            "depTime": "",
            "platform": "1A-B",
            "stationIdx": "22",
            "arrTimeDiffMin": "1",
            "track": ""
        }
    ]
}

{
    "number": "Xyz",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {
            "depTimeDiffMin": "0",
            "name": "Spital am Pyhrn Bahnhof",
            "arrTime": "",
            "depTime": "06:32",
            "platform": "2",
            "stationIdx": "0",
            "arrTimeDiffMin": "",
            "track": "R 3932"
        },
        {
            "depTimeDiffMin": "0",
            "name": "Windischgarsten Bahnhof",
            "arrTime": "06:37",
            "depTime": "06:40",
            "platform": "2",
            "stationIdx": "1",
            "arrTimeDiffMin": "1",
            "track": ""
        },
        {
            "depTimeDiffMin": "",
            "name": "Linz/Donau Hbf",
            "arrTime": "08:24",
            "depTime": "",
            "platform": "1A-B",
            "stationIdx": "22",
            "arrTimeDiffMin": "1",
            "track": ""
        }
    ]
}

My code:

import json
import pandas as pd
import numpy as np
from pandas.io.json import json_normalize


desired_width=500
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns', 100)

with open('C:/Users/username/Desktop/samplejson.json') as f:
    data = json.load(f)


def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out

for data in data:
    flat = flatten_json(data)
    new_flat = json_normalize(flat)

dfs = pd.DataFrame(new_flat)
print(dfs.head(2))

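I am trying to parse the whole JSON file and load all of the data into a dataframe so that I can start analyzing it. If the file contains only one message, the code works fine and outputs a very wide table with many columns.

If the JSON file contains multiple messages, I get the error shown above. I have looked at many solutions on Stack Overflow, but none of them seem to work.

Is there an easier way to read and flatten a JSON file? I tried pandas' json_normalize, but it only flattens the first level.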


2 Answers

You can do it like this, assuming j is the complete JSON object (a list of messages).

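import pandas as pd

def parse(j):
    for item in j:
        # Message-level fields form a one-row frame; "locations" gives one row per stop
        data = pd.DataFrame([{k: v for k, v in item.items() if k != 'locations'}])
        locs = pd.DataFrame(item.get('locations'))
        # Join side by side, then forward-fill the message-level fields down the rows
        yield pd.concat([data, locs], axis=1).ffill()

pd.concat(parse(j), axis=0, ignore_index=True)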

         date    name number arrTime   ...                       name platform stationIdx   track
0  01.10.2016  R 3932    Abc           ...    Spital am Pyhrn Bahnhof        2          0  R 3932
1  01.10.2016  R 3932    Abc   06:37   ...    Windischgarsten Bahnhof        2          1        
2  01.10.2016  R 3932    Abc   08:24   ...             Linz/Donau Hbf     1A-B         22        
3  01.10.2016  R 3932    Xyz           ...    Spital am Pyhrn Bahnhof        2          0  R 3932
4  01.10.2016  R 3932    Xyz   06:37   ...    Windischgarsten Bahnhof        2          1        
5  01.10.2016  R 3932    Xyz   08:24   ...             Linz/Donau Hbf     1A-B         22 

However, your JSON is invalid as posted: the two objects need a comma between them and have to be wrapped in a list ([ ... ]).
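As a rough sketch of the wide, one-row-per-message layout the question describes, the snippet below applies a recursive flattener in the spirit of the question's flatten_json to every message of a valid list and collects the results. The shortened sample data and the names valid_text and rows are made up for illustration. Note that the loop in the question reassigns new_flat on every pass, so only the last message would survive; collecting one dict per message avoids that.

```python
import json

# Hypothetical, shortened version of the sample file, made valid by wrapping
# the messages in [ ... ] and separating them with a comma.
valid_text = """
[
    {"number": "Abc", "locations": [{"name": "Spital am Pyhrn Bahnhof", "platform": "2"}]},
    {"number": "Xyz", "locations": [{"name": "Linz/Donau Hbf", "platform": "1A-B"}]}
]
"""


def flatten_json(value, prefix=""):
    """Return a flat dict whose keys join the nested keys and list indexes with '_'."""
    out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            out.update(flatten_json(child, f"{prefix}{key}_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            out.update(flatten_json(child, f"{prefix}{index}_"))
    else:
        # Leaf value: drop the trailing '_' from the accumulated prefix.
        out[prefix[:-1]] = value
    return out


messages = json.loads(valid_text)
# One flat dict per message, kept in a list instead of being overwritten.
rows = [flatten_json(message) for message in messages]
print(rows[0])
```

A list of flat dicts like rows can then be handed to pd.DataFrame(rows) to get one wide row per message.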

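If the file contains only one message, it is valid JSON, but once it holds several messages placed one after another, as in your sample, it is no longer valid JSON ([JSON]: Introducing JSON). Example: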

>>> json.loads("{}")
{}
>>> json.loads("{} {}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Install\x64\Python\Python\03.06.08\Lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "c:\Install\x64\Python\Python\03.06.08\Lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 4 (char 3)
>>> json.loads("[{}, {}]")
[{}, {}]

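For details, see [Python 3]: json - JSON encoder and decoder

The simplest way to make valid JSON that contains multiple messages:

  • Enclose them all in square brackets ("[", "]")
  • Separate every two consecutive messages with a comma (",")

This is exactly how the "locations" sub-messages are already structured.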
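If editing the file is not an option, one possible alternative (not part of the answer above) is to read the back-to-back objects one at a time with json.JSONDecoder.raw_decode, which parses a single value starting at a given index and returns it together with the index where it stopped. The helper name iter_json_objects and the shortened sample string are hypothetical, used only for this sketch.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Parse one value starting at pos; the returned index is where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two objects separated only by blank lines, like the file in the question.
sample = (
    '{"number": "Abc", "locations": [{"stationIdx": "0"}]}\n\n'
    '{"number": "Xyz", "locations": []}'
)
messages = list(iter_json_objects(sample))
print([m["number"] for m in messages])
```

The resulting list of messages has the same shape as what json.load would return for a file fixed by hand with brackets and commas.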
