How to create nested JSON from a DataFrame in Python

Posted 2024-05-19 00:40:45


I have a DataFrame containing Windows 10 logs and I want to convert it to JSON. What is an efficient way to do this?

I have already generated the default output from the df, but it is not nested. This is what I have now:

{
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
    }
}

I want it to look like this:

{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,  # the "counter" value
            "Excel": 0
        }
    },
    "1": ...
}

2 answers

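As I understand it, you need to group the objects by "time" and merge the counters from the different processes. If so, here is an example implementation: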

input_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "ZXC",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "3": {
        "ProcessName": "QWE",
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    }
}


def group_input_data_by_time(dict_data):
    time_data = {}
    for value_dict in dict_data.values():
        counter = value_dict["counter"]
        process_name = value_dict["ProcessName"]
        time_ = value_dict["time"]
        common_data = {
            "time": time_,
            "timeFloat": value_dict["timeFloat"],
            "internal_time": value_dict["internal_time"],
        }
        common_data = time_data.setdefault(time_, common_data)
        processes = common_data.setdefault("Processes", {})
        processes[process_name] = counter

    # if required to change keys from time to enumerated
    result_dict = {}
    for ind, value in enumerate(time_data.values()):
        result_dict[str(ind)] = value

    return result_dict


print(group_input_data_by_time(input_data))

The result is:

{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,
            "ZXC": 0
        }
    },
    "1": {
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "QWE": 0
        }
    }
}
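
Because the question starts from a DataFrame rather than a dict, the same grouping can be fed with `df.to_dict(orient="index")`. The following is only a minimal sketch under assumptions: the sample frame is made up to mirror the columns in the question, and `group_records_by_time` is a compact rewrite of the function above, not a different algorithm.

```python
import json

import pandas as pd

# Hypothetical sample frame mirroring the columns shown in the question.
df = pd.DataFrame(
    {
        "ProcessName": ["Firefox", "Excel", "Word"],
        "time": ["2019-07-12T00:00:00", "2019-07-12T00:00:00", "2019-07-12T01:30:00"],
        "timeFloat": [1562882400.0, 1562882400.0, 1562888000.0],
        "internal_time": [0.0, 0.0, 1.5533333333],
        "counter": [0, 0, 0],
    }
)


def group_records_by_time(records):
    """Merge records sharing a "time" value; nest their counters under "Processes"."""
    grouped = {}
    for record in records.values():
        # The first record seen for a given time supplies the shared fields.
        entry = grouped.setdefault(
            record["time"],
            {
                "time": record["time"],
                "timeFloat": record["timeFloat"],
                "internal_time": record["internal_time"],
                "Processes": {},
            },
        )
        entry["Processes"][record["ProcessName"]] = record["counter"]
    # Re-key by position so the output uses "0", "1", ... as in the question.
    return {str(i): entry for i, entry in enumerate(grouped.values())}


# orient="index" yields {row_label: {column: value}}, the shape the function expects.
nested = group_records_by_time(df.to_dict(orient="index"))
print(json.dumps(nested, indent=4))
```

Using `json.dumps` for printing gives valid JSON with double quotes, whereas printing the dict directly shows Python's repr.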

It looks to me like you want to build the JSON from data aggregated on ['time', 'timeFloat', 'internal_time'], which you can do with:

df.groupby(['time', 'timeFloat', 'internal_time'])

However, your example suggests that you want to keep the index keys ("0", "1", and so on), which contradicts that intent.

The aggregated values for a single point in time:

"Firefox" : 0
"Excel" : 0 

appear to correspond to those index keys, and the index keys are lost when you aggregate.
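
To make the point about lost index keys concrete, here is a small sketch. It assumes a reduced version of the question's data, and the column name `source_keys` is invented for this example; it shows one possible way to carry the original labels through the aggregation if they matter.

```python
import pandas as pd

# Reduced, hypothetical version of the question's data, labelled "0", "1", "2".
df = pd.DataFrame(
    {
        "ProcessName": ["Firefox", "Excel", "Word"],
        "time": ["2019-07-12T00:00:00", "2019-07-12T00:00:00", "2019-07-12T01:30:00"],
        "counter": [0, 0, 0],
    },
    index=["0", "1", "2"],
)

# Grouping collapses rows "0" and "1" into one group, so their labels disappear:
# the result is indexed by "time" only.
grouped = df.groupby("time")["ProcessName"].agg(list)
print(grouped)

# If the original labels matter, turn the index into an ordinary column first
# and aggregate it like any other value.
with_keys = (
    df.rename_axis("source_keys")
    .reset_index()
    .groupby("time")["source_keys"]
    .agg(list)
)
print(with_keys.to_dict())
```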

If you do decide to aggregate, the code looks like this:

# reading in data:

import pandas as pd
import json
json_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
}}

# one row per record; orient="index" turns the outer keys ("0", "1", ...) into the row index
df = pd.DataFrame.from_dict(json_data, orient="index")

# processing:
ddf = df.groupby(['time', 'timeFloat', 'internal_time'], as_index=False).agg(lambda x: list(x))
ddf['Processes'] = ddf.apply(lambda r: dict(zip(r['ProcessName'], r['counter'])), axis=1)
ddf = ddf.drop(['ProcessName', 'counter'], axis=1)

# printing the result:
json2 = json.loads(ddf.to_json(orient="records"))
print(json.dumps(json2, indent=4, sort_keys=True))

Result:

[
    {
        "Processes": {
            "Excel": 0,
            "Firefox": 0
        },
        "internal_time": 0.0,
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0
    },
    {
        "Processes": {
            "Word": 0
        },
        "internal_time": 1.5533333333,
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0
    }
]
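
The output above is a list, while the question asks for an object keyed by "0", "1", and so on. One possible way to get that shape is `orient="index"` instead of `orient="records"`. The sketch below uses a hand-built stand-in for the aggregated frame `ddf` so that it runs on its own, and it assumes the frame still has the default integer index that `as_index=False` produces.

```python
import json

import pandas as pd

# Hypothetical stand-in for the aggregated frame `ddf` built above.
ddf = pd.DataFrame(
    {
        "time": ["2019-07-12T00:00:00", "2019-07-12T01:30:00"],
        "timeFloat": [1562882400.0, 1562888000.0],
        "internal_time": [0.0, 1.5533333333],
        "Processes": [{"Firefox": 0, "Excel": 0}, {"Word": 0}],
    }
)

# orient="index" keys each record by its row label; with the default
# integer index those labels become the JSON keys "0", "1", ...
keyed = json.loads(ddf.to_json(orient="index"))
print(json.dumps(keyed, indent=4))
```

Note that these keys simply number the groups; they are not the row labels of the original, ungrouped data.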
