Merging multiple JSONL files from a folder using Python

Published 2024-06-12 00:33:08


I am looking for a way to merge multiple JSONL files from one folder using a Python script, similar to the script below, which works for JSON files:

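import json
import glob

result = []
for f in glob.glob("*.json"):
    # Plain open() is enough here; jsonlines was never imported
    with open(f) as infile:
        result.append(json.load(infile))

# json.dump writes str, so the file must be opened in text mode
with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)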

Please find a sample of my JSONL file below (one line only):

{"date":"2021-01-02T08:40:11.378000000Z","partitionId":"0","sequenceNumber":"4636458","offset":"1327163410568","iotHubDate":"2021-01-02T08:40:11.258000000Z","iotDeviceId":"text","iotMsg":{"header":{"deviceTokenJwt":"text","msgType":"text","msgOffset":3848,"msgKey":"text","msgCreation":"2021-01-02T09:40:03.961+01:00","appName":"text","appVersion":"text","customerType":"text","customerGroup":"Customer"},"msgData":{"serialNumber":"text","machineComponentTypeId":"text","applicationVersion":"3.1.4","bootloaderVersion":"text","firstConnectionDate":"2018-02-20T10:34:47+01:00","lastConnectionDate":"2020-12-31T12:05:04.113+01:00","counters":[{"type":"DurationCounter","id":"text","value":"text"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":2423},{"type":"IntegerCounter","id":"text","value":9914},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":976},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"IntegerCounter","id":"text","value":28},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":1}],"defects":[{"description":"ProtocolDb.ProtocolIdNotFound","defectLevelId":"Warning","occurrence":3},{"description":"BridgeBus.CrcError","defectLevelId":"Warning","occurrence":1},{"description":"BridgeBus.Disconnected","defectLevelId":"Warning","occurrence":6}],"maintenanceEvents":[{"interventionId":"Other","comment":"text","appearance_display":0,"intervention_date":"2018-11-29T09:52:16.726+01:00","intervention_counterValue":"text","intervention_workerName":"text"},{"interventionId":"Other","comment":"text","appearance_display":0,"intervention_date":"2019-06-04T15:30:15.954+02:00","intervention_counterValue":"text","intervention_workerName":"text"}]}}}

Does anyone know how I should handle this?


2 answers

You can update a main dict with each JSON object you load, like this:

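import json
import glob

result = {}
for f in glob.glob("*.json"):
    # Plain open() is enough here; jsonlines was never imported
    with open(f) as infile:
        result.update(json.load(infile))  # merge the dicts

# json.dump writes str, so the file must be opened in text mode
with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)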

But this will overwrite any keys that the files have in common.
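If overwriting shared keys is not acceptable, one option is to parse each line separately and keep every record in a list. The following is only a minimal sketch: the helper names `read_jsonl_records` and `write_jsonl_records` are made up for illustration, and it assumes the files are UTF-8 and hold one JSON object per line.

```python
import json


def read_jsonl_records(filenames):
    """Parse every non-empty line of each file as one JSON value."""
    records = []
    for filename in filenames:
        with open(filename, encoding="utf-8") as infile:
            for line in infile:
                line = line.strip()
                if line:  # skip blank lines, e.g. a trailing empty line
                    records.append(json.loads(line))
    return records


def write_jsonl_records(records, output_name):
    """Write one JSON value per line, each terminated by a newline."""
    with open(output_name, "w", encoding="utf-8") as outfile:
        for record in records:
            outfile.write(json.dumps(record) + "\n")
```

Calling `write_jsonl_records(read_jsonl_records(glob.glob("*.json")), "merged.jsonl")` (after `import glob`) would then keep all records, including those with identical keys, at the cost of parsing every line.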

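Because every line of a JSONL file is a complete JSON object, you do not actually need to parse the JSONL files at all in order to merge them into another JSONL file. Instead, merge them by simply concatenating them. There is one caveat, though: the JSONL format does not require a newline at the end of the file. So you have to read each line into a buffer and test whether the last line of the file ends with a newline; if it does not, you must write a newline explicitly so that the first record of the next file is kept separate: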

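import glob

output_name = "merged_file.json"
# List the inputs first and skip the output file, which also matches "*.json"
filenames = [f for f in glob.glob("*.json") if f != output_name]
with open(output_name, "w") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            line = ""  # stays empty if the file has no lines
            for line in infile:
                outfile.write(line)
            # Separate the next file's first record if the newline was missing
            if line and not line.endswith('\n'):
                outfile.write('\n')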
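Since plain concatenation never parses the records, it can be worth checking the merged file afterwards. The sketch below is an optional extra, not part of the answer above: the function name `count_jsonl_records` is hypothetical, and it assumes a UTF-8 file in which blank lines may be ignored.

```python
import json


def count_jsonl_records(filename):
    """Return the number of records; raise ValueError on a malformed line."""
    count = 0
    with open(filename, encoding="utf-8") as infile:
        for line_number, line in enumerate(infile, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                # Also triggers when two records were glued onto one line
                raise ValueError(f"{filename}:{line_number}: {exc}") from exc
            count += 1
    return count
```

A call such as `count_jsonl_records("merged_file.json")` should return the total number of records across the input files; an error pointing at a specific line usually means a newline separator went missing between two files.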
