Converting text files to JSON in Python

Posted 2024-10-01 13:42:14

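I have multiple documents, about 400 GB in total, that I want to convert to JSON so I can load them into Elasticsearch for analysis.

Each file is about 200 MB.

The original files look like this: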

IUGJHHGF@BERLIN:lhfrjy
0t7yfudf@WARSAW:qweokm246
0t7yfudf@CRACOW:Er747474
0t7yfudf@cracow:kui666666
000t7yf@Vienna:1йй2ц2й2цй2цц3у

The data is not only English text. key1 is always terminated by @, and city is terminated by ; or :

After I parse it with this code:

^{pr2}$

all files look like:

RRS12345 Cracow Sunflowers
RRD12345 Berin Data

After parsing, I would like to get this output:

  {  
   "location_data":[  
      {  
         "key1":"RRS12345",
         "city":"Cracow",
         "description":"Sunflowers"
      },
      {  
         "key1":"RRD123dsd45",
         "city":"Berlin",
         "description":"Data"
      },
      {  
         "key1":"RRD123dsds45",
         "city":"Berlin",
         "description":"1йй2ц2й2цй2цц3у"
      }
   ]
}

How can I quickly convert the files into the required JSON format, given that they contain more than just English characters?


2 answers
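import json


def process_text_to_json():
    location_data = []
    # Assumes the file is UTF-8, so non-English text is decoded correctly.
    with open("file.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            location_data.append({"key1": parts[0], "city": parts[1], "description": parts[2]})

    location_data = {"location_data": location_data}
    # ensure_ascii=False keeps non-English characters readable instead of \uXXXX escapes.
    return json.dumps(location_data, ensure_ascii=False)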

Sample output:

{"location_data": [{"city": "Cracow", "key1": "RRS12345", "description": "Sunflowers"}, {"city": "Berin", "key1": "RRD12345", "description": "Data"}, {"city": "Cracow2", "key1": "RRS12346", "description": "Sunflowers"}, {"city": "Berin2", "key1": "RRD12346", "description": "Data"}, {"city": "Cracow3", "key1": "RRS12346", "description": "Sunflowers"}, {"city": "Berin3", "key1": "RRD12346", "description": "Data"}]}
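
If the raw lines from the question (key1@CITY:description) need to be parsed directly, a regular expression can handle both separators in one pass. This is only a minimal sketch: the name parse_raw_line is hypothetical, and it assumes key1 never contains @ and city never contains ; or :.

```python
import re

# key1 runs up to the first "@", city up to the first ":" or ";",
# and everything after that separator is the description.
RAW_LINE = re.compile(r"^(?P<key1>[^@]+)@(?P<city>[^:;]+)[:;](?P<description>.*)$")


def parse_raw_line(line):
    """Return a dict for one raw line, or None if the line does not match."""
    match = RAW_LINE.match(line.rstrip("\r\n"))
    return match.groupdict() if match else None


print(parse_raw_line("0t7yfudf@WARSAW:qweokm246"))
```

Lines that do not fit the pattern return None, so the caller can skip or log them instead of failing partway through a large file.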

Iterate over each line and build your dictionary.

For example:

filename = "file.txt"  # placeholder path
d = {"location_data": []}
with open(filename, "r", encoding="utf-8") as infile:  # assumes UTF-8 input
    for line in infile:
        val = line.split()
        d["location_data"].append({"key1": val[0], "city": val[1], "description": val[2]})

print(d)
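
With about 400 GB of input, collecting every record in one dictionary may not fit in memory. One possible alternative is to write one JSON object per line as each record is read. The sketch below assumes the three-column, whitespace-separated format shown in the question and UTF-8 files; convert_file and its path arguments are hypothetical names.

```python
import json


def convert_file(src_path, dst_path):
    """Stream src_path line by line, writing one JSON object per line to dst_path."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Split into at most three fields so the description may contain spaces.
            parts = line.split(None, 2)
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            key1, city, description = parts
            record = {"key1": key1, "city": city, "description": description.rstrip()}
            # ensure_ascii=False writes non-English characters as-is.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Only one line is held in memory at a time, so the 200 MB size of each file is not a concern. Newline-delimited JSON is also close to what the Elasticsearch bulk API consumes, although a bulk request additionally needs an action line before each document.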
