How do I convert text to a JSON file?

Posted 2024-09-30 18:15:01


I need to create a JSON file with this structure

[{"image_id": 0873, "caption": "clock tower with a clock on top of it"}, {"image_id": 1083, "caption": "two zebras are standing in the grass in the grass"} , .....

from a text file that contains

image_id 0873  caption clock tower with a clock on top of it 
image_id 1083  caption two zebras are standing in the grass in the grass 
image_id 1270  caption baseball player is swinging a bat at the ball  
image_id 1436  caption man is sitting on the bed with laptop 

How can I get started with this?


3 answers

Try a regular expression; it is easy to extend to more complex patterns. Here is an extended version of @Kozubi's answer:

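    import json
    import re

    # Verbose (re.X) pattern: whitespace inside the pattern is ignored,
    # and the named groups capture the id and the caption.
    pattern = re.compile(r"""image_id\s+(?P<image_id>[0-9]+)\s+
                             caption\s+(?P<caption>.*)$
                             """, re.X)

    json_data = []
    with open("test.txt") as f:
        for line in f:
            m = pattern.match(line.strip())
            if m:  # lines that do not match are skipped
                json_data.append({
                    "image_id": int(m.group("image_id")),
                    "caption": m.group("caption"),
                })

    print(json.dumps(json_data, indent=4))
    # use a context manager so the output file is flushed and closed
    with open("json_dump.json", "w") as out:
        json.dump(json_data, out, indent=4)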
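To try the pattern without an input file, here is a minimal sketch that runs the same regular expression over two sample lines from the question held in a string. The helper name `parse_lines`, the `SAMPLE` text, and the extra "not a record" line are assumptions made for illustration; they are not part of the original answer.

```python
import json
import re

# Same verbose pattern as in the answer above.
PATTERN = re.compile(r"""image_id\s+(?P<image_id>[0-9]+)\s+
                         caption\s+(?P<caption>.*)$
                         """, re.X)

# Hypothetical sample input: two lines from the question, a blank line,
# and one line that deliberately does not match.
SAMPLE = """\
image_id 0873  caption clock tower with a clock on top of it
image_id 1083  caption two zebras are standing in the grass in the grass

not a record
"""


def parse_lines(lines):
    """Return (records, skipped): parsed dicts and the non-blank lines that failed to match."""
    records, skipped = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        m = PATTERN.match(text)
        if m:
            records.append({
                "image_id": int(m.group("image_id")),
                "caption": m.group("caption"),
            })
        else:
            skipped.append(text)
    return records, skipped


records, skipped = parse_lines(SAMPLE.splitlines())
print(json.dumps(records, indent=4))
print("skipped:", skipped)
```

Collecting the unmatched lines instead of dropping them silently makes it easier to spot input that the pattern does not cover.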

Assuming every line looks like: image_id {image_id} caption {caption}, you can use the str method split(maxsplit=number) to split the line into four parts:

    line = "image_id 0873  caption clock tower with a clock on top of it"
    _, image_id, _, caption = line.split(maxsplit=3)
    # Now image_id = "0873", caption = "clock tower with a clock on top of it"

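To iterate over all the lines of the file:

    images = []
    with open(path) as f:  # path: location of the input text file
        for line in f:
            line = line.strip()  # drop the newline and trailing spaces
            if not line:
                continue  # skip blank lines
            _, image_id, _, caption = line.split(maxsplit=3)
            images.append({"image_id": int(image_id), "caption": caption})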

To save the variable to a JSON file, you can use the json module:

    import json
    with open(path_to_save, "w") as f:
        json.dump(images, f)
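Putting the split and dump steps together, the following is a rough end-to-end sketch. The function name `text_to_json` and the file names `captions.txt` and `captions.json` are hypothetical, and a temporary directory is used only so the example can run anywhere without touching real files.

```python
import json
import tempfile
from pathlib import Path


def text_to_json(src, dst):
    """Parse 'image_id <id> caption <text>' lines from src and write them to dst as JSON."""
    images = []
    with open(src, encoding="utf-8") as f:
        for line in f:
            parts = line.split(maxsplit=3)
            if len(parts) != 4:
                continue  # skip blank or incomplete lines
            _, image_id, _, caption = parts
            # the last piece keeps trailing whitespace, so strip it
            images.append({"image_id": int(image_id), "caption": caption.strip()})
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(images, f, indent=4)
    return images


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "captions.txt"   # hypothetical input file
    dst = Path(tmp) / "captions.json"  # hypothetical output file
    src.write_text(
        "image_id 0873  caption clock tower with a clock on top of it \n"
        "\n"
        "image_id 1436  caption man is sitting on the bed with laptop \n",
        encoding="utf-8",
    )
    images = text_to_json(src, dst)
    # read the file back to confirm it is valid JSON with the same content
    loaded = json.loads(dst.read_text(encoding="utf-8"))

print(loaded)
```

Reading the output back with `json.loads` is a cheap way to confirm the file is well formed.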

This should do the trick:

    import json

    json_data = []
    # read the data file line by line
    with open("file_with_data.txt") as f:
        for line in f:
            # split on any run of whitespace (this also drops the trailing newline)
            splt_line = line.split()
            if len(splt_line) < 4:
                continue  # skip blank or incomplete lines
            # build a single dict from the line data; image_id stays a string here
            small_json = {splt_line[0]: splt_line[1], splt_line[2]: " ".join(splt_line[3:])}
            # add the dict to the list
            json_data.append(small_json)
    # now dump the List[Dict] to a .json file
    with open("json_dump.json", "w") as out:
        json.dump(json_data, out)
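One difference between the answers is worth noting: the answer above stores image_id as a string ("0873"), while the other two convert it with int(), which gives 873. JSON numbers cannot have leading zeros, so the literal 0873 shown in the question's target structure is not valid JSON. The short sketch below illustrates both options; the variable names are chosen only for this example.

```python
import json

raw_id = "0873"

# Option 1: store a number; the leading zero is lost.
as_number = json.dumps({"image_id": int(raw_id)})
# Option 2: store a string; the leading zero is preserved.
as_string = json.dumps({"image_id": raw_id})

print(as_number)
print(as_string)

# A number written with a leading zero is rejected by the parser.
try:
    json.loads('{"image_id": 0873}')
    leading_zero_ok = True
except json.JSONDecodeError:
    leading_zero_ok = False

print("leading zero accepted:", leading_zero_ok)
```

Which option fits depends on whether the ids are later used as numbers or as fixed-width labels such as file names.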
