How to convert a text file with JSON values to CSV

Posted 2024-09-26 18:06:01


I have a text file that contains one JSON object per line.

Sample data: file.text

{"id": "testid1","title": "testtitle1","link": "testlink1","description": "testdes2","entities": ["en1", "en2"]}
{"id": "testid2","title": "testtitle2","link": "testlink2","description": "testdes2","entities": [""]}
{"id": "testid1","title": "testtitle1","link": "testlink1","description": "testdesc","entities": ["en1", "en2", "en3"]}

Desired output:

id       title       link       description  entities__001  entities__002  entities__003
testid1  testtitle1  testlink1  testdes2     en1            en2
testid2  testtitle2  testlink2  testdes2
testid1  testtitle1  testlink1  testdesc     en1            en2            en3

Please suggest how I can do the same thing in Python.

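I tried converting my file to CSV online with https://json-csv.com/. However, a free account only supports files up to 1 MB, and my file is about 200 MB. Even so, using that site I was able to get a successful conversion with the desired output.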


1 answer
User
#1 · Posted 2024-09-26 18:06:01

First, read the file and process the data (convert each line from a string to JSON):

import json

# Parse one JSON document per line, skipping any blank lines
with open(r".\data_file.txt") as f:
    processed_data = [json.loads(line) for line in f if line.strip()]

Then iterate over the documents to add new fields (to flatten the data). There are more efficient ways, but this works:

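import pandas as pd

# Copy each item of the "entities" list into its own numbered field,
# named entities__001, entities__002, ... as in the desired output
for document in processed_data:
    for i, entity in enumerate(document["entities"], start=1):
        document["entities__{:03d}".format(i)] = entity
df = pd.DataFrame(processed_data)
# Remove the original list column (if needed)
del df["entities"]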
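As a small variation on the loop above, the flattening can be pulled into a helper that builds a new dict instead of modifying each document in place. This is only a sketch: the name flatten_entities and its key and prefix parameters are made up for illustration and are not part of the original answer.

```python
def flatten_entities(record, key="entities", prefix="entities__"):
    """Return a copy of `record` with the list under `key` spread into numbered fields."""
    # Keep every field except the list itself
    flat = {k: v for k, v in record.items() if k != key}
    # Add one zero-padded field per list item: entities__001, entities__002, ...
    for i, value in enumerate(record.get(key, []), start=1):
        flat[f"{prefix}{i:03d}"] = value
    return flat


record = {"id": "testid1", "title": "testtitle1", "entities": ["en1", "en2"]}
print(flatten_entities(record))
# {'id': 'testid1', 'title': 'testtitle1', 'entities__001': 'en1', 'entities__002': 'en2'}
```

A list of such flattened dicts can be passed to pd.DataFrame directly, in which case the "entities" column never gets created and does not need to be deleted afterwards.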

Then save it as CSV:

# index=False leaves out the DataFrame's row-number column
df.to_csv(r"./out_folder/out_data.csv", index=False)
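Since the file in the question is about 200 MB, it may be worth avoiding loading everything into memory at once. The following is a rough sketch of one possible alternative using only the standard library (json and csv) rather than pandas: it reads the file twice, once to find out how many entity columns are needed and once to write the rows. The function name jsonl_to_csv and its parameters are hypothetical, and it assumes every non-blank line is a valid JSON object.

```python
import csv
import json


def jsonl_to_csv(src_path, dst_path, key="entities", prefix="entities__"):
    """Convert a file with one JSON object per line to CSV, one record at a time."""
    base_fields = []   # non-list columns, in first-seen order
    max_items = 0      # length of the longest list seen under `key`

    # Pass 1: discover the columns without keeping the records in memory
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            record = json.loads(line)
            for name in record:
                if name != key and name not in base_fields:
                    base_fields.append(name)
            max_items = max(max_items, len(record.get(key, [])))

    fieldnames = base_fields + [f"{prefix}{i:03d}" for i in range(1, max_items + 1)]

    # Pass 2: flatten and write each row as soon as it is parsed
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # restval="" fills the entity columns that a shorter list does not reach
        writer = csv.DictWriter(dst, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for line in src:
            if not line.strip():
                continue
            record = json.loads(line)
            row = {k: v for k, v in record.items() if k != key}
            for i, value in enumerate(record.get(key, []), start=1):
                row[f"{prefix}{i:03d}"] = value
            writer.writerow(row)
    return fieldnames
```

It could be called as jsonl_to_csv("file.text", "out_data.csv"), where both paths are placeholders. Reading the input twice costs extra I/O, but only one record is held in memory at any time.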
