Converting a huge JSON file to CSV

Published 2024-06-26 14:07:24


I have a huge JSON file to convert to CSV. I've searched all over the internet and also tried writing Python myself, but nothing has worked. I've been stuck on this for a week. Can anyone help me? The JSON file has this format:

{"Gid": "5999043768223797248", 
"rights": [{"grantorContext": "Freemium right added by Netlife", "sku": "CMO-STO-2-FREE", "rightId": "5340e29a6dc01000", "grantorId": "NETLIFE_B2C"}], 
"used_quota": "16.95", 
"creationtime": "2001-04-29 12:58:33", 
"devices": [{"last_connection": "2001-05-30 22:06:08", "os_version": "4.2.2", "auto_upload": "wifi", "last_upload": "2002-04-29 13:12:26", "device_name": "i-mobile i-STYLE 7.5", "platform": "unknow", "client_version": "2.0.0"}], 
"total_quota": 2.0, 
"Uid": ["666927729520"]}


{"Gid": "5999043740151320576", 
"rights": [{"grantorContext": "Freemium right added by Netlife", "sku": "CMO-STO-2-FREE", "rightId": "5340e29f72c05000", "grantorId": "NETLIFE_B2C"}, 
           {"grantorContext": null, "sku": "CMO-STO-25-M", "rightId": "53b5d2d8b0400000", "grantorId": "DTN"}], 
"used_quota": "480.85", 
"creationtime": "2001-04-29 12:58:38", 
"devices": [{"last_connection": "2001-08-02 03:46:05", "os_version": "8.4", "auto_upload": "wifi", "last_upload": "2015-08-02 03:46:05", "device_name": "Nokia", "platform": "unknow", "client_version": "1.0.0"}], 
"total_quota": 27.0, 
"Uid": ["465949097714"]}


{"Gid": "5999043675907166208", 
"rights": [{"grantorContext": null, "sku": "CMO-STO-25-M", "rightId": "53b5d2e161000000", "grantorId": "DTN"}, 
           {"grantorContext": "Freemium right added by Netlife", "sku": "CMO-STO-2-FREE", "rightId": "5340e29b42805000", "grantorId": "NETLIFE_B2C"}], 
"used_quota": "8.26", 
"creationtime": "2001-04-29 12:58:35", 
"devices": [{"last_connection": "2001-04-29 13:08:24", "os_version": "4.2.2", "auto_upload": "wifi", "last_upload": "2002-04-29 13:03:25", "device_name": "Nokia V797", "platform": "unknow", "client_version": "2.0.0"}], 
"total_quota": 27.0, 
"Uid": ["666994575443"]}

3 Answers

Here is a slightly less brute-force approach using re.split(). I've tested it: on a Core i3 with 8 GB of RAM, the following takes only a couple of seconds.

import re

# build a ~500 MB test string of fake single-record lines
huge = "{" + "a" * 1000 + "}\n"
huge = huge * 500000
len(huge) / 1000000.0
# gives 501.5 (megabytes)
jsons = re.split(r'\}\s*\{', huge)
len(jsons)
# gives 500000; took about 2 seconds
del huge  # a good idea to free half a gigabyte as soon as possible

The split yields the individual JSON elements, each in its own string, minus the opening brace (except on the first) and the closing brace (except on the last). Note that the pattern only fires where a '}' is separated from a '{' by pure whitespace, which in this data happens only at record boundaries; the inner objects in "rights" and "devices" are always joined by commas or brackets. So the remaining work (untested) would be something like the sketch below.

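A minimal sketch of that step (untested; 'output.csv' and the flat column choice are assumptions, and the nested "rights"/"devices" lists are skipped):

import csv
import json

# `jsons` is the list produced by re.split() above
with open('output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['Gid', 'used_quota', 'creationtime', 'total_quota', 'Uid'])
    for piece in jsons:
        # re.split() consumed the braces at each boundary; restore them
        piece = piece.strip()
        if not piece.startswith('{'):
            piece = '{' + piece
        if not piece.endswith('}'):
            piece = piece + '}'
        obj = json.loads(piece)
        writer.writerow([obj['Gid'], obj['used_quota'], obj['creationtime'],
                         obj['total_quota'], obj['Uid'][0]])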

Here is a brute-force approach, for when the file is small enough to be handled as a single string without running out of memory:

import json
import re

with open('file.json', 'r') as f:
    multijsons = f.read()

# put a comma between adjacent objects, then wrap the whole thing in a list
sep = re.compile(r'\}\s*\{')
jsonlist = '[' + re.sub(sep, '}, {', multijsons) + ']'

load = json.loads(jsonlist)

# quick debug:
for item in load:
    print(item)
    print('\n -\n')
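To get from the loaded list to the CSV the question asks for, one possibility (a sketch; the flat columns are an assumption, and the nested "rights" and "devices" fields are simply dropped) is:

import csv

fields = ['Gid', 'used_quota', 'creationtime', 'total_quota', 'Uid']
with open('file.csv', 'w', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for item in load:
        row = dict(item)
        row['Uid'] = row['Uid'][0]  # unwrap the one-element list
        writer.writerow(row)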

This is what I used, since my file isn't very large. I considered coding up the approach above, but didn't need it, and it looked like a tricky mess to get right.

I had a similar file myself!

It is not a valid JSON file. It is a set of JSON documents concatenated into one file. From the Python json.dump documentation:

Unlike pickle and marshal, JSON is not a framed protocol, so trying to serialize multiple objects with repeated calls to dump() using the same fp will result in an invalid JSON file.

That means that even if the whole thing fits in memory at once, you can't use json.load to read this file unless you first edit it: put '[' at the start and ']' at the end, and add a comma between each pair of elements (i.e. in the blank line between each '}' and the following '{').

You can use the Python json module and have it do what I think you want: read each group of seven elements sequentially, as a Python dict with keys 'Gid', 'rights', and so on.

You'll have to use the raw_decode method of the JSONDecoder class. It stops at the closing '}' and returns an index into the string it was scanning, so you can slice off what has just been processed.

So: read a large chunk of the big file, then attempt a raw decode of one element inside an exception handler. If it succeeds, save the decoded result, remove the successfully decoded part, and repeat. If you get an exception, read another chunk from the file, append it to the string being decoded, and repeat. If you still get an exception, the JSON element you're sitting on is corrupt (or is longer than your chunk size, or you aren't handling end-of-file correctly).
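A sketch of that loop (the filename and the 1 MB chunk size are assumptions):

import json

decoder = json.JSONDecoder()
records = []
buf = ''
with open('file.json', 'r') as f:
    while True:
        chunk = f.read(1 << 20)  # 1 MB at a time; the size is an arbitrary choice
        buf += chunk
        while True:
            buf = buf.lstrip()  # raw_decode rejects leading whitespace
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except ValueError:
                if not chunk:
                    raise  # end of file with undecodable leftovers: corrupt element
                break  # incomplete element; go read another chunk
            records.append(obj)
            buf = buf[end:]  # slice off what was just decoded
        if not chunk and not buf:
            break  # clean end of file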

The code is much easier to write if your file is only a few tens (hundreds?) of megabytes. Then just read the whole thing into one string and chomp JSON elements off the front until only whitespace is left or you hit a decoder error.
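And a sketch of this simpler whole-file variant, under the same assumptions:

import json

decoder = json.JSONDecoder()
with open('file.json', 'r') as f:
    text = f.read()

records = []
idx = 0
while idx < len(text):
    if text[idx].isspace():
        idx += 1  # skip the whitespace between elements
        continue
    obj, idx = decoder.raw_decode(text, idx)  # raises on a corrupt element
    records.append(obj)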
