<h3>Solution with <code>object_hook</code></h3>
<pre><code>import json

def json_load_byteified(file_handle):
    return _byteify(
        json.load(file_handle, object_hook=_byteify),
        ignore_dicts=True
    )

def json_loads_byteified(json_text):
    return _byteify(
        json.loads(json_text, object_hook=_byteify),
        ignore_dicts=True
    )

def _byteify(data, ignore_dicts=False):
    # if this is a unicode string, return its string representation
    if isinstance(data, unicode):
        return data.encode('utf-8')
    # if this is a list of values, return list of byteified values
    if isinstance(data, list):
        return [_byteify(item, ignore_dicts=True) for item in data]
    # if this is a dictionary, return dictionary of byteified keys and values
    # but only if we haven't already byteified it
    if isinstance(data, dict) and not ignore_dicts:
        return {
            _byteify(key, ignore_dicts=True): _byteify(value, ignore_dicts=True)
            for key, value in data.iteritems()
        }
    # if it's anything else, return it in its original form
    return data
</code></pre>
<p>Example usage:</p>
<pre><code>>>> <b><i>json_loads_byteified('{"Hello": "World"}')</i></b>
{'Hello': 'World'}
>>> <b><i>json_loads_byteified('"I am a top-level string"')</i></b>
'I am a top-level string'
>>> <b><i>json_loads_byteified('7')</i></b>
7
>>> <b><i>json_loads_byteified('["I am inside a list"]')</i></b>
['I am inside a list']
>>> <b><i>json_loads_byteified('[[[[[[[["I am inside a big nest of lists"]]]]]]]]')</i></b>
[[[[[[[['I am inside a big nest of lists']]]]]]]]
>>> <b><i>json_loads_byteified('{"foo": "bar", "things": [7, {"qux": "baz", "moo": {"cow": ["milk"]}}]}')</i></b>
{'things': [7, {'qux': 'baz', 'moo': {'cow': ['milk']}}], 'foo': 'bar'}
>>> <b><i>json_load_byteified(open('somefile.json'))</i></b>
{'more json': 'from a file'}</code></pre>
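<h3>How does this work and why would I use it?</h3>
<p><a href="https://stackoverflow.com/a/13105359/1709587">Mark Amery's function</a> is shorter and clearer than these ones, so what's the point of them? Why would you want to use them?</p>
<p>Purely for <strong>performance</strong>. Mark's answer first decodes the JSON text fully with unicode strings, then recurses through the entire decoded value to convert all strings to byte strings. This has a couple of undesirable effects:</p>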
<ul>
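<li>A copy of the entire decoded structure gets created in memory</li>
<li>If your JSON object is <em>really</em> deeply nested (500 levels or more), the recursive pass will hit Python's maximum recursion depth</li>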
</ul>
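<p>The second effect can be reproduced without any JSON at all. The following is a minimal sketch, not code from either answer: <code>convert_recursively</code> and <code>build_nested_list</code> are hypothetical names, and it assumes Python 3 (<code>str</code> and <code>.items()</code> in place of <code>unicode</code> and <code>.iteritems()</code>). The exact depth at which the walk fails depends on the interpreter's recursion limit.</p>

```python
import sys

def convert_recursively(data):
    # Hypothetical stand-in for a post-processing pass over a decoded value:
    # one Python call per nesting level.
    if isinstance(data, str):
        return data.encode("utf-8")
    if isinstance(data, list):
        return [convert_recursively(item) for item in data]
    if isinstance(data, dict):
        return {
            convert_recursively(key): convert_recursively(value)
            for key, value in data.items()
        }
    return data

def build_nested_list(depth):
    # Built iteratively so that construction itself never recurses.
    nested = "leaf"
    for _ in range(depth):
        nested = [nested]
    return nested

# A shallow structure converts without trouble.
shallow = convert_recursively(build_nested_list(10))

# A structure nested deeper than the recursion limit does not.
deep = build_nested_list(sys.getrecursionlimit() + 100)
try:
    convert_recursively(deep)
    hit_limit = False
except RuntimeError:  # RecursionError is a subclass of RuntimeError
    hit_limit = True

print(hit_limit)
```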
<p>This answer mitigates both of those performance issues by using the <code>object_hook</code> parameter of <code>json.load</code> and <code>json.loads</code>. From <a href="https://docs.python.org/2/library/json.html#json.load" rel="noreferrer">the docs</a>:</p>
<blockquote>
<p><code>object_hook</code> is an optional function that will be called with the result of any object literal decoded (a <code>dict</code>). The return value of object_hook will be used instead of the <code>dict</code>. This feature can be used to implement custom decoders</p>
</blockquote>
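<p>To see the order in which the decoder hands dictionaries to the hook, here is a small illustrative sketch. <code>recording_hook</code> and <code>seen</code> are hypothetical names used only for this demonstration; the hook returns each <code>dict</code> unchanged and merely records its keys.</p>

```python
import json

seen = []

def recording_hook(decoded_dict):
    # Record the keys of each dict in the order the decoder passes them in.
    seen.append(sorted(decoded_dict))
    return decoded_dict

json.loads('{"outer": {"middle": {"inner": 1}}}', object_hook=recording_hook)

# The innermost dict is completed, and therefore passed to the hook, first.
print(seen)  # [['inner'], ['middle'], ['outer']]
```

<p>By the time the hook receives an outer <code>dict</code>, every <code>dict</code> nested inside it has already been through the hook.</p>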
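<p>Since dictionaries nested many levels deep inside other dictionaries get passed to <code>object_hook</code> <em>as they're decoded</em>, we can byteify any strings or lists inside them at that point and avoid the need for deep recursion later.</p>
<p>Mark's answer isn't suitable for use as an <code>object_hook</code> as it stands, because it recurses into nested dictionaries. We prevent that recursion in this answer with the <code>ignore_dicts</code> parameter to <code>_byteify</code>, which gets passed to it at all times <em>except</em> when <code>object_hook</code> passes it a new <code>dict</code> to byteify. The <code>ignore_dicts</code> flag tells <code>_byteify</code> to ignore <code>dict</code>s since they have already been byteified.</p>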
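<p>The cost of recursing into nested dictionaries from inside the hook can be made visible by counting how often each variant processes a <code>dict</code>. This is a rough sketch under stated assumptions: it is a Python 3 adaptation (<code>str</code> and <code>.items()</code> instead of <code>unicode</code> and <code>.iteritems()</code>), and <code>naive_hook</code>, <code>guarded_hook</code> and <code>visits</code> are hypothetical names, not part of either answer.</p>

```python
import json

visits = {"naive": 0, "guarded": 0}

def naive_hook(data):
    # Recurses into nested dicts, so dicts already converted get walked again.
    if isinstance(data, str):
        return data.encode("utf-8")
    if isinstance(data, list):
        return [naive_hook(item) for item in data]
    if isinstance(data, dict):
        visits["naive"] += 1
        return {naive_hook(k): naive_hook(v) for k, v in data.items()}
    return data

def guarded_hook(data, ignore_dicts=False):
    # Same idea as _byteify: nested dicts are skipped because the decoder
    # has already passed them through the hook.
    if isinstance(data, str):
        return data.encode("utf-8")
    if isinstance(data, list):
        return [guarded_hook(item, ignore_dicts=True) for item in data]
    if isinstance(data, dict) and not ignore_dicts:
        visits["guarded"] += 1
        return {
            guarded_hook(k, ignore_dicts=True): guarded_hook(v, ignore_dicts=True)
            for k, v in data.items()
        }
    return data

text = '{"a": {"b": {"c": "d"}}}'
naive_result = json.loads(text, object_hook=naive_hook)
guarded_result = json.loads(text, object_hook=guarded_hook)

print(naive_result == guarded_result)  # True
print(visits)  # {'naive': 6, 'guarded': 3}
```

<p>Both variants produce the same value, but the naive hook processes three nested dictionaries six times (1 + 2 + 3), while the guarded one processes each exactly once.</p>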
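<p>Finally, our implementations of <code>json_load_byteified</code> and <code>json_loads_byteified</code> call <code>_byteify</code> (with <code>ignore_dicts=True</code>) on the result returned from <code>json.load</code> or <code>json.loads</code>, to handle the case where the JSON text being decoded doesn't have a <code>dict</code> at the top level.</p>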
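<p>The reason for that final call is that <code>object_hook</code> only ever fires for JSON objects. A short sketch (with the hypothetical names <code>counting_hook</code> and <code>calls</code>) shows that top-level strings, numbers and lists never reach the hook unless they contain a <code>dict</code>:</p>

```python
import json

calls = []

def counting_hook(decoded_dict):
    # Remember every dict the decoder passes to the hook.
    calls.append(decoded_dict)
    return decoded_dict

# No JSON object anywhere in these texts, so the hook is never invoked.
for text in ('"I am a top-level string"', '7', '["I am inside a list"]'):
    json.loads(text, object_hook=counting_hook)
calls_without_dicts = len(calls)

# A dict nested inside a top-level list does reach the hook, but the
# enclosing list itself is still returned untouched by it.
json.loads('[{"a": 1}]', object_hook=counting_hook)

print(calls_without_dicts, len(calls))  # 0 1
```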