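<p>According to the <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html" rel="nofollow noreferrer">official docs</a> for <code>pd.json_normalize()</code>, it assumes an array (list) of records as input. However, the raw JSON is not a list of dicts, and, most importantly, there is no <code>"id"</code> key: the ids appear only as dict keys. Therefore, I believe a hand-written parser is genuinely necessary.</p>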
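<p>To illustrate why the list-of-dicts shape matters, here is a minimal sketch of what <code>pd.json_normalize</code> does when it is handed the raw nested dict directly. The sample is a hypothetical miniature of the input, assuming the <code>{"reports": {model: {id: record}}}</code> layout implied by the parser below. Everything is flattened into a single row, and the ids end up buried inside the column names instead of appearing as values.</p>

<pre><code>import pandas as pd

# Hypothetical miniature of the input: {"reports": {model: {id: record}}}
raw = {
    "reports": {
        "Vivo 1820": {
            "-id1": {"message": "hello", "timestamp": 1},
            "-id2": {"message": "world", "timestamp": 2},
        }
    }
}

flat = pd.json_normalize(raw)
# The whole dict becomes a single row: (1, 4)
print(flat.shape)
# Each leaf becomes a column named "reports.MODEL.ID.FIELD"
print(list(flat.columns))
</code></pre>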
<p><strong>Code</strong>:</p>
<pre><code>import pandas as pd
import json

file_path = "/mnt/ramdisk/in.json"
with open(file_path) as f:
    dic = json.load(f)

# discard the redundant "reports" layer
dic = dic["reports"]

# produce a flattened list of dicts
ls = []
for k1, v1 in dic.items():
    # k1 = model
    for k2, v2 in v1.items():
        # k2 = the hash-like id; store both keys as values in the record
        v2["model"] = k1
        v2["id"] = k2
        ls.append(v2)

df = pd.json_normalize(ls)
</code></pre>
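<p>Since the input file itself is not shown, here is a minimal sketch of the same two-level loop wrapped in a function and run on a small hypothetical sample. The helper name <code>flatten_reports</code> and the sample records are illustrative only, assuming the <code>{"reports": {model: {id: record}}}</code> layout used above. One small design difference: it builds a new dict per record with <code>{**record, ...}</code> instead of writing <code>"model"</code> and <code>"id"</code> into the loaded JSON in place.</p>

<pre><code>import pandas as pd


def flatten_reports(dic):
    """Hypothetical helper: turn {"reports": {model: {id: record}}}
    into one DataFrame row per record."""
    rows = []
    for model, by_id in dic["reports"].items():
        for rec_id, record in by_id.items():
            # Copy the record and promote both key levels to values,
            # leaving the original nested dict untouched
            rows.append({**record, "model": model, "id": rec_id})
    return pd.json_normalize(rows)


# Hypothetical sample with the same shape as the real input
sample = {
    "reports": {
        "Google-Pixel 2 XL": {
            "-idA": {"message": "first", "timestamp": 1},
            "-idB": {"message": "second", "timestamp": 2},
        },
        "Vivo 1820": {
            "-idC": {"message": "third", "timestamp": 3},
        },
    }
}

df = flatten_reports(sample)
print(df)
</code></pre>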
<p><strong>Output</strong></p>
<pre><code># Trim the message for printing purposes
df2 = df.copy()
df2["message"] = df["message"].apply(lambda s: s[:10])
df2
Out[28]: 
      message      timestamp              model                    id
0  04 Oct 202  1601825117067  Google-Pixel 2 XL  -MIoCtD9YUF2G9Esfrfz
1  04 Oct 202  1601825117216  Google-Pixel 2 XL  -MIoCtFVOxu8wdEHtm6q
2  04 Oct 202  1601825137685  Google-Pixel 2 XL  -MIoCyBtKMQqQzUHEXsW
3  04 Oct 202  1601825807693  Google-Pixel 2 XL  -MIoFWll9r3qwzWNoGMn
4  04 Oct 202  1601825677653          Vivo 1820  -MIoF14JUm6JMZrOzDlL
5  04 Oct 202  1601825678026          Vivo 1820  -MIoF1A9ZZNqTu5W-rQD
6  04 Oct 202  1601825684248          Vivo 1820  -MIoF2gNDua9FfLBTg6q
</code></pre>
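<p>As a side note, the trimming step can also use the vectorized <code>.str</code> accessor. A minimal sketch on a made-up Series: unlike <code>apply(lambda s: s[:10])</code>, which raises <code>TypeError</code> if a message is missing, <code>.str[:10]</code> simply passes missing values through.</p>

<pre><code>import pandas as pd

# Made-up messages; the third one is missing
messages = pd.Series(["04 Oct 2020 12:00 something happened", "short", None])

# Slice the first 10 characters of each string; missing values stay missing
trimmed = messages.str[:10]
print(trimmed.tolist())
</code></pre>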
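<p>Note: it seems necessary to descend to the level where the hash-like <code>id</code>s live. This is because the <code>id</code>s are originally <code>keys</code>, and apparently they must be converted into <code>values</code> before <code>pd.json_normalize</code> can treat them as data. A brief search online also turned up no example of parsing this kind of nested structure with a simple built-in method.</p>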
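<p>For comparison, one possible alternative (not the approach used above) performs the same key-to-value conversion with general pandas building blocks instead of <code>json_normalize</code>: <code>DataFrame.from_dict(..., orient="index")</code> turns each id into a row label, <code>pd.concat</code> on a dict of frames adds the model as an outer index level, and <code>reset_index()</code> turns both index levels into ordinary columns. This is a minimal sketch on a hypothetical sample, assuming the same <code>{"reports": {model: {id: record}}}</code> layout and flat records.</p>

<pre><code>import pandas as pd

# Hypothetical miniature of the input file: {"reports": {model: {id: record}}}
dic = {
    "reports": {
        "Google-Pixel 2 XL": {
            "-idA": {"message": "first", "timestamp": 1},
            "-idB": {"message": "second", "timestamp": 2},
        },
        "Vivo 1820": {
            "-idC": {"message": "third", "timestamp": 3},
        },
    }
}

reports = dic["reports"]
# One frame per model: orient="index" makes each id key a row label
frames = {
    model: pd.DataFrame.from_dict(by_id, orient="index")
    for model, by_id in reports.items()
}
# Stack the frames: dict keys become the outer index level ("model"),
# each frame's row labels become the inner level ("id")
out = pd.concat(frames, names=["model", "id"]).reset_index()
# Reorder to match the column order of the loop-based version
out = out[["message", "timestamp", "model", "id"]]
print(out)
</code></pre>

<p>This still relies on turning keys into values (here via the index), so it is consistent with the observation above; it only moves the looping into pandas. Nested records would still need <code>json_normalize</code> or similar afterwards.</p>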