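<p>The <code>json_normalize</code> method can be passed a list of metadata fields to add to each record.</p>
<p>Here, assuming <code>js</code> contains the data from the original JSON, you can use:</p>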
<pre><code>import pandas as pd

df = pd.json_normalize(js, 'section_details', ['FileName', '_id'])
</code></pre>
<p>You will get:</p>
<pre><code> FileName _id content heading
0 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew33'} Efficient Algorithms for Non-convex Isotonic R... title
1 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew33'} We consider the minimization of submodular fu... abstract
2 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew33'} subject
3 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew33'} Introduction to convex optimizationwith mean Content
4 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew11'} Text-Adaptive Generative Adversarial Networks:... title
5 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew11'} This paper addresses the problem of manipulati... abstract
6 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew11'} subject
7 32252652D.article.0018038745057751440210.tmp {'$oid': '5ced0669acd01707cbf2ew11'} Introduction to Text-Adaptive Generative Adve... Content
</code></pre>
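<p>To try the call on something small, here is a minimal sketch. The sample <code>js</code> is hypothetical: it only mimics the shape of the data above (a nested <code>section_details</code> list plus top-level <code>FileName</code> and <code>_id</code> fields), with shortened values. It assumes pandas 1.0 or later, where <code>json_normalize</code> is available as <code>pd.json_normalize</code>.</p>

```python
import pandas as pd

# Hypothetical sample, shaped like the data above but with shortened values.
js = [
    {"_id": {"$oid": "id-1"}, "FileName": "a.tmp",
     "section_details": [{"heading": "title", "content": "First title"},
                         {"heading": "abstract", "content": "First abstract"}]},
    {"_id": {"$oid": "id-2"}, "FileName": "a.tmp",
     "section_details": [{"heading": "title", "content": "Second title"}]},
]

# record_path names the nested list to expand into rows;
# meta lists the top-level fields to copy onto each of those rows.
df = pd.json_normalize(js, record_path="section_details", meta=["FileName", "_id"])
print(df)
```

<p>Each entry of <code>section_details</code> becomes one row, and the <code>_id</code> column still holds the whole <code>{'$oid': ...}</code> dict at this point.</p>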
<p>Once you have this frame, you still need to fix the <code>_id</code> column and pivot the dataframe. In the end, you could use:</p>
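<pre><code>import pandas as pd

# extract the relevant fields: one row per section, with FileName and _id repeated
df = pd.json_normalize(js, 'section_details', ['FileName', '_id'])
# fix the _id column: keep only the '$oid' string
df['_id'] = df['_id'].apply(lambda x: x['$oid'])
# pivot to get back the expected columns
# (recent pandas versions no longer accept positional arguments in pivot)
resul = df.groupby('FileName').apply(lambda x: x.pivot(
    index='_id', columns='heading', values='content')).reset_index().rename_axis('', axis=1)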
</code></pre>
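<p>As a variation, the reshaping step can probably be done without <code>groupby</code> by pivoting on both key columns at once. This is a sketch of a swapped-in alternative, not the code above: it assumes pandas 1.1 or later (where <code>pivot</code> accepts a list for <code>index</code>), it assumes each <code>heading</code> appears at most once per <code>_id</code>, and the sample <code>js</code> and the name <code>wide</code> are hypothetical.</p>

```python
import pandas as pd

# Hypothetical sample, shaped like the original data but with shortened values.
js = [
    {"_id": {"$oid": "id-1"}, "FileName": "a.tmp",
     "section_details": [{"heading": "title", "content": "First title"},
                         {"heading": "abstract", "content": "First abstract"}]},
    {"_id": {"$oid": "id-2"}, "FileName": "a.tmp",
     "section_details": [{"heading": "title", "content": "Second title"},
                         {"heading": "abstract", "content": "Second abstract"}]},
]

# One row per section, with FileName and _id repeated on each row.
df = pd.json_normalize(js, record_path="section_details", meta=["FileName", "_id"])
# Replace the {'$oid': ...} dict with the plain id string.
df["_id"] = df["_id"].apply(lambda x: x["$oid"])

# Pivot on (FileName, _id): each heading becomes a column holding its content.
wide = (
    df.pivot(index=["FileName", "_id"], columns="heading", values="content")
      .reset_index()
      .rename_axis(columns=None)  # drop the leftover 'heading' label on the columns
)
print(wide)
```

<p>The heading columns come out in sorted order; a <code>reindex(columns=[...])</code> can restore a preferred order.</p>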
<p>Alternatively, you can build each dataframe row by hand, directly from each entry of the original JSON:</p>
<pre><code>import pandas as pd

# one flat dict per JSON entry: FileName, _id, then one key per section heading
resul = pd.DataFrame([
    {'FileName': j['FileName'], '_id': j['_id']['$oid'],
     **{sd['heading']: sd['content'] for sd in j['section_details']}}
    for j in js
]).reindex(columns=['FileName', '_id', 'title', 'abstract', 'subject', 'Content'])
</code></pre>
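<p>To see what the hand-built rows look like before they reach <code>pd.DataFrame</code>, here is a small pure-Python sketch. The helper name <code>to_row</code> and the sample <code>js</code> are hypothetical, introduced only for illustration.</p>

```python
# Hypothetical sample, shaped like the original data but with shortened values.
js = [
    {"_id": {"$oid": "id-1"}, "FileName": "a.tmp",
     "section_details": [{"heading": "title", "content": "First title"},
                         {"heading": "abstract", "content": "First abstract"}]},
    {"_id": {"$oid": "id-2"}, "FileName": "a.tmp",
     "section_details": [{"heading": "title", "content": "Second title"}]},
]


def to_row(j):
    """Flatten one JSON entry into a single flat dict (one future dataframe row)."""
    row = {"FileName": j["FileName"], "_id": j["_id"]["$oid"]}
    # Each heading becomes a key; a repeated heading would overwrite the earlier one.
    row.update({sd["heading"]: sd["content"] for sd in j["section_details"]})
    return row


rows = [to_row(j) for j in js]
print(rows)
```

<p>Passing <code>rows</code> to <code>pd.DataFrame</code> gives one line per JSON entry. Entries that lack a heading simply get a missing value in that column, which is why the final <code>reindex</code> with an explicit column list is useful.</p>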