Parsing a JSON Lines file

{"id": "someID1.docx", "language": {"detected": "cs"}, "title": "Name - Title - FileName", "text": "Long string of text", "entities": [ {"standardForm": "Svářečský průkaz", "type": "car"}, {"standardForm": "email1@gmail.com", "type": "email"}, {"standardForm": "english", "type": "languages"}, {"standardForm": "Práce na PC", "type": "abilities"}, {"standardForm": "MS Office", "type": "abilities"}, {"standardForm": "Automechanik", "type": "education"}, {"standardForm": "Střední průmyslová škola", "type": "education"}, {"standardForm": "Angličtina-Němčina", "type": "languages"}, {"standardForm": "mechanic", "type": "position"}, {"standardForm": "Praha", "type": "region"}, {"standardForm": "B2 - středně pokročilý", "type": "en_level"}, {"standardForm": "Skupina B", "type": "drivinglicense"} ]}
{"id": "someID2.pdf", "language": {"detected": "cs"}, "title": "Name - Title - FileName2", "text": "Long string of text2", "entities": [ {"standardForm": "german", "type": "languages"}, {"standardForm": "high school", "type": "education"}, {"standardForm": "Angličtina-Němčina", "type": "languages"}, {"standardForm": "driver", "type": "position"}, {"standardForm": "english", "type": "languages"}, {"standardForm": "university", "type": "education"}, {"standardForm": "email2@seznam.cz", "type": "email"}, {"standardForm": "Středočeský", "type": "region"}, {"standardForm": "Střední", "type": "edulevel"}, {"standardForm": "manager", "type": "lastposition"}, {"standardForm": "? – nerozpoznáno", "type": "de_level"}, {"standardForm": "? – nerozpoznáno", "type": "en_level"}, {"standardForm": "Skupina C", "type": "drivinglicense"} ]}
...

ID;title;languages;education
someID1.docx;Name-Title-FileName;english,Angličtina-Němčina;Automechanik;Střední Prům. škola
someID2.pdf;Name-Title-FileName2; german,Angličtina-Němčina,english;high school, university

2 answers

User

#1 · edited 2024-10-17 06:18:42

With Miller (https://github.com/johnkerl/miller/releases/tag/5.4.0), simply run

mlr --j2c unsparsify then cut -x -r -f "entit" input.json >output.csv

and you get this CSV:

id,language:detected,title,text
someID1.docx,cs,Name - Title - FileName,Long string of text
someID2.pdf,cs,Name - Title - FileName2,Long string of text2

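Some notes on the options:

--j2c converts JSON to CSV
unsparsify fills in missing fields so that every record carries the union of the field names seen across all input records
cut -x -r -f "entit" excludes (-x) every field whose name matches the regular expression (-r) "entit", which drops the flattened entities object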
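
To make these three steps more concrete, here is a rough pure-Python sketch of what the pipeline does conceptually. It is not Miller's implementation: the helper names flatten, unsparsify and cut_regex are made up for illustration, the ":" separator is inferred from the language:detected header above, and the way list elements are numbered is an assumption.

```python
import re

def flatten(value, prefix="", sep=":"):
    """Flatten nested dicts/lists into a single-level dict (hypothetical helper)."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Assumption: list elements are keyed by their position.
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, name, sep))
    return flat

def unsparsify(records, fill=""):
    """Give every record the union of all field names, filling gaps with `fill`."""
    keys = []
    for rec in records:
        for k in rec:
            if k not in keys:
                keys.append(k)
    return [{k: rec.get(k, fill) for k in keys} for rec in records]

def cut_regex(records, pattern):
    """Drop every field whose name matches `pattern` (like cut -x -r -f)."""
    rx = re.compile(pattern)
    return [{k: v for k, v in rec.items() if not rx.search(k)} for rec in records]

# Tiny made-up records shaped like the input in the question.
records = [
    {"id": "a", "language": {"detected": "cs"}, "entities": [{"type": "email"}]},
    {"id": "b", "language": {"detected": "cs"}, "entities": []},
]
rows = cut_regex(unsparsify([flatten(r) for r in records]), "entit")
print(rows)
```

Note that this pipeline removes the entities fields altogether, so the languages and education columns are not part of the result.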

User

#2 · edited 2024-10-17 06:18:42

You can use pandas for this:

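import json
import pandas as pd

# Load the JSON Lines file, one object per line ("input.json" is an assumed name).
with open("input.json", encoding="utf-8") as f:
    jsonfile = [json.loads(line) for line in f if line.strip()]

df = pd.DataFrame(jsonfile)

# For each record, collect the standardForm of every entity of the given type.
df['languages'] = df.apply(lambda x: [item['standardForm']
                                      for item in x.entities
                                      if item['type'] == 'languages'],
                           axis=1)
df['education'] = df.apply(lambda x: [item['standardForm']
                                      for item in x.entities
                                      if item['type'] == 'education'],
                           axis=1)

# Write only the columns of interest ("output.csv" is an assumed name).
df.to_csv("output.csv", columns=['id', 'title', 'languages', 'education'])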
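
The to_csv call writes each list in its Python repr form and separates columns with commas. If the semicolon-separated layout from the question is what you are after, one possible approach is to join the values yourself before writing. Below is a minimal sketch using only the standard library; the names WANTED, rows_from_jsonl and write_csv are made up for illustration, and the sample records are shortened stand-ins for the real input.

```python
import csv
import io
import json

WANTED = ("languages", "education")  # entity types that become columns

def rows_from_jsonl(lines):
    """Yield one output row (a dict) per non-empty JSON Lines record."""
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        row = {"ID": rec["id"], "title": rec["title"]}
        for wanted in WANTED:
            # Join all matching standardForm values with commas.
            row[wanted] = ",".join(
                e["standardForm"]
                for e in rec.get("entities", [])
                if e["type"] == wanted
            )
        yield row

def write_csv(rows, out):
    """Write rows as semicolon-separated text to the file-like object `out`."""
    writer = csv.DictWriter(
        out, fieldnames=["ID", "title", *WANTED],
        delimiter=";", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)

# Shortened, made-up sample lines; a real run would iterate over the file instead.
sample = [
    '{"id": "someID1.docx", "title": "T1", "entities": ['
    '{"standardForm": "english", "type": "languages"}, '
    '{"standardForm": "Automechanik", "type": "education"}, '
    '{"standardForm": "Praha", "type": "region"}]}',
    '{"id": "someID2.pdf", "title": "T2", "entities": ['
    '{"standardForm": "german", "type": "languages"}, '
    '{"standardForm": "high school", "type": "education"}, '
    '{"standardForm": "english", "type": "languages"}, '
    '{"standardForm": "university", "type": "education"}]}',
]
buf = io.StringIO()
write_csv(rows_from_jsonl(sample), buf)
print(buf.getvalue())
```

For a real file, pass the open file object to rows_from_jsonl and an output file opened with newline="" to write_csv. Titles are written unchanged, so any reformatting of them would need an extra step.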
