Extracting text from a nested JSON file in Python, where each JSON object has a variable number of entries

{ "coordinates": null, "acoustic_features": { "instrumentalness": "0.00479", "liveness": "0.18", "speechiness": "0.0294", "danceability": "0.634", "valence": "0.342", "loudness": "-8.345", "tempo": "125.044", "acousticness": "0.00035", "energy": "0.697", "mode": "1", "key": "6" }, "artist_id": "b2980c722a1ace7a30303718ce5491d8", "place": null, "geo": null, "tweet_lang": "en", "source": "Share.Radionomy.com", "track_title": "8eeZ", "track_id": "cd52b3e5b51da29e5893dba82a418a4b", "artist_name": "Dominion", "entities": { "hashtags": [{ "text": "nowplaying", "indices": [0, 11] }, { "text": "goth", "indices": [51, 56] }, { "text": "deathrock", "indices": [57, 67] }, { "text": "postpunk", "indices": [68, 77] }], "symbols": [], "user_mentions": [], "urls": [{ "indices": [28, 50], "expanded_url": "cathedral13.com/blog13", "display_url": "cathedral13.com/blog13", "url": "t.co/Tatf4hEVkv" }] }, "created_at": "2014-01-01 05:54:21", "text": "#nowplaying Dominion - 8eeZ Tatf4hEVkv #goth #deathrock #postpunk", "user": { "location": "middle of nowhere", "lang": "en", "time_zone": "Central Time (US & Canada)", "name": "Cathedral 13", "entities": null, "id": 81496937, "description": "I\u2019m a music junkie who is currently responsible for Cathedral 13 internet radio (goth, deathrock, post-punk)which has been online since 06/20/02." }, "id": 418243774842929150 }

import csv
import json

# Load the list of tweet objects from the JSON file
with open('jsonpart.json') as data_file:
    data = json.load(data_file)
# print(data)

header = ['hashtags']

# Open a file for writing (binary mode, as the Python 2 csv module expects)
data_csv = open('hashtags.csv', 'wb')

# Create the csv writer object
csvwriter = csv.writer(data_csv)

# Write the csv header
csvwriter.writerow(header)

# Each row ends up holding the whole list of hashtag dicts in a single cell
for entry in data:
    csvwriter.writerow([entry['entities']['hashtags']])

data_csv.close()

"[{u'indices': [0, 11], u'text': u'nowplaying'}, {u'indices': [51, 56], u'text': u'goth'}, {u'indices': [57, 67], u'text': u'deathrock'}, {u'indices': [68, 77], u'text': u'postpunk'}]"
"[{u'indices': [23, 34], u'text': u'NowPlaying'}, {u'indices': [75, 79], u'text': u'80s'}, {u'indices': [80, 86], u'text': u'Retro'}, {u'indices': [87, 91], u'text': u'Fun'}]"
"[{u'indices': [0, 11], u'text': u'nowplaying'}]"
"[{u'indices': [54, 65], u'text': u'nowplaying'}, {u'indices': [66, 77], u'text': u'listenlive'}]"

1 Answer

User

#1 · Posted 2024-09-30 01:37:21

You can do this with a simple list comprehension. Assuming you have a JSON object named json_chunk, you can build the list like this:

text_list = [hashtag['text'] for hashtag in json_chunk['entities']['hashtags']]
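As a quick check, here is the same comprehension applied to a cut-down copy of the sample tweet from the question. The `json_chunk` literal is an illustrative trim of that object, and the `.get(...) or ...` fallbacks are an added assumption for entries whose `entities` or `hashtags` are missing or null; neither is part of the original answer.

```python
# Cut-down copy of the sample tweet from the question (illustrative only).
json_chunk = {
    "entities": {
        "hashtags": [
            {"text": "nowplaying", "indices": [0, 11]},
            {"text": "goth", "indices": [51, 56]},
            {"text": "deathrock", "indices": [57, 67]},
            {"text": "postpunk", "indices": [68, 77]},
        ]
    }
}

# Assumption: "entities" or "hashtags" may be absent or null in some entries,
# so fall back to empty containers instead of raising an error.
hashtags = (json_chunk.get("entities") or {}).get("hashtags") or []
text_list = [hashtag["text"] for hashtag in hashtags]

print(text_list)  # ['nowplaying', 'goth', 'deathrock', 'postpunk']
```

The comprehension works the same whether an entry carries one hashtag or ten, which is what makes it a good fit for objects with a variable number of entries.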

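Now you have a list. Iterate over it (some elements apparently end with a newline character while others do not, so strip every element and then append a newline to each), and write each element to a file like this: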

with open(r'C:\outputfile.csv', 'a', encoding='utf-8') as fd:
    for line in text_list:
        # Normalise trailing whitespace, then write one hashtag per line
        fd.write(line.strip() + '\n')
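Putting the two steps together for every tweet in the file could look roughly like the sketch below. It is written for Python 3 and makes a few assumptions that are not in the answer above: the helper names `hashtag_texts` and `write_hashtag_rows` are hypothetical, the output layout is one CSV row per tweet with one cell per hashtag (rather than one hashtag per line), and an in-memory buffer stands in for the real output file so the example runs on its own.

```python
import csv
import io


def hashtag_texts(entry):
    """Return the hashtag strings of one tweet, or an empty list if none."""
    hashtags = (entry.get("entities") or {}).get("hashtags") or []
    return [tag["text"].strip() for tag in hashtags]


def write_hashtag_rows(entries, out_file):
    """Write one CSV row per tweet, with one cell per hashtag."""
    writer = csv.writer(out_file)
    for entry in entries:
        writer.writerow(hashtag_texts(entry))


# Two cut-down entries modelled on the output shown in the question.
sample = [
    {"entities": {"hashtags": [{"text": "nowplaying"}, {"text": "goth"}]}},
    {"entities": {"hashtags": [{"text": "nowplaying"}]}},
]

# The buffer stands in for open('hashtags.csv', 'w', newline='').
buffer = io.StringIO()
write_hashtag_rows(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To run this against the question's data, load the list with `json.load` as in the question and pass a file opened with `open('hashtags.csv', 'w', newline='', encoding='utf-8')` in place of the buffer. Because the rows have different lengths, a single `hashtags` header no longer lines up with the columns, so it is left out here.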
