Extracting information from a JSON file where a field sits at a different position in different dicts

Posted 2024-04-30 22:33:12


I am extracting a large number of dicts from a nested JSON file in Python 3.8 and get the following KeyError:

extended_tweet = data[str(i)]['extended_tweet']['full_text']
KeyError: 'extended_tweet'

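How can I search a nested JSON file for a field that sits in a different structure inside each dict? I suspect the rigid way I address the fields prevents the right output, but I don't know how to fix it.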

for i in data:
    date = data[str(i)]['created_at']
    account = data[str(i)]['user']['name']
    location = data[str(i)]['user']['location']
    truncated = data[str(i)]['truncated']
    tweet = data[str(i)]['text']
    extended_tweet = data[str(i)]['extended_tweet']['full_text']
    retweeted_status = data[str(i)]['retweeted_status']['extended_tweet']['full_text']
    if truncated == 'True':
        print(truncated, date, account, location, extended_tweet)
    elif 'RT' in tweet:
        print(truncated, date, account, location, retweeted_status)
    else:
        print(truncated, date, account, location, tweet)

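Below is a sample dict from my JSON file. The number "3" is the key of the dict, and I need the data from the field extended_tweet.full_text. Every JSON path finder shows the path x.extended_tweet.full_text, but when I use that path I get the error shown above.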

"3": {
  "created_at": "time",
  "id": id,
  "id_str": "id",
  "text": "text",
  "display_text_range": [
   0,
   140
  ],
  "source": "",
  "truncated": true,
  "in_reply_to_status_id": null,
  "in_reply_to_status_id_str": null,
  "in_reply_to_user_id": null,
  "in_reply_to_user_id_str": null,
  "in_reply_to_screen_name": null,
  "user": {
   "id": ,
   "id_str": "",
   "name": "",
   "screen_name": "name",
   "location": "location",
   "url": "url",
   "description": "description",
   "translator_type": "none",
   "derived": {
    "locations": [
     {
      "country": "country",
      "country_code": "land",
      "locality": "locality",
      "region": "region",
      "full_name": "full_name",
      "geo": {
       "coordinates": [
        number,
        number
       ],
       "type": "point"
      }
     }
    ]
   },
   "protected": false,
   "verified": true,
   "followers_count": number,
   "friends_count": number,
   "listed_count": number,
   "favourites_count": number,
   "statuses_count": number,
   "created_at": "time",
   "utc_offset": null,
   "time_zone": null,
   "geo_enabled": false,
   "lang": null,
   "contributors_enabled": false,
   "is_translator": false,
   "profile_background_color": "number",
   "profile_background_image_url": "gif",
   "profile_background_image_url_https": "link",
   "profile_background_tile": true,
   "profile_link_color": "607696",
   "profile_sidebar_border_color": "FFFFFF",
   "profile_sidebar_fill_color": "EFEFEF",
   "profile_text_color": "333333",
   "profile_use_background_image": true,
   "profile_image_url": "link",
   "profile_image_url_https": "link",
   "profile_banner_url": "bannerurl",
   "default_profile": false,
   "default_profile_image": false,
   "following": null,
   "follow_request_sent": null,
   "notifications": null
  },
  "geo": null,
  "coordinates": null,
  "place": null,
  "contributors": null,
  "is_quote_status": false,
  "extended_tweet": {
   "full_text": "full_text",

1 answer
User
Answer 1 · posted 2024-04-30 22:33:12

Hi tester :) I put your JSON sample in a file, filled in some values for the various fields, added a retweeted_status object, and then basically ran your code like this:

import json
import os

with open( os.path.join(os.path.realpath('.'), 'src/test/x.json') ) as file1:
    data = json.load(file1)

for i in data:
    date = data[str(i)]['created_at']
    account = data[str(i)]['user']['name']
    location = data[str(i)]['user']['location']
    truncated = data[str(i)]['truncated']
    tweet = data[str(i)]['text']
    extended_tweet = data[str(i)]['extended_tweet']['full_text']
    retweeted_status = data[str(i)]['retweeted_status']['extended_tweet']['full_text']
    if truncated == 'True':
        print(truncated, date, account, location, extended_tweet)
    elif 'RT' in tweet:
        print(truncated, date, account, location, retweeted_status)
    else:
        print(truncated, date, account, location, tweet)

This works for me and prints:

True time  location text
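
Note that this output ends with text rather than full_text, even though truncated is true in the file. json.load turns JSON true into the Python bool True, which never equals the string 'True', so the first branch is skipped and the else branch prints the short text. A minimal sketch of that comparison follows; the one-field record is made up for illustration and is not part of the original data:

```python
import json

# Hypothetical one-field record, used only to show the type that json produces.
record = json.loads('{"truncated": true}')
truncated = record["truncated"]

print(type(truncated).__name__)  # bool
print(truncated == 'True')       # False: a bool never equals a str
print(truncated is True)         # True

# Testing the bool directly selects the intended branch.
if truncated:
    branch = "extended"
else:
    branch = "plain"
print(branch)  # extended
```

With the condition written as `if truncated:`, the sample record would take the first branch and print full_text.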

Here is the JSON I put in the file:

{"3": {
    "created_at": "time",
    "id": 1234,
    "id_str": "id",
    "text": "text",
    "display_text_range": [
     0,
     140
    ],
    "source": "",
    "truncated": true,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
     "id": 1234,
     "id_str": "",
     "name": "",
     "screen_name": "name",
     "location": "location",
     "url": "url",
     "description": "description",
     "translator_type": "none",
     "derived": {
      "locations": [
       {
        "country": "country",
        "country_code": "land",
        "locality": "locality",
        "region": "region",
        "full_name": "full_name",
        "geo": {
         "coordinates": [
          100,
          100
         ],
         "type": "point"
        }
       }
      ]
     },
     "protected": false,
     "verified": true,
     "followers_count": 100,
     "friends_count": 100,
     "listed_count": 100,
     "favourites_count": 100,
     "statuses_count": 100,
     "created_at": "time",
     "utc_offset": null,
     "time_zone": null,
     "geo_enabled": false,
     "lang": null,
     "contributors_enabled": false,
     "is_translator": false,
     "profile_background_color": "number",
     "profile_background_image_url": "gif",
     "profile_background_image_url_https": "link",
     "profile_background_tile": true,
     "profile_link_color": "607696",
     "profile_sidebar_border_color": "FFFFFF",
     "profile_sidebar_fill_color": "EFEFEF",
     "profile_text_color": "333333",
     "profile_use_background_image": true,
     "profile_image_url": "link",
     "profile_image_url_https": "link",
     "profile_banner_url": "bannerurl",
     "default_profile": false,
     "default_profile_image": false,
     "following": null,
     "follow_request_sent": null,
     "notifications": null
    },
    "geo": null,
    "coordinates": null,
    "place": null,
    "contributors": null,
    "is_quote_status": false,
    "extended_tweet": {
     "full_text": "full_text"
    },
    "retweeted_status": {
        "extended_tweet": {
            "full_text": "full_text"
        }
       }
   }}
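
Before changing the extraction loop, it can help to check how many records actually contain each nested field. The sketch below is one way to do that; count_missing, the sample records, and the list of paths are all hypothetical and only assume the same "outer key -> tweet dict" layout as the file above:

```python
from collections import Counter

def count_missing(data, paths):
    """Count, for each key path, how many records do not contain it."""
    missing = Counter()
    for record in data.values():
        for path in paths:
            node = record
            for key in path:
                # Stop at the first level that is absent or not a dict.
                if not isinstance(node, dict) or key not in node:
                    missing[path] += 1
                    break
                node = node[key]
    return missing

# Made-up records standing in for the real file.
sample = {
    "1": {"text": "short tweet", "truncated": False},
    "2": {"text": "long tw", "truncated": True,
          "extended_tweet": {"full_text": "long tweet"}},
}
paths = [
    ("extended_tweet", "full_text"),
    ("retweeted_status", "extended_tweet", "full_text"),
]

missing = count_missing(sample, paths)
for path in paths:
    print(".".join(path), "missing in", missing[path], "of", len(sample), "records")
```

On the real file, any path with a non-zero count is one that direct indexing with square brackets cannot safely reach for every record.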

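From the full data it is clear that some elements are sometimes absent. A way to handle missing keys without using exceptions is the dict get method, which returns a default value when the key is missing. The code below handles missing elements in extended and retweeted tweets without raising an exception, and stores a message naming what is missing so that the message is printed in place of the text. This code handles all 499 tweets in the data.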

for i in data:
    full_tweet = data[str(i)]
    # Use a sentinel string when 'extended_tweet' is absent.
    extended_tweet = full_tweet.get('extended_tweet', 'extended_tweet missing')
    if extended_tweet != 'extended_tweet missing':
        extended_tweet = extended_tweet.get('full_text', 'full_text missing')
    # Same idea one level deeper for retweets.
    retweeted_status = full_tweet.get('retweeted_status', 'retweeted_status missing')
    if retweeted_status != 'retweeted_status missing':
        retweeted_status = retweeted_status.get('extended_tweet', 'extended_tweet missing')
        if retweeted_status != 'extended_tweet missing':
            # .get here too, so a missing 'full_text' cannot raise KeyError.
            retweeted_status = retweeted_status.get('full_text', 'full_text missing')
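
The chained get calls above can also be folded into a small helper that walks a path of keys. This is only a sketch: dig and best_text are hypothetical names, the sample records are made up, and the fallback order (extended text, then the retweeted original, then the short text) is an assumption that mirrors the branches in the question.

```python
def dig(record, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing."""
    node = record
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node

def best_text(tweet):
    """Return the fullest text available for one tweet dict (assumed priority)."""
    return (
        dig(tweet, 'extended_tweet', 'full_text')
        or dig(tweet, 'retweeted_status', 'extended_tweet', 'full_text')
        or tweet.get('text', '')
    )

# Made-up records covering the three shapes discussed above.
sample = {
    "1": {"text": "plain"},
    "2": {"text": "cut off", "extended_tweet": {"full_text": "the full text"}},
    "3": {"text": "RT cut",
          "retweeted_status": {"extended_tweet": {"full_text": "original full text"}}},
}

for key, tweet in sample.items():
    print(key, best_text(tweet))
```

Because dig checks the type at every level, a field that is null in the JSON (None in Python) is treated the same way as a missing key.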
