How to avoid "TypeError: 'NoneType' object is not subscriptable" in BeautifulSoup?

Posted 2024-09-28 18:55:58


I'm trying to extract the image URL of every product on this page, but I get the following error:

Traceback (most recent call last):
  File "D:\Documentos\ZalandoDiscountGen-main\Zalando discout gen\scrapersnipes.py", line 98, in <module>
    scraper()
  File "D:\Documentos\ZalandoDiscountGen-main\Zalando discout gen\scrapersnipes.py", line 92, in scraper
    imagen = producto.find("img", {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"})['src']
TypeError: 'NoneType' object is not subscriptable

The code I tried:

from bs4 import BeautifulSoup
from dhooks import Webhook, Embed
import requests
import pandas as pd
import time, datetime
import random
import numpy as np
import os


headers = {
    'authority': 'www.snipes.es',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'es-ES,es;q=0.9,en;q=0.8,de;q=0.7,eo;q=0.6',
    'dnt': '1',
}


def scraper():

    response = requests.get("https://www.snipes.es/c/shoes?q=jordan%2B1&openCategory=true&sz=all&srule=New", headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    listadoproductos = soup.find_all('div', {'class': 'b-product-grid-tile js-tile-container'})
    for producto in listadoproductos:
        marca = producto.find("span", {"class":"b-product-tile-brand b-product-tile-text js-product-tile-link"}).text
        titulo = producto.find("span", {"class":"b-product-tile-link js-product-tile-link"}).text
        precio = producto.find("span", {"class":"b-product-tile-price-item"}).text
        imagen = producto.find("img", {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"})['src']
        imagen2 = "https://www.snipes.es" + str(imagen)
        print (marca.strip(), titulo.strip(), precio.strip(), imagen2)
    


scraper()

I can't figure out what is going wrong, and I'd appreciate a hint on where to start.


1 answer
User
#1 · Posted 2024-09-28 18:55:58

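What happens

You are trying to find an <img> by a string that lists several classes. find() treats such a string as the exact value of the class attribute, so it matches only if the HTML you downloaded has exactly those classes in exactly that order. Classes such as ls-is-cached and h-lazyloaded look like ones a lazy-loading script adds in the browser, so they are probably missing from the HTML that requests receives. find() then returns None, and indexing None with ['src'] raises the TypeError. Matching on every class is also unnecessary.

I also don't think src will give you what you want, because it holds a blank PNG placeholder. What you probably want is data-src.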
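
To make the difference concrete, here is a minimal sketch on a made-up HTML fragment. The markup, the placeholder src value and the /img/shoe.jpg path are assumptions for illustration, not the real page. It shows find() returning None for the four-class string while a CSS selector on the container still finds the image:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, resembling what a server might send before any
# JavaScript has run: only two classes, and a placeholder in src.
html = '''
<div class="b-product-tile-image-container">
  <img class="b-dynamic_image_content b-product-tile-image"
       src="data:image/png;base64,AAAA" data-src="/img/shoe.jpg">
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# A multi-class string must equal the class attribute exactly, so this finds nothing.
exact = soup.find("img", {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"})
print(exact)  # None

# A single class name matches any element that carries that class.
single = soup.find("img", class_="b-product-tile-image")
print(single is not None)  # True

# A CSS selector on the container does not depend on the img classes at all.
by_selector = soup.select_one("div.b-product-tile-image-container img")
print(by_selector["data-src"])  # /img/shoe.jpg
```

Printing the result of find() before indexing it is a quick way to confirm which lookup is returning None.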

How to fix it

Change the line that looks up the image to the following:

imagen = producto.select_one('div.b-product-tile-image-container img')['data-src']

You can also drop imagen2, since data-src already holds the full URL:

for producto in listadoproductos:
    marca = producto.find("span", {"class":"b-product-tile-brand b-product-tile-text js-product-tile-link"}).text
    titulo = producto.find("span", {"class":"b-product-tile-link js-product-tile-link"}).text
    precio = producto.find("span", {"class":"b-product-tile-price-item"}).text
    imagen = producto.select_one('div.b-product-tile-image-container img')['data-src']
    print (marca.strip(), titulo.strip(), precio.strip(), imagen)
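
If some tiles may lack an element (for example a tile without an image), the same TypeError or an AttributeError can come back. A minimal sketch of a more defensive loop follows. The helper names text_or_none and attr_or_none and the sample HTML are hypothetical, introduced only for illustration. Each helper checks for None before touching the tag:

```python
from bs4 import BeautifulSoup

def text_or_none(parent, selector):
    """Return the stripped text of the first match for a CSS selector, or None."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None

def attr_or_none(parent, selector, attr):
    """Return an attribute of the first match, or None if tag or attribute is missing."""
    tag = parent.select_one(selector)
    return tag.get(attr) if tag is not None else None

# Hypothetical sample: the second tile has no image container.
html = '''
<div class="b-product-grid-tile js-tile-container">
  <span class="b-product-tile-price-item"> 149,99 € </span>
  <div class="b-product-tile-image-container"><img data-src="/img/a.jpg"></div>
</div>
<div class="b-product-grid-tile js-tile-container">
  <span class="b-product-tile-price-item"> 64,99 € </span>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

rows = []
for producto in soup.select("div.b-product-grid-tile"):
    precio = text_or_none(producto, "span.b-product-tile-price-item")
    imagen = attr_or_none(producto, "div.b-product-tile-image-container img", "data-src")
    rows.append((precio, imagen))
    print(precio, imagen)
```

Here a missing image shows up as None in the output instead of stopping the whole scrape, which also makes it easier to spot which tiles need a different selector.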

Output

JORDAN WMNS Zoom '92 149,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dw1986ce4d/1899597_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
JORDAN Air Jordan 1 Mid (PS) 64,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dwe2e88c0b/1930682_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
JORDAN Air Jordan 11 Crib Bootie 59,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dw2dd01aa4/1883653_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
JORDAN Jordan Air Max 200 129,99 € https://www.snipes.es/dw/image/v2/BDCB_PRD/on/demandware.static/-/Sites-snse-master-eu/default/dw21a7bda8/1829411_P.jpg?sw=300&sh=300&sm=fit&sfrm=png
