I'm trying to extract the image URL for every product on this page, but I get the following error:
Traceback (most recent call last):
  File "D:\Documentos\ZalandoDiscountGen-main\Zalando discout gen\scrapersnipes.py", line 98, in <module>
    scraper()
  File "D:\Documentos\ZalandoDiscountGen-main\Zalando discout gen\scrapersnipes.py", line 92, in scraper
    imagen = producto.find("img", {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"})['src']
TypeError: 'NoneType' object is not subscriptable
The code I've tried:
from bs4 import BeautifulSoup
from dhooks import Webhook, Embed
import requests
import pandas as pd
import time, datetime
import random
import numpy as np
import os
headers = {
    'authority': 'www.snipes.es',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'es-ES,es;q=0.9,en;q=0.8,de;q=0.7,eo;q=0.6',
    'dnt': '1',
}
def scraper():
    response = requests.get("https://www.snipes.es/c/shoes?q=jordan%2B1&openCategory=true&sz=all&srule=New", headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    listadoproductos = soup.find_all('div', {'class': 'b-product-grid-tile js-tile-container'})
    for producto in listadoproductos:
        marca = producto.find("span", {"class": "b-product-tile-brand b-product-tile-text js-product-tile-link"}).text
        titulo = producto.find("span", {"class": "b-product-tile-link js-product-tile-link"}).text
        precio = producto.find("span", {"class": "b-product-tile-price-item"}).text
        imagen = producto.find("img", {"class": "b-dynamic_image_content b-product-tile-image ls-is-cached h-lazyloaded"})['src']
        imagen2 = "https://www.snipes.es" + str(imagen)
        print(marca.strip(), titulo.strip(), precio.strip(), imagen2)

scraper()
I can't figure out what's going wrong; a hint on where to start would be appreciated.
What's happening

You're trying to find an <img> by matching several classes at once as a single string; that doesn't work, and it isn't necessary. I also don't think you want src, because on this page it holds a blank placeholder PNG; what you probably want is data-src.
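The class-matching behaviour can be seen on a toy tag (the class names here are made up for illustration): BeautifulSoup only matches a multi-class string when the class attribute is exactly that string, so find returns None and the ['src'] lookup blows up.

```python
from bs4 import BeautifulSoup

# A toy tag with three classes, mimicking the lazy-loaded product image
html = '<img class="a b c" src="blank.png" data-src="real.jpg">'
soup = BeautifulSoup(html, "html.parser")

# A multi-class string asks for an *exact* attribute match, so this misses
print(soup.find("img", {"class": "a b"}))            # None

# Matching on a single class succeeds, and data-src holds the real URL
print(soup.find("img", {"class": "b"})["data-src"])  # real.jpg
```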
How to fix it

Change the line that looks up the image so it matches on a single class and reads data-src instead of src:
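The exact snippet from the answer didn't survive the paste; a likely version, assuming the real URL lives in data-src (the tile markup below is a simplified stand-in for the live page), would be:

```python
from bs4 import BeautifulSoup

# Hypothetical product-tile markup shaped like the Snipes page
html = """
<div class="b-product-grid-tile js-tile-container">
  <img class="b-dynamic_image_content b-product-tile-image"
       src="blank.png"
       data-src="https://www.snipes.es/dw/image/jordan1.jpg">
</div>
"""
soup = BeautifulSoup(html, "html.parser")
producto = soup.find("div", {"class": "b-product-grid-tile"})

# One class is enough for BeautifulSoup to match, and data-src
# carries the real URL on lazy-loaded images
imagen = producto.find("img", {"class": "b-product-tile-image"})["data-src"]
print(imagen)  # https://www.snipes.es/dw/image/jordan1.jpg
```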
You can also skip imagen2 entirely; you don't need it, since the lazy-load attribute already carries a usable URL:
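If you ever do need to handle a mix of relative and absolute paths, urljoin from the standard library is safer than string concatenation (a sketch, not part of the original answer):

```python
from urllib.parse import urljoin

base = "https://www.snipes.es"

# urljoin leaves an absolute URL untouched, so no double-prefixed domain...
print(urljoin(base, "https://www.snipes.es/dw/image/a.jpg"))
# ...and it resolves a relative path against the base
print(urljoin(base, "/dw/image/a.jpg"))
```

Both calls print the same absolute URL, which is why blindly prepending the domain (the job imagen2 was doing) is both unnecessary and fragile.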