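I am scraping a site. My goal is to grab each product's ID/SKU and its link. The elements are present on the site, but when I scrape the data my output comes back empty or raises an error. Current code: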
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
url = "https://www.adidas.com.sg/yeezy/"
productsource = requests.get(url,headers=headers,timeout=15)
productinfo = BeautifulSoup(productsource.content, "lxml")
for item in productinfo.select('div',class_='src-components-___coming-soon__row___NfXc3'):
    sku = item.find('div', class_="src-components-___coming-soon__product___2Gai4")['id']
    link = item.a['href']
    print(sku,'\n',link)
Result:
Traceback (most recent call last):
  File "c:\Users\matta\OneDrive\xeonon\testing monitors\test.py", line 14, in <module>
    sku = item.find('div', class_="src-components-___coming-soon__product___2Gai4")['id']
TypeError: 'NoneType' object is not subscriptable
Can anyone help? What am I doing wrong?
Update: how do I extract the first URL from this?
"imageUrls": [
    "https://assets.adidas.com/images/w_840,h_840,q_auto:sensitive/3d37a43625ce413ea6d3ad44013560db_9366/GZ0954_01_standard.jpg",
    "https://assets.adidas.com/images/w_840,h_840,q_auto:sensitive/e1748ff26ad54f559ffbad4401356122_9366/GZ0954_01_standard1_hover.jpg",
    "https://assets.adidas.com/images/w_840,h_840,q_auto:sensitive/3da89e0f71064a958377ad4401355e12_9366/GZ0954_01_standard2.jpg",
    "https://assets.adidas.com/images/w_840,h_840,q_auto:sensitive/43136245b78840e9901bad44013561bf_9366/GZ0954_02_standard.jpg",
    "https://assets.adidas.com/images/w_840,h_840,q_auto:sensitive/c116076d86b34098bf9cad4401355ee8_9366/GZ0954_03_standard.jpg"
],
Output:
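The data is embedded in the page as JavaScript rather than as regular HTML elements, which is why the div lookup finds nothing and find() returns None. You can use the following example to parse it: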
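A minimal sketch of that approach, assuming the page assigns its state to a JavaScript variable as a JSON object literal. The marker `window.DATA =`, the `products`, `id`, and `link` keys, and the sample HTML are hypothetical stand-ins; only `imageUrls` and its first URL come from the snippet in the question. Inspect the real page source to find the actual variable name and structure before relying on this.

```python
import json


def extract_embedded_json(html, marker):
    """Return the first JSON object that follows `marker` in the page source.

    `marker` is whatever text precedes the object in the real page
    (hypothetical here); returns None if the marker or object is missing.
    """
    pos = html.find(marker)
    if pos == -1:
        return None
    start = html.find("{", pos)
    if start == -1:
        return None
    # raw_decode parses exactly one JSON value starting at `start` and
    # stops there, so trailing JavaScript such as ";" is ignored.
    data, _end = json.JSONDecoder().raw_decode(html, start)
    return data


def first_image_url(product):
    """Return the first entry of the product's imageUrls list, or None."""
    urls = product.get("imageUrls") or []
    return urls[0] if urls else None


# Stand-in for the downloaded page. With the question's code this would be
# something like: requests.get(url, headers=headers, timeout=15).text
sample_html = """
<html><body>
<script>window.DATA = {"products": [{"id": "SKU123", "link": "/sample-product.html",
"imageUrls": ["https://assets.adidas.com/images/w_840,h_840,q_auto:sensitive/3d37a43625ce413ea6d3ad44013560db_9366/GZ0954_01_standard.jpg",
"https://assets.adidas.com/images/w_840,h_840,q_auto:sensitive/e1748ff26ad54f559ffbad4401356122_9366/GZ0954_01_standard1_hover.jpg"]}]};</script>
</body></html>
"""

data = extract_embedded_json(sample_html, "window.DATA =")
if data is None:
    print("embedded data not found; check the marker against the page source")
else:
    for product in data.get("products", []):
        print(product.get("id"), "\n", product.get("link"))
        print(first_image_url(product))
```

Indexing `imageUrls` with `[0]` is all the update needs once the JSON is parsed; the helper only adds a guard so that a product without images yields None instead of raising.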
Prints: