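I'm new to web scraping and I'm trying to build a very basic stock tracker for pokemoncenter.com. When I visit an item's product page on the live site, the "Add to Cart" button appears as: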
<button type="button" class="jsx-2748458255 product-add btn btn-secondary">Add to Cart</button>
When the item is out of stock, the button is:
<button type="button" disabled="" class="jsx-2748458255 product-add btn btn-tertiary disabled">Out of Stock</button>
But whenever I try to scrape the site, the button is always the following, regardless of whether the item is in stock:
<button class="jsx-2748458255 product-add btn btn-tertiary disabled" disabled="" type="button"></button>
So essentially, when I download the HTML with requests.get(), the item always shows as out of stock:
import requests
from bs4 import BeautifulSoup as soup

page_url = "https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
req = requests.get(page_url, headers=headers)
page_soup = soup(req.text, "html.parser")

# Find the "Add to Cart" button
divs = page_soup.find_all("div", {"class": "jsx-829839431 product-col"})
button = str(divs[1].find("button", {"class": "jsx-2748458255"}))

# Check whether the button's HTML contains "disabled"
if button.find("disabled") != -1:
    print("Out of Stock")
else:
    print("In Stock")
In-stock example: https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in
Out-of-stock example: https://www.pokemoncenter.com/product/701-06558/gigantamax-pikachu-poke-plush-17-in
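As goalie1998 mentioned, the site may use JavaScript to load only the necessary images at first in order to reduce the initial load time, filling in the rest of the page afterwards. That would explain why the HTML returned by requests.get() never shows the final button state. You may still be able to scrape the site with Selenium, since it can mimic browser behavior.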
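Here is a rough sketch of what the Selenium approach might look like. It is untested against the live site and rests on a few assumptions: that Chrome and a matching driver are available, that the selector button.product-add (taken from the class names in the question) still matches the button, and that a non-empty button label is a usable sign that the JavaScript has finished rendering. The names stock_status and check_stock are made up for this example.

```python
from typing import Optional

# Assumption: selector derived from the class names shown in the question.
BUTTON_SELECTOR = "button.product-add"


def stock_status(disabled_attr: Optional[str], label: str) -> str:
    """Classify the button from its `disabled` attribute and visible label."""
    if not label.strip():
        # An empty label matches the un-rendered placeholder from requests.get().
        return "Unknown"
    return "Out of Stock" if disabled_attr is not None else "In Stock"


def check_stock(url: str, timeout: float = 15.0) -> str:
    """Load the page in headless Chrome and read the rendered button."""
    # Imported here so stock_status() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        try:
            # Wait until JavaScript has given the button a non-empty label.
            WebDriverWait(driver, timeout).until(
                lambda d: d.find_element(By.CSS_SELECTOR, BUTTON_SELECTOR).text.strip()
            )
        except TimeoutException:
            return "Unknown"
        button = driver.find_element(By.CSS_SELECTOR, BUTTON_SELECTOR)
        # get_attribute("disabled") is None when the attribute is absent.
        return stock_status(button.get_attribute("disabled"), button.text)
    finally:
        driver.quit()
```

Calling check_stock() with either of the two example URLs from the question should then return "In Stock", "Out of Stock", or "Unknown" if the button never renders within the timeout. Checking the disabled attribute directly also avoids a pitfall of searching the button's HTML for the substring "disabled", which would match the class name as well.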