Attempting web scraping: the downloaded HTML differs slightly from the HTML on the live site

Posted 2024-09-30 12:11:43


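I'm new to web scraping, and I'm trying to build a very basic stock (inventory) tracker for pokemoncenter.com. When I visit an item's product page on the live site, the "Add to Cart" button appears as: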

<button type="button" class="jsx-2748458255 product-add btn btn-secondary">Add to Cart</button>

When the item is out of stock, the button is:

<button type="button" disabled="" class="jsx-2748458255 product-add btn btn-tertiary disabled">Out of Stock</button>

But whenever I try to scrape the site, the button is always the following, regardless of whether the item is in stock:

<button class="jsx-2748458255 product-add btn btn-tertiary disabled" disabled="" type="button"></button>

So essentially, when I download the HTML with requests.get(), the item always appears to be out of stock:

import requests
from bs4 import BeautifulSoup

page_url = "https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in"

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}

req = requests.get(page_url, headers=headers)

page_soup = BeautifulSoup(req.text, "html.parser")

# Find the add-to-cart button by its "product-add" class.
# The jsx-* class names look auto-generated, so they are not relied on here.
button = page_soup.find("button", class_="product-add")

# Check whether the button carries the "disabled" attribute
if button is None:
    print("Button not found")
elif button.has_attr("disabled"):
    print("Out of Stock")
else:
    print("In Stock")

In-stock example: https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in
Out-of-stock example: https://www.pokemoncenter.com/product/701-06558/gigantamax-pikachu-poke-plush-17-in
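
To make the mismatch easier to see, here is a minimal sketch that classifies the three button snippets quoted above. It assumes the downloaded button (disabled, with empty text) is a placeholder that has not been filled in yet, so it reports that case as "unknown" instead of "out of stock". The `ButtonState` class and `classify` function are hypothetical names for illustration, and the sketch uses the standard library's `html.parser` so it runs without extra packages; the same check could be written with BeautifulSoup.

```python
from html.parser import HTMLParser


class ButtonState(HTMLParser):
    """Record the disabled flag and text of the first <button class="... product-add ...">."""

    def __init__(self):
        super().__init__()
        self.found = False    # True once the target button has been seen
        self._inside = False  # True while parsing inside the target button
        self.disabled = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "button" and "product-add" in classes and not self.found:
            self.found = True
            self._inside = True
            self.disabled = "disabled" in attrs

    def handle_endtag(self, tag):
        if tag == "button":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data


def classify(html):
    """Return 'in stock', 'out of stock', or 'unknown' for a page or snippet."""
    parser = ButtonState()
    parser.feed(html)
    parser.close()
    if not parser.found or not parser.text.strip():
        # Button missing, or present but with no label (assumed: not rendered yet)
        return "unknown"
    return "out of stock" if parser.disabled else "in stock"


live_in = '<button type="button" class="jsx-2748458255 product-add btn btn-secondary">Add to Cart</button>'
live_out = '<button type="button" disabled="" class="jsx-2748458255 product-add btn btn-tertiary disabled">Out of Stock</button>'
downloaded = '<button class="jsx-2748458255 product-add btn btn-tertiary disabled" disabled="" type="button"></button>'

print(classify(live_in))     # in stock
print(classify(live_out))    # out of stock
print(classify(downloaded))  # unknown
```

Treating the empty button as its own state keeps the tracker from reporting "out of stock" when the page simply has not been rendered.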


1 Answer

User

Reply #1 · Posted 2024-09-30 12:11:43

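As goalie1998 mentioned, the site may use JavaScript to load only the necessary images at first, to reduce the initial load time. requests.get() returns only the initial HTML and does not run any JavaScript, which would explain why the button in the downloaded page is still an empty, disabled placeholder. You may still be able to scrape the site with Selenium, since it drives a real browser and can mimic browser behavior.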
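
A minimal sketch of the Selenium approach, under several assumptions: Selenium 4 is installed, a local Chrome is available, the site does not block automated browsers, and the button can still be located with the `button.product-add` selector taken from the snippets in the question. The names `stock_status` and `check_stock` are hypothetical helpers, not part of any library.

```python
def stock_status(label, enabled):
    """Map the rendered button's text and enabled state to a status string."""
    if not label.strip():
        return "unknown"  # the button text was never rendered
    return "in stock" if enabled else "out of stock"


def check_stock(url, timeout=15):
    """Load the page in headless Chrome and read the rendered button state."""
    # Imported here so stock_status() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: a recent Chrome version
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until a product-add button has non-empty text, i.e. the page's
        # scripts have filled it in. Raises TimeoutException if that never happens.
        button = WebDriverWait(driver, timeout).until(
            lambda d: next(
                (b for b in d.find_elements(By.CSS_SELECTOR, "button.product-add")
                 if b.text.strip()),
                None,
            )
        )
        return stock_status(button.text, button.is_enabled())
    finally:
        driver.quit()


# Example call (launches a browser, so it is left commented out):
# print(check_stock("https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in"))
```

Waiting for the button text, instead of reading the page right after it loads, is the part that matters here: the placeholder button exists immediately, but its label only appears once the scripts have run.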
