How to find elements by class in BS4 (Python 3)

import requests
from bs4 import BeautifulSoup

url = "https://ca.cartier.com/en-ca/collections/jewelry/categories.viewall.html"
headers = {'User-Agent': 'Mozilla/5.0'}
# Pass the headers so the User-Agent is actually sent with the request
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'lxml')

for my_items in soup.find_all("div", attrs={"class": "grid-item"}):
    print(my_items)

1 Answer

User

#1 · Posted 2024-10-06 11:17:55

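The page is loaded dynamically: requests only fetches the initial HTML and does not run the JavaScript that fills in the product grid, so the "grid-item" elements are not in page.text. However, you can get the data by sending a GET request to: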

https://ca.cartier.com/en-ca/collections/jewelry/categories.productlistingservletv2.json

The response is JSON. Calling .json() parses it into a Python dictionary (dict) whose keys and values you can access:

>>> import requests
>>>
>>>
>>> URL = "https://ca.cartier.com/en-ca/collections/jewelry/categories.productlistingservletv2.json"
>>> response = requests.get(URL).json()
>>> print(type(response))
<class 'dict'>
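
Before pulling values out of the dictionary, it helps to see how it is nested. The following is a minimal sketch of one way to do that. The helper name `summarize` and the `sample` payload are made up for illustration; the real structure of the Cartier response is not shown in this answer and may look quite different.

```python
def summarize(data, max_depth=2, _depth=0):
    """Return indented 'key: type' lines describing nested JSON-like data."""
    lines = []
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list) and data:
        # Describe only the first element; list items usually share one shape.
        items = [("[0]", data[0])]
    else:
        return lines  # scalars and empty containers have nothing to expand
    for key, value in items:
        size = f" (len {len(value)})" if isinstance(value, (dict, list)) else ""
        lines.append(f"{'  ' * _depth}{key}: {type(value).__name__}{size}")
        if _depth + 1 < max_depth:
            lines.extend(summarize(value, max_depth, _depth + 1))
    return lines


# Hypothetical payload, NOT the real response from the endpoint above.
sample = {
    "total": 2,
    "items": [
        {"name": "ring", "price": "100"},
        {"name": "cuff", "price": "200"},
    ],
}
print("\n".join(summarize(sample, max_depth=3)))
```

In the session above you would call `summarize(response)` on the parsed dictionary and then index into whichever key turns out to hold the product list.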

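Another approach is to use Selenium to scrape the page.

Install it with: pip install selenium

Download the ChromeDriver version that matches your installed Chrome browser.

Applied to your example: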

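from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

URL = "https://ca.cartier.com/en-ca/collections/jewelry/categories.viewall.html"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))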
driver.get(URL)
# Wait for the page to fully render
sleep(5)

soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

for tag in soup.find_all("div", attrs={"class": "grid-item"}):
    print(tag)
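
Once the rendered HTML is in a BeautifulSoup object, there are several equivalent ways to match elements by class. The sketch below runs against a small hypothetical HTML snippet standing in for the page source (the real markup may differ) and uses the built-in "html.parser", so it needs neither a browser nor lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
html = """
<div class="grid-item featured"><a href="/ring">Ring</a></div>
<div class="grid-item"><a href="/cuff">Cuff</a></div>
<div class="grid-items-wrapper">not a product</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Three equivalent lookups. Each matches any <div> whose class list
# contains "grid-item", even when other classes are present.
by_attrs = soup.find_all("div", attrs={"class": "grid-item"})
by_keyword = soup.find_all("div", class_="grid-item")  # class_ avoids the reserved word
by_css = soup.select("div.grid-item")                  # CSS selector syntax

# Pull the link text out of each matched element.
names = [div.a.get_text(strip=True) for div in by_css]
print(names)
```

Class matching works on whole class names, so the "grid-items-wrapper" element is not returned by any of the three lookups.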
