How to find elements by class in BS4 (Python 3)

Posted 2024-10-06 11:17:55


Website: https://ca.cartier.com/en-ca/collections/jewelry/categories.viewall.html

I want to collect all the information for each product and copy it into an Excel file for further charting/analysis.
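For the Excel part of the goal, one low-dependency option is to collect each product as a dict and write the rows to a CSV file, which Excel opens directly. This is a minimal sketch using only the standard library; the helper name `write_rows_to_csv` and the example product fields are hypothetical, since the real fields depend on what you end up scraping.

```python
import csv


def write_rows_to_csv(rows, path):
    """Write a list of dicts to a CSV file that Excel can open.

    Columns are ordered by first appearance of each key across all rows,
    so rows with missing fields still line up (missing cells are left empty).
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # utf-8-sig prepends a BOM so Excel detects UTF-8 (e.g. accented names)
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames


# Hypothetical example rows; replace with the data you actually scrape
products = [
    {"name": "Example ring", "price": "1000"},
    {"name": "Example bracelet", "price": "2500", "collection": "Example"},
]
columns = write_rows_to_csv(products, "products.csv")
print(columns)  # ['name', 'price', 'collection']
```

CSV was chosen over a native `.xlsx` file only to avoid extra dependencies; if you need real Excel formatting, a library-based export would be the next step.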

I have been following the documentation here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
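The documentation section linked above describes several equivalent ways to search by CSS class. The sketch below runs them against a small, made-up HTML snippet (not Cartier's real markup) so the results can be checked offline. Note that class matching is done per class value, so an element with `class="grid-item featured"` still matches `"grid-item"`.

```python
from bs4 import BeautifulSoup

# Made-up static markup standing in for a product grid
html = """
<div class="grid">
  <div class="grid-item featured"><span class="name">Ring</span></div>
  <div class="grid-item"><span class="name">Bracelet</span></div>
  <div class="banner">Not a product</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. The class_ keyword (trailing underscore because `class` is reserved in Python)
by_keyword = soup.find_all("div", class_="grid-item")
# 2. An attrs dictionary, as used in the question
by_attrs = soup.find_all("div", attrs={"class": "grid-item"})
# 3. A CSS selector
by_css = soup.select("div.grid-item")

names = [div.find("span", class_="name").get_text() for div in by_keyword]
print(len(by_keyword), len(by_attrs), len(by_css))  # 2 2 2
print(names)  # ['Ring', 'Bracelet']
```

All three forms find the same two elements here, which suggests that when the question's loop prints nothing, the search syntax is not the problem.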

So far my attempt finds nothing:

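import requests
from bs4 import BeautifulSoup

url = "https://ca.cartier.com/en-ca/collections/jewelry/categories.viewall.html"
headers = {"User-Agent": "Mozilla/5.0"}
# Send the browser-like User-Agent with the request
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "lxml")  # the "lxml" parser needs: pip install lxml

lst = []  # intended to hold the scraped product rows

# Nothing is printed: find_all() returns an empty list for this page
for my_items in soup.find_all("div", attrs={"class": "grid-item"}):
    print(my_items)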
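One way to check why the loop prints nothing is to count how many elements with the class actually exist in the HTML that `requests` received. The sketch below uses a hypothetical helper, `count_class`, and a made-up "skeleton" page in which the grid container is empty and the products would only be inserted later by JavaScript. A count of 0 suggests the markup is not in the downloaded HTML at all, rather than that the BeautifulSoup query is wrong.

```python
from bs4 import BeautifulSoup


def count_class(html, css_class):
    """Return (number of elements with css_class, whether the text appears anywhere)."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.select("." + css_class)
    return len(matches), css_class in html


# Made-up skeleton page: the class name only appears inside a script, not as an element
skeleton = """
<html><body>
  <div id="product-grid"></div>
  <script>/* products are rendered here by JavaScript as grid-item divs */</script>
</body></html>
"""
print(count_class(skeleton, "grid-item"))  # (0, True)
```

Against the real site you would call it as `count_class(page.text, "grid-item")` after the `requests.get(...)` line; the actual result for that page is not shown here.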

1 answer
Anonymous user
Answer #1 · Posted 2024-10-06 11:17:55

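The page is loaded dynamically (the product grid is rendered by JavaScript), so `requests`, which only downloads the raw HTML, never sees the products. However, you can get the data by sending a GET request to the following endpoint: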

https://ca.cartier.com/en-ca/collections/jewelry/categories.productlistingservletv2.json

The response is JSON; calling `.json()` parses it into a Python dictionary (`dict`) whose keys and values you can access:

>>> import requests
>>>
>>>
>>> URL = "https://ca.cartier.com/en-ca/collections/jewelry/categories.productlistingservletv2.json"
>>> response = requests.get(URL).json()
>>> print(type(response))
<class 'dict'>
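Since the structure of that JSON response is not shown, a generic way to explore it is to search the nested dict for lists of dicts (likely candidates for the product list) and flatten each record into one spreadsheet-style row. This is a sketch under that assumption: `find_record_lists`, `flatten`, and every key in `sample` are hypothetical and do not describe the endpoint's real schema.

```python
def find_record_lists(obj, path="response"):
    """Yield (path, list) for every non-empty list of dicts nested in parsed JSON."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_record_lists(value, f"{path}[{key!r}]")
    elif isinstance(obj, list):
        if obj and all(isinstance(item, dict) for item in obj):
            yield path, obj
        for i, item in enumerate(obj):
            yield from find_record_lists(item, f"{path}[{i}]")


def flatten(record, prefix=""):
    """Flatten nested dicts into one level with dotted keys, e.g. 'price.value'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical response shape, for illustration only
sample = {
    "total": 2,
    "results": {
        "products": [
            {"title": "Example ring", "price": {"value": 1000, "currency": "CAD"}},
            {"title": "Example bracelet", "price": {"value": 2500, "currency": "CAD"}},
        ]
    },
}
for path, records in find_record_lists(sample):
    print(path, len(records), sorted(records[0]))
# response['results']['products'] 2 ['price', 'title']
print(flatten(sample["results"]["products"][0]))
# {'title': 'Example ring', 'price.value': 1000, 'price.currency': 'CAD'}
```

With the real data you would run `find_record_lists(response)` on the dict returned by `requests.get(URL).json()`, pick the path that holds the products, and pass `[flatten(r) for r in records]` to whatever writes your spreadsheet.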

Another approach is to use Selenium, which drives a real browser, to scrape the rendered page.

Install it with: pip install selenium

Download the ChromeDriver that matches your Chrome version from here.

Applied to your example:

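from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

URL = "https://ca.cartier.com/en-ca/collections/jewelry/categories.viewall.html"
# Selenium 4 takes the driver path through a Service object. With Selenium 4.6+
# you can also call webdriver.Chrome() with no arguments and let Selenium Manager
# download a matching driver.
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))
try:
    driver.get(URL)
    # Wait up to 15 s for the first grid item to render instead of sleeping a fixed time
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div.grid-item"))
    )
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()  # always close the browser, even if the wait times out

for tag in soup.find_all("div", attrs={"class": "grid-item"}):
    print(tag)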
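Printing whole tags is useful for inspection, but for the spreadsheet you probably want one row per product. Because the rendered markup inside each grid item isn't shown here, the sketch below makes no assumptions about specific child elements: the hypothetical helper `grid_items_to_rows` just joins each item's visible text fragments and records its first link. The sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup


def grid_items_to_rows(html, css_class="grid-item"):
    """Turn each <div> with css_class into a row dict: visible text plus first link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_=css_class):
        link = item.find("a", href=True)
        rows.append({
            # stripped_strings yields each text fragment, trimmed, skipping blank ones
            "text": " | ".join(item.stripped_strings),
            "link": link["href"] if link else "",
        })
    return rows


# Made-up markup standing in for the rendered grid
sample = """
<div class="grid-item">
  <a href="/en-ca/example-ring.html"><h3>Example ring</h3></a>
  <p class="price">$1,000</p>
</div>
<div class="grid-item"><h3>Example bracelet</h3></div>
"""
print(grid_items_to_rows(sample))
# [{'text': 'Example ring | $1,000', 'link': '/en-ca/example-ring.html'}, {'text': 'Example bracelet', 'link': ''}]
```

On the live page you would pass it `driver.page_source` (captured before `driver.quit()`), then refine the row fields once you have looked at the real tags the loop prints.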
