How can I make BeautifulSoup wait a second so the elements finish loading on the page before they are saved to soup?

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")

# Muted Price
MutedPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-listPriceValue ph2 dib strike custom-list-price fw5 exito-vtex-component-precio-tachado'})[0].text
MutedPrice = pd.to_numeric(MutedPrice[2-len(MutedPrice):].replace('.', ''))

# Red Price
RedPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-sellingPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-rojo'})[0].text
RedPrice = pd.to_numeric(RedPrice[2-len(RedPrice):].replace('.', ''))

# Black Price
BlackPrice = soup.find_all("span", {'class': 'exito-vtex-components-2-x-alliedPrice fw1 f3 custom-selling-price dib ph2 exito-vtex-component-precio-negro'})[0].text
BlackPrice = pd.to_numeric(BlackPrice[2-len(BlackPrice):].replace('.', ''))

print('Muted Price:', MutedPrice)
print('Red Price:', RedPrice)
print('Black Price:', BlackPrice)

2 Answers

User

Answer 1 · edited 2024-05-03 12:43:49

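These values are probably rendered dynamically, meaning they are filled in by JavaScript running on the page.

requests.get() only returns the markup received from the server and applies none of the client-side changes, so this is not really a matter of waiting.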
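To make that point concrete, here is a minimal sketch using only the standard library. The HTML string and the selling-price class name are made up for illustration and are not taken from exito.com; they stand in for a server response whose price element is empty until a script runs in a browser.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for what a server might send: the price
# <span> is empty and a script would fill it in later inside a browser.
SERVER_HTML = """
<html><body>
  <span class="selling-price"></span>
  <script>
    document.querySelector(".selling-price").textContent = "$ 1.999.900";
  </script>
</body></html>
"""


class SpanTextCollector(HTMLParser):
    """Collect the text found inside <span> elements with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and self.css_class in classes:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


collector = SpanTextCollector("selling-price")
collector.feed(SERVER_HTML)
print(collector.texts)  # prints ['']
```

The parser sees the script only as text and never executes it, so the span stays empty no matter how long the program sleeps before parsing.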

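You can use the Selenium Chrome WebDriver to load the page URL and get the page source. (You could also use the Firefox driver.)

Go to chrome://settings/help to check your current Chrome version and download the ChromeDriver build for that version. Make sure the driver file is saved either on your PATH or in the same folder as your Python script.

Try replacing the first three lines of your existing code (the url, requests.get and BeautifulSoup lines) with: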

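from contextlib import closing

from bs4 import BeautifulSoup
from selenium.webdriver import Chrome  # pip install selenium
from selenium.webdriver.chrome.service import Service

url = 'https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p'

# Use Chrome to get the page with JavaScript-generated content.
# Selenium 4 takes the driver path through Service; Selenium 3 used
# Chrome(executable_path="./chromedriver") instead.
with closing(Chrome(service=Service("./chromedriver"))) as browser:
    browser.get(url)
    page_source = browser.page_source

soup = BeautifulSoup(page_source, "lxml")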

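If the price elements show up only some time after the page loads, an explicit wait may be closer to what the question asks for. The following is a sketch, not a tested recipe for this site: it assumes Selenium 4 with a Chrome driver that Selenium can locate on its own, and class_selector and fetch_rendered_html are hypothetical helper names introduced here for illustration.

```python
def class_selector(tag, class_attr):
    """Turn a tag and a space-separated class attribute into a CSS selector."""
    return tag + "".join("." + name for name in class_attr.split())


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in Chrome and return the page source once css_selector exists."""
    # Imported here so class_selector() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until the element is in the DOM, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


selector = class_selector("span", "exito-vtex-components-2-x-sellingPrice fw1 f3")
print(selector)  # span.exito-vtex-components-2-x-sellingPrice.fw1.f3

# Usage (needs Chrome and network access):
# page_source = fetch_rendered_html(url, selector)
# soup = BeautifulSoup(page_source, "lxml")
```

Waiting on a condition instead of sleeping for a fixed second returns as soon as the element exists and fails loudly if it never appears.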

References:

Get page generated with Javascript in Python

selenium - chromedriver executable needs to be in PATH

User

Answer 2 · edited 2024-05-03 12:43:49

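The page you are trying to scrape contains JavaScript code that the browser executes and that modifies the page after it has been downloaded. If you want to extract data from the "final state" of the page, you need a library that can run the page's JavaScript. Unfortunately, BeautifulSoup cannot do that, so you need another library for this task.

For example, you can pip install requests-html and run the following: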

#!/usr/bin/env python3

import re
from requests_html import HTMLSession

def parse_price_text(price_text):
    """Extract just the price digits and dots from the <span> tag text"""
    matches = re.search(r"([\d.]+)", price_text)
    if not matches:
        raise RuntimeError(f"Could not parse price text: {price_text}")

    return matches.group(1)

# Starting a session and running the JavaScript code with render()
# to make sure the DOM is the same as when using the browser.
session = HTMLSession()
exito_url = "https://www.exito.com/televisor-led-samsung-55-pulgadas-uhd-4k-smart-tv-serie-7-24449/p"
response = session.get(exito_url)
response.html.render()

# Define all price types and their associated CSS class
price_types = {
    "listPrice": "exito-vtex-components-2-x-listPriceValue",
    "sellingPrice": "exito-vtex-components-2-x-sellingPrice",
    "alliedPrice": "exito-vtex-components-2-x-alliedPrice"
}

# Iterate over price types and extract them from the page
for price_type, price_css_class in price_types.items():
    price = parse_price_text(response.html.find(f"span.{price_css_class}", first=True).text)
    print(f"{price_type} price: {price} $")

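The helper in this answer returns the price as text that still contains dots. If a number is needed, as with the pd.to_numeric calls in the question, the dots can be dropped first. This is a small sketch that assumes dots are thousands separators, as the question's replace('.', '') implies; price_to_int is a hypothetical name and the sample price is made up.

```python
import re


def price_to_int(price_text):
    """Convert text such as "$ 1.999.900" to the integer 1999900."""
    match = re.search(r"\d[\d.]*", price_text)
    if match is None:
        raise ValueError(f"Could not parse price text: {price_text!r}")
    # Dots are treated as thousands separators, so they are simply dropped.
    return int(match.group(0).replace(".", ""))


print(price_to_int("$ 1.999.900"))  # 1999900
```

Using a regular expression instead of slicing off the first two characters keeps the conversion working if the currency prefix changes.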
