Python BeautifulSoup: extracting a value from a div class

Posted 2024-09-30 16:35:00

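I want to build a program that automatically fetches the real-time price of the German stock index (DAX). For this I am using the website of the price provider FXCM.

In my code I use the BeautifulSoup and requests packages. The div box that holds the current value looks like this: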

<div class="left" data-item="quoteContainer" data-bg_quotepush="133962:74:bid">
      <div class="wrapper cf">
        <div class="left">
          <span class="quote quote_standard" data-bg_quotepush="quote" data-bg_quotepush_i="133962:74:bid" data-bg_quotepush_f="quote" data-bg_quotepush_c="40">13.599,24</span>
          <span class="label" data-bg_quotepush="time" data-bg_quotepush_i="133962:74:bid" data-bg_quotepush_f="time" data-bg_quotepush_c="41">25.12.2020</span>
          <span class="label"> • </span>
          <span class="label" data-item="currency"></span>
        </div>
        <div class="right">
          <span class="percent up" data-bg_quotepush="percent" data-bg_quotepush_i="133962:74:bid" data-bg_quotepush_f="percent" data-bg_quotepush_c="42">+0,00<span>%</span></span>
          <span class="label up" data-bg_quotepush="change" data-bg_quotepush_i="133962:74:bid" data-bg_quotepush_f="change" data-bg_quotepush_c="43">0,00</span>
        </div>
      </div>
    </div>

The value I want is the one that follows data-bg_quotepush_c="40", which is 13.599,24.

My Python code looks like this:

import requests as rq
from bs4 import BeautifulSoup as bs
    
url = "https://news.guidants.com/#Ticker/Profil/?i=133962&e=74"
    
response = rq.get(url)
soup = bs(response.text, "lxml")

price = soup.find_all("div", {"class":"left"})[0].find("span")

print(price["data-bg_quotepush_c"])

It returns the following error:

File "C:\Users\Felix\anaconda3\lib\site-packages\bs4\element.py", line 1406, in __getitem__ 
return self.attrs[key]

KeyError: 'data-bg_quotepush_c'

2 answers

If you want to scrape the value from the div class, try the following example:

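from selenium import webdriver
from bs4 import BeautifulSoup

# start Chrome; pass the path to your chromedriver here if your Selenium version requires it
driver = webdriver.Chrome()

# URL of the page to scrape
url = 'https://news.guidants.com/#Ticker/Profil/?i=133962&e=74'

# load the page in the browser so its scripts can run
driver.get(url)

# parse the browser-rendered page source
soup = BeautifulSoup(driver.page_source, "html5lib")

# collect every div with class "left"
prices = soup.find_all("div", attrs={"class": "left"})

# take the first span inside each matching div
for price in prices:
    total_price = price.find('span')

# close the driver
driver.close()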

If you use the requests module, try a different parser. You can install one with pip, for example html5lib:

pip install html5lib

Thanks

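For dynamically generated content, use Selenium instead of requests.

What happens?

Fetching the website with requests only returns the initial content. That content does not include the dynamically generated information, so the element you are looking for is not there.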
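
One detail that fits this explanation can be read from the URL itself: everything after the # is a fragment, which an HTTP client keeps to itself and does not send to the server. The following minimal sketch uses only the standard library to show which part of the question's URL is actually requested. That the page's scripts use the fragment to decide which instrument to display is an assumption, not something the thread confirms.

```python
from urllib.parse import urldefrag

url = "https://news.guidants.com/#Ticker/Profil/?i=133962&e=74"

# Separate the part that is sent to the server from the client-side fragment.
requested, fragment = urldefrag(url)

print(requested)  # https://news.guidants.com/
print(fragment)   # Ticker/Profil/?i=133962&e=74
```

Note that the ?i=133962&e=74 part sits behind the #, so it belongs to the fragment and is not a query string the server ever sees.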

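To wait until the website has fully loaded, use sleep() (from Python's time module) as the simple approach, or Selenium waits as the more advanced one.

Avoiding the error

Use price.text to get the text of the element, which looks like this: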
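
As a rough sketch of the "Selenium waits" approach, the function below polls the page until the quote element contains text, instead of sleeping for a fixed time. The names fetch_quote and QUOTE_CSS are hypothetical, the selector span.quote is taken from the HTML in the question and assumed to match the rendered page, and the code assumes Selenium can find a Chrome driver on its own.

```python
QUOTE_CSS = "span.quote"  # assumption: class name taken from the HTML in the question


def fetch_quote(url: str, timeout: float = 10.0) -> str:
    """Open the page in Chrome and return the quote text once it is non-empty."""
    # Imported inside the function so that merely defining it does not
    # require Selenium or a browser driver to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes a Chrome driver can be located
    try:
        driver.get(url)
        # until() calls the function repeatedly, ignoring NoSuchElementException,
        # and returns its first truthy result (here: the non-empty text).
        return WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.CSS_SELECTOR, QUOTE_CSS).text.strip()
        )
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

Calling fetch_quote("https://news.guidants.com/#Ticker/Profil/?i=133962&e=74") would return the quote text if the selector matches, and raise Selenium's TimeoutException if no text appears within the timeout.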

<span class="quote quote_standard" data-bg_quotepush="quote" data-bg_quotepush_c="40" data-bg_quotepush_f="quote" data-bg_quotepush_i="133962:74:bid">13.599,24</span>
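
To make the difference between an attribute and the element text concrete, here is a small offline sketch that parses a shortened copy of the HTML from the question. It uses the built-in html.parser so no extra parser is needed. Selecting the span by its data-bg_quotepush_c attribute is one possible alternative to taking the first span, not the method used in the answer.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question.
html = """
<div class="left" data-item="quoteContainer">
  <span class="quote quote_standard" data-bg_quotepush_c="40">13.599,24</span>
  <span class="label" data-bg_quotepush_c="41">25.12.2020</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select the span by the attribute value instead of relying on its position.
span = soup.find("span", attrs={"data-bg_quotepush_c": "40"})

attribute_value = span["data-bg_quotepush_c"]   # the attribute, not the price
price_text = span.get_text(strip=True)          # the text between the tags
missing = span.get("data-does-not-exist")       # .get() returns None, no KeyError

print(attribute_value, price_text, missing)  # 40 13.599,24 None
```

Indexing a tag with square brackets raises KeyError when the attribute is absent, which is what happens in the question when the first span on the fetched page does not carry data-bg_quotepush_c.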

Example

from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://news.guidants.com/#Ticker/Profil/?i=133962&e=74"

# executable_path is the Selenium 3 way to point at the driver; newer Selenium versions take a Service object instead
driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver.get(url)
driver.implicitly_wait(3)

# parse the browser-rendered HTML rather than the raw HTTP response
soup = BeautifulSoup(driver.page_source, "html5lib")
price = soup.find_all("div", {"class": "left"})[0].find("span")
print(price.text)
driver.close()

Output

13.599,24
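
The output is a string in German number format, with a dot as the thousands separator and a comma as the decimal separator. If the program needs to calculate with the price, it has to be converted first. A minimal sketch, where parse_german_number is a hypothetical helper name and the input is assumed to always use this format:

```python
from decimal import Decimal


def parse_german_number(text: str) -> Decimal:
    """Convert a string such as '13.599,24' into a Decimal."""
    # Drop the thousands separators, then turn the decimal comma into a dot.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


print(parse_german_number("13.599,24"))  # 13599.24
```

Decimal is used here rather than float so that the two decimal places of the price are kept exactly.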
