The find method does not extract a div tag that exists

from selenium import webdriver
from bs4 import BeautifulSoup

link = "https://www.liquimoly-hbl.de/en/import/games/season-2020-2021/bundesliga/21--gameday--rhein-neckar-loewen---eulen-ludwigshafen/"
driver = webdriver.Chrome("path-to-my-chromedrivers")
driver.get(link)
driver.switch_to.frame("iframe-23400665")
page_source = driver.page_source
driver.close()

soup = BeautifulSoup(page_source, "html5lib")
a = (
    soup.find("div", {"class": "srl-tabs-wrapper srl-flex-child"})
    .find("div", {"srl-tabs srl-flex"})
    .find("div", {'class': "srl-tabs-content-wrapper srl-flex-child"})
    .find("div", {"class": "srl-tabs-content"})
)
print(a.find("div", {"class": "srl-tab srl-tab-handball-playerstats sr-widget sr-widget-level-0 sr-handball-playerstats sr-normal"}))

driver = webdriver.Chrome('path-to-my-chromedrivers')
driver.get(link)
driver.switch_to.frame("iframe-23400665")
page_source = driver.page_source
x = driver.find_elements_by_xpath("//div[@class='srl-tab']")
for i in x:
    if i.get_attribute("data-widget") == "handball.playerstats":
        print(i.get_attribute("class"))
        print(i.get_attribute("data-widget"))
        driver.execute_script("arguments[0].click();", i)
        print(i.get_attribute("class"))
        page_source_2 = driver.page_source
        break
driver.close()

3 Answers

User

#1 · Edited 2024-09-27 09:37:31

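You can use the read_html() function from the pandas library to extract tables from a web page. It parses the page and returns the tables it finds. You can then manipulate the values yourself or save them directly to a CSV file.

See the pandas documentation for more information, or the read_html() reference page for details on that specific function.

Hope this helps with your use case.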
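
As a minimal sketch of that suggestion: `pandas.read_html()` returns one DataFrame per `<table>` element it finds. The table below is made-up sample data, not the real match statistics, and `read_html()` needs an HTML parser such as lxml installed. Since the statistics on this page live inside an iframe, it is probably safer to feed it the Selenium `driver.page_source` than the bare URL, and it only helps if the widget actually renders a `<table>`.

```python
from io import StringIO

import pandas as pd

# Made-up sample markup standing in for driver.page_source.
sample_html = """
<table>
  <thead><tr><th>Player</th><th>Goals</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>7</td></tr>
    <tr><td>Player B</td><td>4</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> in the input.
tables = pd.read_html(StringIO(sample_html))
stats = tables[0]
print(stats)

# The result can be turned into CSV text, or written to a file
# by passing a path to to_csv().
csv_text = stats.to_csv(index=False)
```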

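User

#2 · Edited 2024-09-27 09:37:31

Go to the page, dismiss the cookie popup, switch to the iframe, click Statistics, and wait for the content to load. Then pass the page source to BeautifulSoup and do whatever you need with it.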

driver.get("https://www.liquimoly-hbl.de/en/import/games/season-2020-2021/bundesliga/21--gameday--rhein-neckar-loewen---eulen-ludwigshafen/")
# accept the cookie consent popup
driver.find_element(By.XPATH, "//button[.='ALLE AKZEPTIEREN']").click()
# the widget lives inside this iframe
driver.switch_to.frame("iframe-23400665")
# open the Statistics tab
driver.find_element(By.XPATH, "//div[.='Statistics']").click()
# give the widget time to render
sleep(5)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one("div.sr-table-content")
print(div)

Imports

from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from time import sleep

User

#3 · Edited 2024-09-27 09:37:31

Please give this code a try. You only need to edit the class name and the URL to get the table.

# import libraries (Python 3: urllib2 became urllib.request)
from urllib.request import urlopen
from bs4 import BeautifulSoup
# specify the url
quote_page = 'http://www.bloomberg.com/quote/SPX:IND'
# query the website and return the html to the variable 'page'
page = urlopen(quote_page)
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')
# take the <h1> with class "name" and get its value
name_box = soup.find('h1', attrs={'class': 'name'})
name = name_box.text.strip()  # strip() removes leading and trailing whitespace
print(name)
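
When adapting the class name, two things are worth checking. The sketch below uses invented markup (the class names are borrowed from the question), not the real page. First, `find()` returns `None` when nothing matches, so calling `.text` on the result raises `AttributeError`. Second, a space-separated class string is compared against the whole `class` attribute, so any extra class on the element makes the match fail, whereas a CSS selector checks each class on its own.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration; class names borrowed from the question.
html = '<div class="srl-tab sr-widget sr-normal"><span>stats</span></div>'
soup = BeautifulSoup(html, "html.parser")

# A multi-class string must equal the full class attribute, so this misses.
exact = soup.find("div", {"class": "srl-tab sr-widget"})

# A single class name matches any element that carries that class.
single = soup.find("div", {"class": "srl-tab"})

# A CSS selector checks each class separately, in any order.
css = soup.select_one("div.sr-widget.srl-tab")

for label, tag in [("exact", exact), ("single", single), ("css", css)]:
    # Guard against None before touching the tag's text.
    text = tag.get_text(strip=True) if tag is not None else "not found"
    print(f"{label}: {text}")
```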

I am leaving some screenshots here for reference. This worked for me.

Image1 Image2
