Extracting HTML data from a sports website with Python 3

from bs4 import BeautifulSoup
import requests

result = requests.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")
src = result.content
soup = BeautifulSoup(src, 'html.parser')
stats = soup.find("div", {"class": "tab-statistics-0-statistic"})
print(stats)

1 answer

User

#1 · Posted 2024-07-07 08:47:45

Because the site is rendered with JavaScript, one option is to load the page with Selenium and then parse it with BeautifulSoup:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# initialize a headless Chrome driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# Selenium 4 takes the driver path through a Service object
# (Selenium 3 accepted it as the first positional argument)
wd = webdriver.Chrome(service=Service('<<PATH_TO_SELENIUMDRIVER>>'), options=chrome_options)

# load page via selenium
wd.get("https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0")

# wait up to 30 seconds for the element with id 'statistics-content' to appear
table = WebDriverWait(wd, 30).until(EC.presence_of_element_located((By.ID, 'statistics-content')))

# parse content of the table
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'html.parser')

print(soup)

# close selenium driver
wd.quit()
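
As background on why the plain `requests` attempt in the question prints `None`: the part of the URL after `#` is a fragment, which an HTTP client keeps to itself and never sends to the server. The statistics tab it names therefore has to be filled in by JavaScript running in a browser, which is what Selenium provides. A minimal sketch using only the standard library (the variable names `request_url` and `fragment` are chosen here for illustration):

```python
from urllib.parse import urldefrag

# URL taken from the question above.
url = "https://www.scoreboard.com/uk/match/lvbns58C/#match-statistics;0"

# urldefrag splits off the fragment (everything after '#').
request_url, fragment = urldefrag(url)

print(request_url)  # the address an HTTP client actually requests
print(fragment)     # the part that only browser-side JavaScript acts on
```

If the element you need is missing from `result.content` but visible in the browser, this split is usually the first thing worth checking before reaching for a browser driver.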
