BeautifulSoup sports scraper returns an empty list

Posted 2024-05-20 21:37:34


I am trying to use Python's BeautifulSoup to scrape tennis match results from this website. I have tried a number of things, but I always get back an empty list. Am I making an obvious mistake? When I inspect the page there are multiple instances of this class, but it doesn't seem to find any of them.

import requests
from bs4 import BeautifulSoup

url = 'https://www.flashscore.com/tennis/atp-singles/french-open/results/'

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

match_container = soup.find_all('div', class_='event__match event__match--static event__match--last event__match--twoLine')
print(match_container)

2 Answers

The score data is pulled into the page dynamically, so with requests you only get the initial HTML.
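One quick way to confirm this is to look for the class name from the browser inspector in the raw HTML that requests returns (a minimal sketch; the URL and class name are taken from the question):

import requests

url = 'https://www.flashscore.com/tennis/atp-singles/french-open/results/'
html = requests.get(url).text

# The match markup seen in the inspector is injected later by JavaScript,
# so it does not appear in the initial document that requests receives.
print('event__match' in html)   # typically False for this page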

As user70 suggested in the comments, the way to do this is to use something like Selenium first, so that you get all of the dynamic content you can see in the browser's inspect tool.

There are a few guides online that show how this works; you could start with this one, perhaps:

https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25
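In outline, the approach from that guide looks like this (a sketch, assuming a recent Selenium that can locate chromedriver itself; the fixed sleep is just a placeholder for the explicit wait shown in the next answer):

from time import sleep
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

# Let a real browser execute the JavaScript that builds the results table,
# then hand the rendered HTML to BeautifulSoup.
driver.get('https://www.flashscore.com/tennis/atp-singles/french-open/results/')
sleep(5)  # crude wait; an explicit WebDriverWait (next answer) is more robust
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

matches = soup.find_all('div', class_='event__match')
print(len(matches))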

The results table is loaded by JavaScript, and BeautifulSoup can't find it because it has not been loaded yet at the time the page is parsed. To get around this you need to use Selenium, together with chromedriver.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# use options= (chrome_options= is deprecated in newer Selenium versions)
wd = webdriver.Chrome('<PATH_TO_CHROMEDRIVER>', options=chrome_options)

# load page via selenium
wd.get("https://www.flashscore.com/tennis/atp-singles/french-open/results/")

# wait up to 5 seconds for the results table to load
table = WebDriverWait(wd, 5).until(EC.presence_of_element_located((By.ID, 'live-table')))

# parse content of the grid
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')

# access grid cells, your logic should be here
for tag in soup.find_all('div', class_='event__match event__match--static event__match--last event__match--twoLine'):
    print(tag)
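Once the rendered rows are in the soup you can pull individual fields out of each one. The inner class names below ('event__participant--home', 'event__score--home', etc.) are only assumptions taken from inspecting the page and may change whenever Flashscore updates its markup, so treat this as a sketch:

for match in soup.find_all('div', class_='event__match'):
    # hypothetical selectors based on the browser inspector
    home = match.find(class_='event__participant--home')
    away = match.find(class_='event__participant--away')
    home_score = match.find(class_='event__score--home')
    away_score = match.find(class_='event__score--away')
    if home and away:
        print(home.get_text(strip=True),
              home_score.get_text(strip=True) if home_score else '?',
              '-',
              away_score.get_text(strip=True) if away_score else '?',
              away.get_text(strip=True))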
