BeautifulSoup sports scraper returns an empty list

Posted 2024-05-20 21:37:34


I am trying to use Python's BeautifulSoup to scrape tennis match results from this website. I have tried a number of things, but I always get back an empty list. Am I making an obvious mistake? When I inspect the page there are multiple instances of this class, but it doesn't seem to find any of them.

import requests
from bs4 import BeautifulSoup

url = 'https://www.flashscore.com/tennis/atp-singles/french-open/results/'

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

match_container = soup.find_all('div', class_='event__match event__match--static event__match--last event__match--twoLine')
print(match_container)

2 Answers

The score data is pulled into the page dynamically, so with requests you only get the initial HTML.
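One quick way to confirm this is to look for the class name from the browser inspector in the raw HTML that requests returns (a minimal sketch; the URL and class name are taken from the question):

import requests

url = 'https://www.flashscore.com/tennis/atp-singles/french-open/results/'
html = requests.get(url).text

# The match markup seen in the inspector is injected later by JavaScript,
# so it does not appear in the initial document that requests receives.
print('event__match' in html)   # typically False for this page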

As user70 suggested in the comments, the way to do this is to use something like Selenium first, so that you get all of the dynamic content you can see in the browser's inspect tool.

There are a few guides online that show how this works; you could start with this one, perhaps:

https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25
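In outline, the approach from that guide looks like this (a sketch, assuming a recent Selenium that can locate chromedriver itself; the fixed sleep is just a placeholder for the explicit wait shown in the next answer):

from time import sleep
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

# Let a real browser execute the JavaScript that builds the results table,
# then hand the rendered HTML to BeautifulSoup.
driver.get('https://www.flashscore.com/tennis/atp-singles/french-open/results/')
sleep(5)  # crude wait; an explicit WebDriverWait (next answer) is more robust
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

matches = soup.find_all('div', class_='event__match')
print(len(matches))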

The results table is loaded by JavaScript, and BeautifulSoup can't find it because it has not been loaded yet at the time the page is parsed. To get around this you need to use Selenium, together with chromedriver.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# use options= (chrome_options= is deprecated in newer Selenium versions)
wd = webdriver.Chrome('<PATH_TO_CHROMEDRIVER>', options=chrome_options)

# load page via selenium
wd.get("https://www.flashscore.com/tennis/atp-singles/french-open/results/")

# wait up to 5 seconds for the results table to load
table = WebDriverWait(wd, 5).until(EC.presence_of_element_located((By.ID, 'live-table')))

# parse content of the grid
soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')

# access grid cells, your logic should be here
for tag in soup.find_all('div', class_='event__match event__match--static event__match--last event__match--twoLine'):
    print(tag)
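Once the rendered rows are in the soup you can pull individual fields out of each one. The inner class names below ('event__participant--home', 'event__score--home', etc.) are only assumptions taken from inspecting the page and may change whenever Flashscore updates its markup, so treat this as a sketch:

for match in soup.find_all('div', class_='event__match'):
    # hypothetical selectors based on the browser inspector
    home = match.find(class_='event__participant--home')
    away = match.find(class_='event__participant--away')
    home_score = match.find(class_='event__score--home')
    away_score = match.find(class_='event__score--away')
    if home and away:
        print(home.get_text(strip=True),
              home_score.get_text(strip=True) if home_score else '?',
              '-',
              away_score.get_text(strip=True) if away_score else '?',
              away.get_text(strip=True))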
