I can't get the id when scraping (Python)

<div id="live-table">
  <div class="event mobile event--summary">
    <div elementtiming="SpeedCurveFRP" class="leagues--static event--leagues summary-results">
      <div class="sportName tennis">
        <div id="g_2_ldRHDOEp" title="Clicca per i dettagli dell'incontro!" class="event__match event__match--static event__match--twoLine">
          ...

import urllib.request, urllib.error, urllib.parse
from bs4 import BeautifulSoup

url = '...'
response = urllib.request.urlopen(url)
webContent = response.read()
soup = BeautifulSoup(webContent, 'html.parser')

list = []
list = soup.find_all("div")
total_id = " "
for i in list:
    id = i.get('id')
    total_id = total_id + "\n" + str(id)
print(total_id)

1 Answer

User

#1 · Posted on 2024-10-03 00:31:42

First, `list` and `id` are Python built-ins, so don't use them as variable names.
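To illustrate the naming point, here is a minimal sketch of what goes wrong once `list` is rebound, followed by the same id-collecting loop with non-clashing names. The function names and the sample data are hypothetical; plain dicts stand in for parsed tags because both offer a `.get()` method.

```python
def call_shadowed_list():
    """Show what happens after `list` is rebound to a plain list object."""
    list = []  # shadows the built-in `list` inside this function
    try:
        return list("abc")  # the built-in would return ['a', 'b', 'c']
    except TypeError as exc:
        return f"TypeError: {exc}"


def collect_ids(tags):
    """Collect non-empty 'id' values without reusing any built-in name."""
    element_ids = []
    for tag in tags:
        element_id = tag.get("id")
        if element_id:
            element_ids.append(element_id)
    return element_ids


# Hypothetical sample data standing in for soup.find_all("div")
sample_tags = [{"id": "live-table"}, {"class": "event"}, {"id": "g_2_ldRHDOEp"}]
print(call_shadowed_list())
print(collect_ids(sample_tags))
```

Skipping tags without an id also avoids the "None" lines that `str(id)` would otherwise add to the output.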

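The page is loaded dynamically with JavaScript, so a plain HTTP fetch (urllib in the question, or requests) only receives the initial HTML and never sees the elements that are added afterwards. Selenium, which drives a real browser, is another way to scrape the page.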
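One way to check whether an element is added by JavaScript is to look for its id in the raw HTML that a plain fetch returns. The sketch below uses only the standard library; `IdCollector`, `collect_ids`, and both HTML strings are made up for illustration and are not the site's actual responses.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the value of every 'id' attribute seen while parsing."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def collect_ids(html):
    """Return all id attribute values found in an HTML string."""
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


# Hypothetical example: what a plain fetch might return vs. the rendered DOM
static_html = '<div id="live-table"></div>'
rendered_html = '<div id="live-table"><div id="g_2_ldRHDOEp"></div></div>'

print("g_2_ldRHDOEp" in collect_ids(static_html))
print("g_2_ldRHDOEp" in collect_ids(rendered_html))
```

If the id is missing from the fetched HTML but visible in the browser's inspector, the element is most likely created client-side and a browser-driven tool is needed.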

Install it with: pip install selenium

Download the ChromeDriver build that matches your installed Chrome version.

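from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

URL = "https://www.flashscore.it/giocatore/djokovic-novak/AZg49Et9/"

# Selenium 4 takes the driver path through a Service object;
# passing the path positionally no longer works in recent releases.
driver = webdriver.Chrome(service=Service(r"C:\path\to\chromedriver.exe"))
driver.get(URL)
sleep(5)  # crude wait for the JavaScript-rendered content to load

soup = BeautifulSoup(driver.page_source, "html.parser")

# The match row is identified by its id attribute
for tag in soup.find_all("div", id="g_2_ldRHDOEp"):
    print(tag.get_text(separator=" "))

driver.quit()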

Output:

30.10. 12:05 Djokovic N. (Srb) Sonego L. (Ita) 0 2 2 6 1 6 P
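A fixed sleep(5) can be too short on a slow connection and wasteful on a fast one. As a possible alternative, the sketch below waits explicitly until the element is present. `fetch_match_text` is a hypothetical helper, not part of the original answer; it assumes Selenium 4.6 or newer, where `webdriver.Chrome()` can locate a driver without an explicit path. The text it returns comes from the element's visible text, so its whitespace may differ from the BeautifulSoup output above.

```python
def fetch_match_text(url, element_id, timeout=10):
    """Open url in Chrome and return the text of the element with element_id."""
    # Imported inside the function so the sketch loads even without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Selenium 4.6+ resolves the driver
    try:
        driver.get(url)
        # Poll until the element exists in the DOM; raises TimeoutException
        # if it has not appeared after `timeout` seconds.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, element_id))
        )
        return element.text
    finally:
        driver.quit()  # always close the browser, even on errors


# Example call (the match id may no longer exist on the live page):
# print(fetch_match_text(
#     "https://www.flashscore.it/giocatore/djokovic-novak/AZg49Et9/",
#     "g_2_ldRHDOEp",
# ))
```

The try/finally block ensures the browser is closed even when the wait times out.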
