Why can't Beautiful Soup find the page element?

from urllib.request import urlopen
from bs4 import BeautifulSoup

year = 2021
team = "NYK"
team_url = f"https://www.basketball-reference.com/teams/{team}/{year}.html"
html = urlopen(team_url)
soup = BeautifulSoup(html, 'html.parser')
tbl = soup.find('table', {'id': 'team_misc'})
print(tbl)

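2 Answers

User

Answer 1 · edited 2024-10-05 14:25:32

The table you are looking for sits inside an HTML comment, so the parser never sees it as a tag. One possible solution is to collect the comment nodes, parse each one as HTML, and return the table when its id matches: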


from urllib.request import urlopen
from bs4 import BeautifulSoup, Comment #import the Comment object

year = 2021
team = "NYK"
team_url = f"https://www.basketball-reference.com/teams/{team}/{year}.html"
html = urlopen(team_url)
soup = BeautifulSoup(html, 'html.parser')

# Collect every comment node in the document
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    # Re-parse the comment body as HTML and look up the table by id
    ele = BeautifulSoup(c.strip(), 'html.parser')
    if tbl := ele.find("table", {"id": "team_misc"}):
        print(tbl)
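
To illustrate why a plain `find('table', ...)` comes back empty, here is a minimal sketch using only the standard library's `html.parser`, which is the parser that Beautiful Soup's `'html.parser'` option builds on. The HTML snippet, the `TagAndCommentCollector` class, and the `parse` helper are made up for this example and are not taken from the real page. It shows that markup inside a comment arrives as one comment string rather than as tags, and that parsing that string a second time exposes the table.

```python
from html.parser import HTMLParser

# Hypothetical snippet imitating the situation described above:
# the table markup is wrapped in an HTML comment.
SAMPLE_HTML = """
<div id="all_team_misc">
<!--
<table id="team_misc"><tr><td>42</td></tr></table>
-->
</div>
"""


class TagAndCommentCollector(HTMLParser):
    """Record tags seen as real elements and the raw text of comments."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_comment(self, data):
        self.comments.append(data)


def parse(html):
    """Feed html through a fresh collector and return it."""
    collector = TagAndCommentCollector()
    collector.feed(html)
    collector.close()
    return collector


# First pass: the table is invisible because it is comment data.
outer = parse(SAMPLE_HTML)
print(outer.tags)

# Second pass: parse the comment text itself to reach the table.
inner = parse(outer.comments[0])
print(inner.tags)
```

The first pass reports only the wrapper `div`; the `table`, `tr`, and `td` tags appear only once the comment text is parsed on its own, which is the same two-step approach the Beautiful Soup code above takes with `Comment` nodes.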

User

Answer 2 · edited 2024-10-05 14:25:32

This will get the table you identified. You need to download chromedriver.exe into your directory or provide the correct path to it.

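import time
from bs4 import BeautifulSoup  # missing from the original snippet
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--headless")  # run Chrome without a visible window

year = 2021
team = "NYK"
team_url = f"https://www.basketball-reference.com/teams/{team}/{year}.html"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver.exe'), options=chrome_options)

driver.get(team_url)
time.sleep(5)  # wait for the page to finish loading
html = driver.page_source
driver.quit()  # close the browser once the HTML has been captured
soup = BeautifulSoup(html, "html.parser")
tbl = soup.find('table', {'id': 'team_misc'})
print(tbl)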
