BeautifulSoup4 can't find the table no matter what I do

import urllib.request
import bs4 as bs

sauce = urllib.request.urlopen('https://www.hockey-reference.com/players/a/abdelju01/gamelog/2014', timeout=None).read()
soup = bs.BeautifulSoup(sauce, 'html5lib')
table = soup.find_all('table')
print(len(table))

3 Answers

User

#1 · Edited 2024-09-29 17:15:21

Because the additional information is loaded by JavaScript.

requests_html can load the HTML page and also render its JavaScript content.

pip install requests-html

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.hockey-reference.com/players/a/abdelju01/gamelog/2014')
r.html.render()
res = r.html.find('table')
print(len(res))
4

User

#2 · Edited 2024-09-29 17:15:21

It looks like the table is a widget. Click "Share & more" -> "Embed this Table" and you will get a script with the following link:

https://widgets.sports-reference.com/wg.fcgi?css=1&site=hr&url=%2Fplayers%2Fa%2Fabdelju01%2Fgamelog%2F2014&div=div_gamelog_playoffs

How do we parse it?

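import requests
import bs4

url = 'https://widgets.sports-reference.com/wg.fcgi?css=1&site=hr&url=%2Fplayers%2Fa%2Fabdelju01%2Fgamelog%2F2014&div=div_gamelog_playoffs'
widget = requests.get(url).text
# Each line is wrapped in document.write('...'); remove that exact wrapper.
# removeprefix/removesuffix need Python 3.9+; lstrip/rstrip would strip character sets instead.
fixed = '\n'.join(s.removeprefix("document.write('").removesuffix("');") for s in widget.splitlines())

# Name the parser explicitly to avoid the "no parser specified" warning.
soup = bs4.BeautifulSoup(fixed, 'html.parser')
soup.find('td', {'data-stat': "date_game"}).text # => '2014-04-18'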

Voilà!
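One caveat when unwrapping document.write lines: str.lstrip and str.rstrip take a set of characters, not a prefix or suffix, so they can eat into the content when it happens to start or end with those characters. Here is a minimal sketch of the difference. The sample strings and the unwrap helper are made up for illustration; I have not checked them against the real widget response.

```python
# Hypothetical sample of one widget line (an assumption, not the real response).
line = "document.write('<td data-stat=\"date_game\">2014-04-18</td>');"

prefix, suffix = "document.write('", "');"

# lstrip/rstrip treat their argument as a set of characters, so content
# beginning with letters from "document.write('" gets stripped too.
over_stripped = "document.write('month');".lstrip(prefix).rstrip(suffix)


def unwrap(s: str) -> str:
    """Remove the exact wrapper, and only when both ends are present."""
    if s.startswith(prefix) and s.endswith(suffix):
        return s[len(prefix):-len(suffix)]
    return s


print(repr(over_stripped))  # most of 'month' is lost
print(unwrap(line))         # wrapper removed, markup intact
```

Slicing by length works on any Python 3 version; on 3.9+ str.removeprefix and str.removesuffix do the same job.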

User

#3 · Edited 2024-09-29 17:15:21

The second table appears to sit inside an HTML comment: <!-- ... <table class=... . I think that's why BeautifulSoup can't find it.
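If the table really is inside a comment, one way to reach it is to collect the Comment nodes and parse their text as HTML a second time. Below is a minimal sketch of that idea. The html string is a made-up stand-in for the downloaded page, and the ids and cell text are placeholders, so adjust them to whatever the real markup contains.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the page: one table in the live markup,
# one table hidden inside an HTML comment.
html = """
<div id="all_gamelog">
<table id="gamelog"><tr><td>regular season row</td></tr></table>
</div>
<div id="all_gamelog_playoffs">
<!--
<table id="gamelog_playoffs"><tr><td>playoff row</td></tr></table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")  # finds only the table outside the comment

# Comments are plain strings to the parser, so re-parse each one
# and collect any tables found inside it.
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    tables.extend(BeautifulSoup(comment, "html.parser").find_all("table"))

table_ids = [t.get("id") for t in tables]
print(table_ids)
```

With the real page you would pass the downloaded HTML in place of the html string; no JavaScript rendering is needed for this approach.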
