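<p>This page hides all of its tables inside HTML comments. JavaScript reads those comments to display the tables, possibly sorting or filtering them first.</p>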
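<p>Every comment comes right after a <code>&lt;div class='placeholder'&gt;</code>, so you can use that element to locate the comment, take all the text from the comment, and parse it with BS (BeautifulSoup).</p>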
<pre><code>#!/usr/bin/env python3
#import urllib.request
import requests
from bs4 import BeautifulSoup as BS
url = 'http://www.basketball-reference.com/teams/CHO/2017.html'
#html = urllib.request.urlopen(url)
html = requests.get(url).text
soup = BS(html, 'html.parser')
placeholders = soup.find_all('div', {'class': 'placeholder'})
total_tables = 0
for x in placeholders:
    # get elements after placeholder and join in one string
    comment = ''.join(x.next_siblings)
    # parse comment
    soup_comment = BS(comment, 'html.parser')
    # search table in comment
    tables = soup_comment.find_all('table')
    # ... do something with table ...
    #print(tables)
    total_tables += len(tables)
print('total tables:', total_tables)
</code></pre>
<p>This way I found 11 tables hidden in comments.</p>
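<p>As a variation on the same idea, the comments can also be selected directly by node type rather than through the placeholder. The sketch below runs offline on a small made-up snippet: <code>SAMPLE_HTML</code>, the table ids, and the helper name <code>find_hidden_tables</code> are all hypothetical and only imitate the layout described above, so treat it as an illustration, not as the page's real markup.</p>

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup imitating the layout described above:
# each hidden table sits in an HTML comment right after a placeholder div.
SAMPLE_HTML = """
<div id="all_roster">
  <div class="placeholder"></div>
  <!--
  <table id="roster"><tr><td>Player A</td></tr></table>
  -->
</div>
<div id="all_totals">
  <div class="placeholder"></div>
  <!--
  <table id="totals"><tr><td>82</td></tr></table>
  -->
</div>
<table id="visible"><tr><td>not hidden</td></tr></table>
"""


def find_hidden_tables(html):
    """Return every <table> tag found inside an HTML comment."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    # Comment nodes are a kind of text node, so a string filter selects them.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # The comment body is itself markup, so parse it a second time.
        comment_soup = BeautifulSoup(comment, "html.parser")
        tables.extend(comment_soup.find_all("table"))
    return tables


hidden = find_hidden_tables(SAMPLE_HTML)
print("hidden tables:", [t.get("id") for t in hidden])
# prints: hidden tables: ['roster', 'totals']
```

<p>Selecting by <code>Comment</code> does not depend on the placeholder being present, but it also picks up any unrelated comments that happen to contain tables. The placeholder approach is narrower.</p>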