Python Beautiful Soup: how to get deeply nested elements

<div id ="a">
  <table>
    <td>  </td>
    <td>
      <table></table>
      <table></table>
      <div class="tabber">
        <table></table>
        <table></table>
      </div>
    </td>
  </table>
</div>

1 answer

User

#1 · Posted 2024-10-02 22:29:28

The code you showed doesn't seem to reflect anything on that page:

There is no div tag with id='a'. In fact, no tag on the page has that id. That's why your last command, stats_table = ..., fails.
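To illustrate why that fails: `find()` returns `None` when nothing matches, and any attribute access chained onto that `None` raises `AttributeError`. The sketch below uses an invented fragment, and the `stats_table` line is only a hypothetical stand-in for the elided command.

```python
from bs4 import BeautifulSoup

# Invented fragment: nothing here has id="a"
html = '<div id="content"><table><tr><td>1</td></tr></table></div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns None instead of raising when there is no match
match = soup.find("div", id="a")
print(match)

# Chaining onto that None is what raises (hypothetical stand-in for the elided command)
try:
    stats_table = soup.find("div", id="a").table
except AttributeError as exc:
    print("lookup failed:", exc)

# A defensive version checks the intermediate result before drilling further
container = soup.find("div", id="a")
stats_table = container.table if container is not None else None
print(stats_table)
```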

Exactly 3 div tags have a class attribute equal to tabber, not 4:

>>> len(soup.find_all('div', class_="tabber"))
3
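A detail worth knowing when counting by class: `class_` is matched against each individual class of the multi-valued `class` attribute, so `class="tabber other"` is counted while `class="tabbertab"` is not. A minimal sketch with invented markup:

```python
from bs4 import BeautifulSoup

# Invented markup: only the first three divs carry the class "tabber"
html = """
<div class="tabber"><table></table></div>
<div class="tabber"><table></table></div>
<div class="tabber other"><table></table></div>
<div class="tabbertab"><table></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches any one of a tag's classes, but each class must match exactly
tabbers = soup.find_all("div", class_="tabber")
print(len(tabbers))
```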

They aren't empty either:

>>> len(soup.find_all('div', class_="tabber")[1])
7
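The 7 is probably not a count of tables: `len()` on a tag is the length of its `.contents`, which includes text nodes such as the whitespace between tags. A small sketch (invented markup, using the standard-library `html.parser` backend, which keeps those whitespace strings) shows the difference between raw children and element children:

```python
from bs4 import BeautifulSoup, Tag

html = """<div class="tabber">
  <table></table>
  <table></table>
</div>"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="tabber")

# len(tag) == len(tag.contents): the newline/indent strings are counted too
raw_count = len(div)

# Keep only element children (Tag objects), dropping the text nodes
tables = [child for child in div.children if isinstance(child, Tag)]

# Equivalent: search direct children only, by tag name
direct_tables = div.find_all("table", recursive=False)

print(raw_count, len(tables), len(direct_tables))
```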

None of the div tags with class tabber has only 2 table children, but I assume that's because you heavily reduced your own example.

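If you want to scrape a site like this, where you can't easily select tags by a unique id, you have no choice but to fall back on other attributes, such as the tag name. Sometimes the position of tags relative to each other in the DOM is also a useful handle.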

For your particular problem, the title attribute works best:

>>> from bs4 import BeautifulSoup
>>> from urllib.request import urlopen  # Python 3 (urllib2 in Python 2)
>>> url = 'http://www.soccerstats.com/team.asp?league=england&teamid=24'
>>> soup = BeautifulSoup(urlopen(url).read(), 'lxml')
>>> all_stats = soup.find('div', id='team-matches-and stats')
>>> left_column, right_column = [x for x in all_stats.table.tr.children if x.name == 'td']
>>> table1, table2 = [x for x in right_column.children if x.name == 'table']  # the two tables at the top right
>>> [x['title'] for x in right_column.find_all('div', class_='tabbertab')]
['Stats', 'Scores', 'Goal times', 'Overall', 'Home', 'Away']
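Because the live page may have changed since this was written, here is an offline sketch of the same child-filtering pattern against a hypothetical, heavily simplified stand-in for the layout (the `stats-wrapper` id and all the markup are invented). Text nodes have `.name == None`, which is why filtering on `.name` keeps only the tags:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page layout (not the real markup)
html = """
<div id="stats-wrapper">
  <table><tr>
    <td>left column</td>
    <td>
      <table class="top"></table>
      <table class="top"></table>
      <div class="tabber">
        <div class="tabbertab" title="Stats"></div>
        <div class="tabbertab" title="Scores"></div>
      </div>
    </td>
  </tr></table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find("div", id="stats-wrapper")

# .children also yields whitespace strings; their .name is None, so they drop out
left_column, right_column = [x for x in wrapper.table.tr.children if x.name == "td"]
table1, table2 = [x for x in right_column.children if x.name == "table"]

# Read the title attribute of every tab in the right-hand column
titles = [x["title"] for x in right_column.find_all("div", class_="tabbertab")]
print(titles)
```

Unpacking into exactly two names doubles as a sanity check: if the layout changes and a different number of `td` or `table` children turns up, the assignment raises `ValueError` instead of silently picking the wrong element.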

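The last part is the interesting one: the tables at the bottom right all sit inside tabbertab divs that carry a title attribute, which makes them much easier to select. Those titles are also unique within the soup, so you can select the tabs directly from the root: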

>>> stats_div = soup.find('div', class_="tabbertab", title="Stats")
>>> len(stats_div.find_all('table', class_="stat"))
3
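The same title-based lookup on an invented fragment, alongside an equivalent CSS selector via `select_one()` (available in Beautiful Soup 4.7 and later, through the soupsieve package). Keyword filters passed to `find()` are combined, so both the class and the title must match:

```python
from bs4 import BeautifulSoup

# Invented fragment: the title attribute makes each tab unique in the document
html = """
<div class="tabbertab" title="Stats">
  <table class="stat"></table>
  <table class="stat"></table>
  <table class="stat"></table>
</div>
<div class="tabbertab" title="Scores">
  <table class="stat"></table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Both filters must hold: class includes "tabbertab" AND title equals "Stats"
stats_div = soup.find("div", class_="tabbertab", title="Stats")
stat_tables = stats_div.find_all("table", class_="stat")

# Equivalent CSS selector lookup, searching from the root of the soup
same_div = soup.select_one('div.tabbertab[title="Stats"]')

print(len(stat_tables), same_div is stats_div)
```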

These 3 correspond to the "Current streaks", "Scoring" and "Home/away advantage" subsections.
