Matching a specific table in HTML with BeautifulSoup

2 Answers

User

#1 · Edited 2024-10-02 18:19:33

This looks like a job for XPath. However, BeautifulSoup does not support XPath expressions.

Consider switching to lxml or Scrapy.

For reference, given test markup such as:

<html>
<h2 class="tabellen_ueberschrift al">Points</h2>  
<div class="fl" style="width:49%;">   
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">a</table>
</div>

<h2 class="tabellen_ueberschrift al">Illegal</h2>
<div class="fl" style="width:49%;">     
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">b</table>
</div>
</html>

The XPath expression that finds the table with class "tabelle_grafik lh" inside the div that follows the h2 with the text "Points" is:

^{pr2}$
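
The expression itself did not survive in this copy of the answer. As a minimal sketch, assuming lxml is installed, one expression that fits the description could be tried as shown here. The XPath string, the names `DOC`, `POINTS_TABLE_XPATH`, `tables` and `texts`, and the `<tr><td>` wrappers added around the sample cell text are assumptions for illustration, not the answerer's original.

```python
from lxml import html

# Test markup from the answer; cell text is wrapped in <tr><td> (an assumption)
# so the tables are well-formed.
DOC = """
<html>
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1"><tr><td>a</td></tr></table>
</div>

<h2 class="tabellen_ueberschrift al">Illegal</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1"><tr><td>b</td></tr></table>
</div>
</html>
"""

# Hypothetical expression: start at the h2 whose text is "Points", step to the
# first div that follows it as a sibling, then take the table with the exact
# class attribute inside that div.
POINTS_TABLE_XPATH = (
    '//h2[normalize-space()="Points"]'
    '/following-sibling::div[1]'
    '/table[@class="tabelle_grafik lh"]'
)

tree = html.fromstring(DOC)
tables = tree.xpath(POINTS_TABLE_XPATH)

# Collect the visible text of every matched table.
texts = [table.text_content().strip() for table in tables]
print(texts)
```

Anchoring on the heading and stepping forward with `following-sibling::div[1]` keeps the table under "Illegal" out of the result, since its heading text does not match.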

User

#2 · Edited 2024-10-02 18:19:33

This works for me. Walk back through each table's previous siblings; if the nearest h2 before the table has the text "Points", you have found the right table.

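from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)

# Two heading/table pairs; only the table under "Points" should be printed.
t = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>yes me!</td></tr></table>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>woo woo</td></tr></table>
"""

soup = BeautifulSoup(t, "html.parser")

for ta in soup.find_all('table'):
    # Walk backwards through the tags that precede this table at the same level.
    for s in ta.find_previous_siblings():
        if s.name == 'h2':
            if s.get_text(strip=True) == 'Points':
                print(ta)
            # The nearest preceding h2 decides the match, so stop either way.
            break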
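
The sibling walk above assumes each table sits directly next to its h2. When the table is wrapped in a div, as in the markup from the first answer, a variation on the same idea might look backwards through the whole document instead. This is a minimal sketch, assuming BeautifulSoup 4; the helper name `tables_under_heading` and the sample markup in `DOC` are hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup (an assumption): each table is wrapped in a div, so the h2 is
# no longer a sibling of the table itself.
DOC = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>yes me!</td></tr></table>
</div>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>woo woo</td></tr></table>
</div>
"""


def tables_under_heading(markup, heading_text):
    """Return tables whose nearest preceding h2 has the given text."""
    soup = BeautifulSoup(markup, "html.parser")
    matches = []
    for table in soup.find_all("table", class_="tabelle_grafik"):
        # find_previous searches backwards in document order, ignoring nesting.
        heading = table.find_previous("h2")
        if heading is not None and heading.get_text(strip=True) == heading_text:
            matches.append(table)
    return matches


found = tables_under_heading(DOC, "Points")
cells = [table.get_text(strip=True) for table in found]
print(cells)
```

Using `find_previous` rather than a sibling search trades precision for tolerance: it matches the closest earlier h2 anywhere in the document, which may be too loose for pages with nested headings.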
