Parsing multiple nested levels with BeautifulSoup

<div id="fbbuzzresult" class.....>
    <div class="postbuzz"> .... </div>
    <div class="linkbuzz">...</div>
    <div class="descriptionbuzz">...</div>
    <div class="metabuzz">
        <div class="time">...</div>
    </div>
    <div class="postbuzz"> .... </div>
    <div class="postbuzz"> .... </div>
    <div class="postbuzz"> .... </div>
</div>

2 answers

User

Answer 1 · edited 2024-09-29 20:16:00

You should be able to search the resulting tag the same way you search the parent soup:

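from bs4 import BeautifulSoup as bs  # BeautifulSoup 4; `from BeautifulSoup import ...` is the old version 3

soup = bs(html, "html.parser")  # html holds the page source as a string
div = soup.find("div", {"id": "fbbuzzresult"})
# Search only inside the fbbuzzresult div
post_buzz = div.find_all("div", {"class": "postbuzz"})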
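To try the nested search end to end, here is a minimal sketch. The markup in `sample_html` is made up for illustration (it only mimics the structure in the question), and the `None` guard is my own addition, since `find()` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question's structure.
sample_html = """
<div id="fbbuzzresult">
  <div class="postbuzz">first post</div>
  <div class="linkbuzz">a link</div>
  <div class="postbuzz">second post</div>
</div>
<div class="postbuzz">outside the container</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns None when nothing matches, so guard before going deeper.
container = soup.find("div", id="fbbuzzresult")
if container is None:
    posts = []
else:
    # Searching from the container only looks at its descendants,
    # so the postbuzz div outside it is not returned.
    posts = container.find_all("div", class_="postbuzz")

post_texts = [p.get_text(strip=True) for p in posts]
print(post_texts)
```

Only the two `postbuzz` divs inside `fbbuzzresult` end up in `post_texts`.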

However, I have run into errors doing this before, so as a second approach you can build a kind of sub_soup:

[The code for the sub_soup approach is missing from the source.]
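The original snippet for this approach did not survive, so the following is only a guess at what a "sub_soup" could look like: serialize the matched tag with `str()` and parse that string again as its own document. The `sample_html` markup is invented for the example, and this is a sketch of one possible reading, not the answerer's actual code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question's structure.
sample_html = """
<div id="fbbuzzresult">
  <div class="postbuzz">first post</div>
  <div class="metabuzz"><div class="time">12:00</div></div>
  <div class="postbuzz">second post</div>
</div>
<div class="postbuzz">outside the container</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
div = soup.find("div", {"id": "fbbuzzresult"})

# Assumed reading of "sub_soup": turn the matched tag back into an
# HTML string and parse it as a standalone document.
sub_soup = BeautifulSoup(str(div), "html.parser")

# The new soup contains only the fbbuzzresult subtree.
post_buzz = sub_soup.find_all("div", {"class": "postbuzz"})
print(len(post_buzz))
```

Re-parsing costs an extra pass over the markup, so it is probably only worth it when searching the tag directly misbehaves.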

User

Answer 2 · edited 2024-09-29 20:16:00

First, read the BeautifulSoup documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

Second, here is a small example that should get you further:

from bs4 import BeautifulSoup as bs

soup = bs(your_html_content, "html.parser")

# the fbbuzzresult div
buzz = soup.find_all("div", {"id": "fbbuzzresult"})[0]

# the postbuzz divs inside it
pbuzz = buzz.find_all("div", {"class": "postbuzz"})

# pbuzz is now a list of the postbuzz divs, so you can iterate
# through them, get the contents, keep traversing the DOM with BS,
# or do whatever you are trying to do.
#
# Say you want the text from an element: the_element.contents[0]
# gives its first child, which is only the text if that child is a
# string. For text nested in child tags, use the_element.get_text()
# rather than walking down through all of its children by hand.

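To make the point about text extraction concrete, here is a small sketch comparing `contents[0]` with `get_text()`. The `sample_html` markup is invented for the example, and the CSS selector passed to `select()` is just one alternative way to reach the same `postbuzz` divs.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the text is split across child tags.
sample_html = """
<div id="fbbuzzresult">
  <div class="postbuzz">Hello <b>world</b></div>
  <div class="postbuzz"><span>nested</span> text</div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# CSS selector: postbuzz divs that are descendants of #fbbuzzresult.
posts = soup.select("#fbbuzzresult div.postbuzz")

# contents[0] is only the first child: a string here...
first_child = posts[0].contents[0]
# ...but a <span> tag here, not text at all.
second_first_child = posts[1].contents[0]

# get_text() collects the text of every descendant for you.
full_texts = [p.get_text(" ", strip=True) for p in posts]
print(full_texts)
```

So `contents[0]` is fine for flat elements, while `get_text()` is the safer choice once the markup nests.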