Answers to the question: Python BeautifulSoup, extracting the text between elements

1 answer

Anonymous, 1 day ago

Learn more about how to <a href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#contents">navigate through the parse tree in <code>BeautifulSoup</code></a>. The parse tree is made of <code>tags</code> and <code>NavigableStrings</code> (a piece of plain text like this one is a <code>NavigableString</code>). An example, written for BeautifulSoup 3 and Python 2:

<pre><code>from BeautifulSoup import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><p>This is paragraph <b>one</b>.</p>',
       '<p>This is paragraph <b>two</b>.</p>',
       '</body></html>']
soup = BeautifulSoup(''.join(doc))

print soup.prettify()
# <html>
#  <head>
#   <title>
#    Page title
#   </title>
#  </head>
#  <body>
#   <p>
#    This is paragraph
#    <b>
#     one
#    </b>
#    .
#   </p>
#   <p>
#    This is paragraph
#    <b>
#     two
#    </b>
#    .
#   </p>
#  </body>
# </html>
</code></pre>

To move down the parse tree, you have <code>contents</code> and <code>string</code>.

<ul>
<li><blockquote> contents is an ordered list of the Tag and NavigableString objects contained within a page element </blockquote></li>
<li><blockquote> if a tag has only one child node, and that child node is a string, the child node is made available as tag.string, as well as tag.contents[0] </blockquote></li>
</ul>

So for the document above, you can get

<pre><code>soup.b.string
# u'one'
soup.b.contents[0]
# u'one'
</code></pre>

For a tag with several child nodes, you get

<pre><code>pTag = soup.p
pTag.contents
# [u'This is paragraph ', <b>one</b>, u'.']
</code></pre>

So you can work with <code>contents</code> and pick out the item at whichever index you need.

You can also iterate over a tag directly, which is a shortcut for iterating over its <code>contents</code>. For instance,

<pre><code>for i in soup.body:
    print i
# <p>This is paragraph <b>one</b>.</p>
# <p>This is paragraph <b>two</b>.</p>
</code></pre>
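For readers on Python 3, the same ideas can be sketched with BeautifulSoup 4. This is a minimal illustration, not part of the original answer: it assumes the <code>bs4</code> package is installed and uses the standard-library <code>html.parser</code> backend, and the variable names (<code>loose_text</code>, <code>text_after_b</code>, and so on) are made up for the example.

```python
from bs4 import BeautifulSoup, NavigableString

# Same sample document as in the answer, as a single string.
html = (
    "<html><head><title>Page title</title></head>"
    "<body><p>This is paragraph <b>one</b>.</p>"
    "<p>This is paragraph <b>two</b>.</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p  # first <p> tag in the document

# .string is available because <b> has exactly one child, and it is a string.
bold_text = soup.b.string

# .contents lists every direct child of the tag, tags and strings alike.
children = first_p.contents

# Keep only the bare text nodes, i.e. the text sitting between elements.
loose_text = [
    str(node) for node in first_p.contents
    if isinstance(node, NavigableString)
]

# .next_sibling gives whatever follows a tag; here it is the trailing text.
text_after_b = soup.b.next_sibling

# get_text() flattens the tag and all of its descendants into one string.
full_text = first_p.get_text()

print(bold_text)
print(loose_text)
print(text_after_b)
print(full_text)
```

Filtering <code>contents</code> by <code>NavigableString</code> is one way to get only the text between child elements; <code>get_text()</code> is the simpler choice when the text inside nested tags should be included as well.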
