<p>Read more about how to navigate <a href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#contents">through the parse tree in <code>BeautifulSoup</code></a>. The parse tree is made up of <code>tags</code> and <code>NavigableStrings</code> (the text nodes). An example:</p>
<pre><code>from BeautifulSoup import BeautifulSoup
doc = ['<html><head><title>Page title</title></head>',
'<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
'<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
'</html>']
soup = BeautifulSoup(''.join(doc))
print soup.prettify()
# <html>
#  <head>
#   <title>
#    Page title
#   </title>
#  </head>
#  <body>
#   <p id="firstpara" align="center">
#    This is paragraph
#    <b>
#     one
#    </b>
#    .
#   </p>
#   <p id="secondpara" align="blah">
#    This is paragraph
#    <b>
#     two
#    </b>
#    .
#   </p>
#  </body>
# </html>
</code></pre>
<p>To move down the parse tree, you have <code>contents</code> and <code>string</code>:</p>
<ul>
<li><blockquote>
<p>contents is an ordered list of the Tag and NavigableString objects
contained within a page element</p>
</blockquote></li>
<li><blockquote>
<p>if a tag has only one child node, and that child node is a string,
the child node is made available as tag.string, as well as
tag.contents[0]</p>
</blockquote></li>
</ul>
<p>In the example above, that means you can do:</p>
<pre><code>soup.b.string
# u'one'
soup.b.contents[0]
# u'one'
</code></pre>
<p>For a tag with multiple child nodes, you get:</p>
<pre><code>pTag = soup.p
pTag.contents
# [u'This is paragraph ', <b>one</b>, u'.']
</code></pre>
<p><em><strong>So you can work with <code>contents</code> and pick out the element at whatever index you need.</strong></em></p>
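<p>To make the <code>contents</code>/<code>string</code> rule concrete without depending on BeautifulSoup itself, here is a minimal sketch using only Python 3's standard-library <code>html.parser</code>. The <code>Node</code>, <code>TreeBuilder</code> and <code>find</code> names are hypothetical and are not BeautifulSoup's API; the code only mimics the behaviour quoted above, and it assumes well-formed markup in which every tag is explicitly closed.</p>
<pre><code>from html.parser import HTMLParser


class Node:
    """Hypothetical stand-in for a BeautifulSoup Tag (not its real API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.contents = []  # ordered list of child Nodes and str objects

    @property
    def string(self):
        # Mirrors the quoted rule: exposed only when there is exactly
        # one child and that child is text; otherwise None.
        if len(self.contents) == 1 and isinstance(self.contents[0], str):
            return self.contents[0]
        return None


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every tag is explicitly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.contents.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk back up to the matching open tag; ignore unmatched end tags.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.contents.append(data)


def find(node, name):
    """Depth-first search for the first descendant tag called `name`."""
    for child in node.contents:
        if isinstance(child, Node):
            if child.name == name:
                return child
            found = find(child, name)
            if found is not None:
                return found
    return None


builder = TreeBuilder()
builder.feed('<p id="firstpara">This is paragraph <b>one</b>.</p>')
builder.close()

p = find(builder.root, "p")
b = find(builder.root, "b")
print(b.string)             # one
print(b.contents[0])        # one
print(p.string)             # None (the p tag has three children)
print(repr(p.contents[0]))  # 'This is paragraph '
print(p.contents[1].name)   # b
</code></pre>
<p>The design point is that <code>string</code> is derived from <code>contents</code> rather than stored separately, which is why it is only defined for the single-text-child case.</p>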
<p>You can also iterate over a tag directly, which is a shortcut for iterating over its <code>contents</code>. For example:</p>
<pre><code>for i in soup.body:
    print i
# <p id="firstpara" align="center">This is paragraph <b>one</b>.</p>
# <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>
</code></pre>
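<p>The iteration shortcut boils down to the tag delegating <code>__iter__</code> to its <code>contents</code> list. Below is a sketch with a hypothetical minimal <code>Tag</code> class in Python 3; it is an illustration of the idea, not BeautifulSoup's own code.</p>
<pre><code>class Tag:
    """Hypothetical minimal container; not BeautifulSoup's real class."""

    def __init__(self, name, contents=None):
        self.name = name
        self.contents = list(contents or [])

    def __iter__(self):
        # Iterating over the tag is shorthand for iterating over .contents
        return iter(self.contents)

    def __len__(self):
        # Length of the tag is the number of its direct children
        return len(self.contents)

    def __repr__(self):
        return "<%s>" % self.name


body = Tag("body", [Tag("p"), Tag("p")])

for child in body:
    print(child)  # prints <p> twice
</code></pre>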