<h2>TL;DR</h2>
<p>With BeautifulSoup 4, use <code>element.encode_contents()</code> if you want a UTF-8 encoded bytestring, or <code>element.decode_contents()</code> if you want a Python Unicode string. For example, the <a href="http://domparsing.spec.whatwg.org/#innerhtml" rel="noreferrer">DOM's innerHTML method</a> might look something like this:</p>
<pre class="lang-py prettyprint-override"><code>def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()
</code></pre>
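<p>As a quick illustration of the difference between the two calls, here is a minimal sketch. The sample markup, the variable names, and the choice of the built-in <code>"html.parser"</code> are assumptions made for the example, not part of the original answer.</p>

```python
from bs4 import BeautifulSoup

# Sample markup and parser choice are illustrative assumptions.
soup = BeautifulSoup("<div><p>Hello <b>world</b></p></div>", "html.parser")
div = soup.find("div")

inner_bytes = div.encode_contents()  # bytes, UTF-8 encoded by default
inner_text = div.decode_contents()   # str (Unicode)

print(inner_bytes)  # b'<p>Hello <b>world</b></p>'
print(inner_text)   # <p>Hello <b>world</b></p>
```

<p>Both calls leave out the enclosing tag itself, which is what makes them behave like <code>innerHTML</code> rather than <code>str(element)</code>.</p>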
<hr/>
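<p>These functions aren't currently in the online documentation, so I'll quote the current function definitions and docstrings from the code.</p>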
<h2><code>encode_contents</code> - since 4.0.4</h2>
<pre class="lang-py prettyprint-override"><code>def encode_contents(
        self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
        formatter="minimal"):
    """Renders the contents of this tag as a bytestring.
    :param indent_level: Each line of the rendering will be
        indented this many spaces.
    :param encoding: The bytestring will be in this encoding.
    :param formatter: The output formatter responsible for converting
        entities to Unicode characters.
    """
</code></pre>
<p>See also the <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters" rel="noreferrer">documentation on formatters</a>; you'll most likely use either <code>formatter="minimal"</code> (the default) or <code>formatter="html"</code> (for <a href="https://developer.mozilla.org/en-US/docs/Glossary/Entity" rel="noreferrer">html entities</a>), unless you want to process the text manually in some way.</p>
<p><code>encode_contents</code> returns an encoded bytestring. If you want a Python Unicode string, use <code>decode_contents</code> instead.</p>
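<p>To make the <code>formatter</code> and <code>encoding</code> parameters more concrete, here is a small sketch. The sample text and variable names are assumptions for illustration. With <code>"minimal"</code> only the characters that must be escaped (such as the ampersand) become entities, while <code>"html"</code> should also turn non-ASCII characters into named entities where one exists.</p>

```python
from bs4 import BeautifulSoup

# Sample text with accented characters and an ampersand (illustrative only).
soup = BeautifulSoup("<p>caf\u00e9 &amp; cr\u00e8me</p>", "html.parser")
p = soup.p

# Default formatter: escapes only what is needed for valid markup.
minimal_out = p.decode_contents(formatter="minimal")

# "html" formatter: additionally uses named HTML entities where available.
html_out = p.decode_contents(formatter="html")

# encode_contents accepts a target encoding for the returned bytestring.
latin1_bytes = p.encode_contents(encoding="latin-1")

print(minimal_out)
print(html_out)
print(latin1_bytes)
```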
<hr/>
<h2><code>decode_contents</code> - since 4.0.1</h2>
<p><code>decode_contents</code> does the same thing as <code>encode_contents</code>, but returns a Python Unicode string instead of an encoded bytestring.</p>
<pre class="lang-py prettyprint-override"><code>def decode_contents(self, indent_level=None,
                    eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                    formatter="minimal"):
    """Renders the contents of this tag as a Unicode string.
    :param indent_level: Each line of the rendering will be
        indented this many spaces.
    :param eventual_encoding: The tag is destined to be
        encoded into this encoding. This method is _not_
        responsible for performing that encoding. This information
        is passed in so that it can be substituted in if the
        document contains a &lt;META&gt; tag that mentions the document's
        encoding.
    :param formatter: The output formatter responsible for converting
        entities to Unicode characters.
    """
</code></pre>
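<p>The <code>eventual_encoding</code> parameter is the least obvious of the three, so here is a hedged sketch of what the docstring describes. The sample document and variable names are assumptions. The expectation, based on the docstring, is that the charset declared in the meta tag is swapped for the encoding you pass in, while the return value remains a Unicode string.</p>

```python
from bs4 import BeautifulSoup

# A tiny document whose meta tag declares an encoding (illustrative only).
doc = '<html><head><meta charset="utf-8"></head><body><p>hi</p></body></html>'
soup = BeautifulSoup(doc, "html.parser")

# Nothing is encoded here; the name is only substituted into the meta tag.
rendered = soup.head.decode_contents(eventual_encoding="latin-1")

print(type(rendered).__name__)
print(rendered)
```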
<hr/>
<h2>BeautifulSoup 3</h2>
<p>BeautifulSoup 3 doesn't have the functions above; it has <code>renderContents</code> instead:</p>
<pre class="lang-py prettyprint-override"><code>def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                   prettyPrint=False, indentLevel=0):
    """Renders the contents of this tag as a string in the given
    encoding. If encoding is None, returns a Unicode string.."""
</code></pre>
<p>This function was added back to BeautifulSoup 4 (<a href="https://bazaar.launchpad.net/~arthur-darcet/beautifulsoup/beautifulsoup/revision/206#bs4/element.py" rel="noreferrer">in 4.0.4</a>) for compatibility with BS3.</p>
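<p>A short sketch of the compatibility method under BeautifulSoup 4, assuming a version of 4.0.4 or later; the sample markup is made up for the example. Called with its defaults, <code>renderContents()</code> should return the same bytestring as <code>encode_contents()</code>. Newer releases may flag it as deprecated, so <code>encode_contents</code> is probably the safer choice in new code.</p>

```python
from bs4 import BeautifulSoup

# Illustrative markup; any tag with children would do.
soup = BeautifulSoup("<ul><li>one</li><li>two</li></ul>", "html.parser")
ul = soup.ul

legacy = ul.renderContents()    # BS3-style name, kept for compatibility
current = ul.encode_contents()  # BS4 name

print(legacy == current)
```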