BeautifulSoup innerHTML

User

Answer 1 · edited 2024-10-03 06:26:53

If you only need the text (without the HTML markup), you can use .text on a single element:

soup.select_one("div").text
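A minimal sketch of text extraction, assuming the built-in "html.parser" backend and a made-up HTML snippet. Note that select() returns a list of matches, so .text has to be read from an individual element (for example via select_one() or by iterating):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<div>Hello <b>world</b></div><div>Second</div>"
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching Tag (or None if nothing matches).
first_text = soup.select_one("div").text

# select() returns a list of Tags, so read the text from each one.
all_texts = [div.get_text() for div in soup.select("div")]
```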

User

Answer 2 · edited 2024-10-03 06:26:53

TL;DR

With BeautifulSoup 4, use element.encode_contents() if you want a UTF-8 encoded bytestring, or element.decode_contents() if you want a Python Unicode string. For example, an equivalent of the DOM's innerHTML might look something like this:

def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()
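As a usage sketch for the helper above (the sample markup and the variable names `box`, `inner`, and `outer` are assumptions made for illustration), the inner HTML omits the enclosing tag, while str() on the element includes it:

```python
from bs4 import BeautifulSoup

def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()

# Hypothetical sample markup.
soup = BeautifulSoup('<div id="box">Hi <b>there</b></div>', "html.parser")
box = soup.find(id="box")

inner = innerHTML(box)  # bytes, without the enclosing <div>
outer = str(box)        # str, including the enclosing <div>
```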

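These functions are not currently in the online documentation, so I will quote the current function definitions and docstrings from the code.

`encode_contents` - since 4.0.4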

def encode_contents(
    self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
    formatter="minimal"):
    """Renders the contents of this tag as a bytestring.

    :param indent_level: Each line of the rendering will be
       indented this many spaces.

    :param encoding: The bytestring will be in this encoding.

    :param formatter: The output formatter responsible for converting
       entities to Unicode characters.
    """

See also the documentation on formatters; you will most likely use either formatter="minimal" (the default) or formatter="html" (for HTML entities), unless you want to process the text manually in some way.
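A small sketch of how the two formatters differ, assuming a made-up paragraph containing a non-ASCII character and an ampersand. With "minimal" only characters that would break the markup are escaped; with "html" characters that have named HTML entities are converted as well:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: "café & tea".
soup = BeautifulSoup("<p>caf\u00e9 &amp; tea</p>", "html.parser")
p = soup.p

# Default formatter: keeps the accented character, escapes the ampersand.
minimal = p.decode_contents(formatter="minimal")

# "html" formatter: also replaces the accented character with a named entity.
html_entities = p.decode_contents(formatter="html")
```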

encode_contents returns an encoded bytestring. If you want a Python Unicode string, use decode_contents instead.

`decode_contents` - since 4.0.1

decode_contents does the same thing as encode_contents, but returns a Python Unicode string instead of an encoded bytestring.

def decode_contents(self, indent_level=None,
                   eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                   formatter="minimal"):
    """Renders the contents of this tag as a Unicode string.

    :param indent_level: Each line of the rendering will be
       indented this many spaces.

    :param eventual_encoding: The tag is destined to be
       encoded into this encoding. This method is _not_
       responsible for performing that encoding. This information
       is passed in so that it can be substituted in if the
       document contains a <META> tag that mentions the document's
       encoding.

    :param formatter: The output formatter responsible for converting
       entities to Unicode characters.
    """
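To illustrate how the two methods relate, here is a hedged sketch (the sample markup is an assumption) showing that the bytestring from encode_contents matches the decode_contents string encoded with the same encoding, both for the UTF-8 default and for an explicitly chosen one:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing a non-ASCII character.
soup = BeautifulSoup("<p>caf\u00e9 <i>au lait</i></p>", "html.parser")
p = soup.p

text = p.decode_contents()                            # Python str
utf8_bytes = p.encode_contents()                      # bytes, UTF-8 by default
latin1_bytes = p.encode_contents(encoding="latin-1")  # bytes in another encoding
```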

BeautifulSoup 3

BeautifulSoup 3 does not have the functions above; instead it has renderContents:

def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                   prettyPrint=False, indentLevel=0):
    """Renders the contents of this tag as a string in the given
    encoding. If encoding is None, returns a Unicode string.."""

This function was added back to BeautifulSoup 4 (in 4.0.4) for compatibility with BS3.

User

Answer 3 · edited 2024-10-03 06:26:53

One option is to join the string form of each child node:

 innerhtml = "".join([str(x) for x in div_element.contents])
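A runnable sketch of this approach, assuming a made-up `div_element` built from sample markup. For this simple input the joined string should match what decode_contents() returns:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
soup = BeautifulSoup("<div><p>one</p><p>two</p></div>", "html.parser")
div_element = soup.div

# Concatenate the string form of every direct child of the element.
innerhtml = "".join(str(x) for x in div_element.contents)

# For comparison: the same result via decode_contents().
same_as_decode = innerhtml == div_element.decode_contents()
```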
