How do I unescape HTML entities in a string in Python 3.1?

User

Reply #1 · edited 2024-05-11 21:07:26

You can use the function html.unescape:

In Python 3.4+ (thanks to J.F. Sebastian for the update):

import html
html.unescape('Suzy &amp; John')
# 'Suzy & John'

html.unescape('&quot;')
# '"'

In Python 3.3 or earlier:

import html.parser
# HTMLParser.unescape is deprecated since Python 3.4 and was removed in 3.9
html.parser.HTMLParser().unescape('Suzy &amp; John')

In Python 2:

import HTMLParser
HTMLParser.HTMLParser().unescape('Suzy &amp; John')
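
If the same code has to run on more than one of these versions, one option is an import fallback that tries the newest API first. This is only a sketch and not part of the answer above; binding the result to a module-level name `unescape` is a convenience chosen here.

```python
try:
    # Python 3.4+
    from html import unescape
except ImportError:
    try:
        # Python 3.0 to 3.3
        from html.parser import HTMLParser
    except ImportError:
        # Python 2
        from HTMLParser import HTMLParser
    # Older versions expose unescape as a method on a parser instance.
    unescape = HTMLParser().unescape

print(unescape('Suzy &amp; John'))
```

Trying `html.unescape` first matters on Python 3.9+, where the `HTMLParser.unescape` method no longer exists.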

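User

Reply #2 · edited 2024-05-11 21:07:26

You can use xml.sax.saxutils.unescape for this purpose. The module is part of the Python standard library and is portable between Python 2.x and Python 3.x.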

>>> import xml.sax.saxutils as saxutils
>>> saxutils.unescape("Suzy &amp; John")
'Suzy & John'
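
One caveat: by default `saxutils.unescape` only converts `&amp;`, `&lt;` and `&gt;`. Anything else has to be supplied through its optional `entities` dictionary. A small sketch of that (the `extra` mapping and the sample text are illustrative choices, not from the answer above):

```python
import xml.sax.saxutils as saxutils

text = "&quot;Suzy&quot; &amp; John"

# Default behaviour: only &amp;, &lt; and &gt; are converted.
default_result = saxutils.unescape(text)

# Extra entities are passed as an {entity: replacement} mapping.
extra = {"&quot;": '"', "&apos;": "'"}
full_result = saxutils.unescape(text, extra)

print(default_result)  # &quot;Suzy&quot; & John
print(full_result)     # "Suzy" & John
```

For arbitrary HTML, where any named or numeric entity may appear, maintaining such a mapping by hand gets impractical quickly.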

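User

Reply #3 · edited 2024-05-11 21:07:26

Apparently I don't have enough reputation to do anything but post this. unutbu's answer does not unescape quotation marks. The only thing I found that does is this function: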

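import re
from html.entities import name2codepoint as n2cp  # Python 2: htmlentitydefs

def decodeHtmlentities(string):
    def substitute_entity(match):
        ent = match.group(2)
        if match.group(1) == "#":
            # Numeric reference: decimal (&#38;) or hexadecimal (&#x26;).
            try:
                if ent[:1] in ("x", "X"):
                    return chr(int(ent[1:], 16))  # Python 2: unichr
                return chr(int(ent))
            except ValueError:
                return match.group()  # malformed reference, leave it as-is
        else:
            # Named entity such as &amp; or &quot;.
            cp = n2cp.get(ent)
            if cp:
                return chr(cp)
            else:
                return match.group()  # unknown name, leave it as-is
    entity_re = re.compile(r"&(#?)(\d{1,5}|\w{1,8});")
    return entity_re.subn(substitute_entity, string)[0]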

I got it from here.
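
For comparison, on Python 3.4+ the standard `html.unescape` already covers the cases a hand-written function like this is aiming for: quotation marks, decimal references and hexadecimal references. A quick check, with sample strings made up for illustration:

```python
import html

# Each key is an escaped input; each value is the expected unescaped text.
samples = {
    "&quot;Suzy&quot; &amp; John": '"Suzy" & John',  # named entities, incl. quotes
    "caf&#233;": "caf\u00e9",                        # decimal reference
    "caf&#xe9;": "caf\u00e9",                        # hexadecimal reference
    "&zzz;": "&zzz;",                                # unknown name is left alone
}

results = {raw: html.unescape(raw) for raw in samples}
for raw, expected in samples.items():
    print(repr(raw), "->", repr(results[raw]), results[raw] == expected)
```

So a custom regex-based decoder is mainly of interest when you are stuck on an interpreter older than 3.4.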
