Exact equivalent of `b'...'.decode("utf8", "backslashreplace")` in Python 2

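2 Answers

User

Answer 1 · edited 2024-09-29 21:41:34

You can write your own error handler and register it with codecs.register_error. Here is a solution that I tested on Python 2.7, 3.3, and 3.6: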

from __future__ import print_function
import codecs
import sys

print(sys.version)

def myreplace(ex):
    # The error handler receives the UnicodeDecodeError, which contains arguments of the
    # string and start/end indexes of the bad portion.
    bstr,start,end = ex.object,ex.start,ex.end

    # The return value is a tuple of Unicode string and the index to continue conversion.
    # Note: iterating byte strings returns int on 3.x but str on 2.x
    return u''.join('\\x{:02x}'.format(c if isinstance(c,int) else ord(c))
                    for c in bstr[start:end]),end

codecs.register_error('myreplace',myreplace)
print(b'\xc2\xa1\xa1ABC'.decode("utf-8", "myreplace"))

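Output (the first line printed is the interpreter's sys.version string, which varies by installation, followed by):

¡\xa1ABC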
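To make the handler's inputs concrete, the following minimal sketch catches the UnicodeDecodeError that a strict decode raises for the same byte string and prints the attributes the handler reads. The variable names `data` and `bad_span` are illustrative only and are not part of the answer above.

```python
from __future__ import print_function

data = b'\xc2\xa1\xa1ABC'

try:
    # Strict decoding raises instead of calling an error handler.
    data.decode('utf-8')
except UnicodeDecodeError as ex:
    # ex.object is the whole input; ex.start and ex.end delimit the bad bytes.
    bad_span = (ex.start, ex.end)
    print(ex.encoding, ex.start, ex.end, repr(ex.object[ex.start:ex.end]))
```

The first two bytes decode to a valid character, so the reported span covers only the stray third byte. A registered handler returns the replacement text together with `ex.end`, which tells the decoder where to resume.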

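User

Answer 2 · edited 2024-09-29 21:41:34

I attempted a more complete backport of the CPython implementation.

This handles UnicodeDecodeError (from .decode()) as well as UnicodeEncodeError from .encode() and UnicodeTranslateError from .translate():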

from __future__ import unicode_literals

import codecs


def _bytes_repr(c):
    """py2: bytes, py3: int"""
    if not isinstance(c, int):
        c = ord(c)
    return '\\x{:02x}'.format(c)  # zero-pad to two hex digits, as CPython does


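def _text_repr(c):
    d = ord(c)
    if d >= 0x10000:
        return '\\U{:08x}'.format(d)
    elif d >= 0x100:
        return '\\u{:04x}'.format(d)
    else:
        # CPython's backslashreplace emits \xNN for code points below 0x100
        return '\\x{:02x}'.format(d)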


def backslashescape_backport(ex):
    s, start, end = ex.object, ex.start, ex.end
    c_repr = _bytes_repr if isinstance(ex, UnicodeDecodeError) else _text_repr
    return ''.join(c_repr(c) for c in s[start:end]), end


codecs.register_error('backslashescape_backport', backslashescape_backport)

print(b'\xc2\xa1\xa1after'.decode('utf-8', 'backslashescape_backport'))
print(u'\u2603'.encode('latin1', 'backslashescape_backport'))
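One way to gain confidence in a backport like this is to compare it with the built-in `backslashreplace` handler on an interpreter that supports it for decoding (Python 3.5 and later). The sketch below is self-contained: `_escape` and the handler name `escape_check` are hypothetical, written only for this comparison, and use zero-padded two-digit `\x` escapes.

```python
from __future__ import print_function

import codecs
import sys


def _escape(ex):
    # Hypothetical compact handler, defined only for this comparison.
    chunk = ex.object[ex.start:ex.end]
    if isinstance(ex, UnicodeDecodeError):
        # bytearray() yields ints on both Python 2 and Python 3.
        out = u''.join(u'\\x{:02x}'.format(b) for b in bytearray(chunk))
    else:
        out = u''
        for ch in chunk:
            d = ord(ch)
            if d >= 0x10000:
                out += u'\\U{:08x}'.format(d)
            elif d >= 0x100:
                out += u'\\u{:04x}'.format(d)
            else:
                out += u'\\x{:02x}'.format(d)
    return out, ex.end


codecs.register_error('escape_check', _escape)

decode_samples = [b'\xc2\xa1\xa1after', b'\xff\xfe', b'ok\x80']
encode_samples = [u'\u2603', u'\xe9', u'\U0001f600']

# The built-in handler only supports decoding from Python 3.5 onward.
if sys.version_info >= (3, 5):
    for raw in decode_samples:
        assert (raw.decode('utf-8', 'escape_check')
                == raw.decode('utf-8', 'backslashreplace'))
    for text in encode_samples:
        assert (text.encode('ascii', 'escape_check')
                == text.encode('ascii', 'backslashreplace'))
    print('matches built-in backslashreplace')
```

On Python 2 the comparison is skipped, since the built-in handler cannot be used for decoding there, but the registered handler itself still works.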
