<p>I attempted a more complete backport of the <a href="https://github.com/python/cpython/blob/d6e2f26f3f7c62a4ddbf668027d3ba27cb0e1eca/Python/codecs.c#L859" rel="noreferrer">cpython implementation</a>.</p>
<p>This handles <code>UnicodeDecodeError</code> (from <code>.decode()</code>) as well as <code>UnicodeEncodeError</code> (from <code>.encode()</code>) and <code>UnicodeTranslateError</code> (from <code>.translate()</code>):</p>
<pre><code>from __future__ import unicode_literals

import codecs


def _bytes_repr(c):
    """py2: bytes, py3: int"""
    if not isinstance(c, int):
        c = ord(c)
    # Zero-pad to two hex digits so a low byte such as 0x0a renders as \x0a.
    return '\\x{:02x}'.format(c)


def _text_repr(c):
    d = ord(c)
    if d >= 0x10000:
        return '\\U{:08x}'.format(d)
    else:
        return '\\u{:04x}'.format(d)


def backslashescape_backport(ex):
    # ex.object[ex.start:ex.end] is the span that could not be converted.
    s, start, end = ex.object, ex.start, ex.end
    c_repr = _bytes_repr if isinstance(ex, UnicodeDecodeError) else _text_repr
    # Return the replacement text and the position at which to resume.
    return ''.join(c_repr(c) for c in s[start:end]), end


codecs.register_error('backslashescape_backport', backslashescape_backport)

print(b'\xc2\xa1\xa1after'.decode('utf-8', 'backslashescape_backport'))
print(u'\u2603'.encode('latin1', 'backslashescape_backport'))
</code></pre>
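<p>As a rough sanity check, the sketch below compares the backport with the built-in <code>backslashreplace</code> handler. It assumes Python 3.5 or newer (where the built-in handler also works for decoding) and re-declares the handler in a Python 3 only form so that it runs on its own. The variable names are only for illustration. One difference it shows: for code points below <code>0x100</code> the built-in handler emits <code>\xNN</code>, while this backport emits <code>\u00NN</code>.</p>

```python
import codecs


def _bytes_repr(c):
    # Iterating over bytes yields ints on Python 3.
    return '\\x{:02x}'.format(c)


def _text_repr(c):
    d = ord(c)
    return ('\\U{:08x}' if d >= 0x10000 else '\\u{:04x}').format(d)


def backslashescape_backport(ex):
    c_repr = _bytes_repr if isinstance(ex, UnicodeDecodeError) else _text_repr
    return ''.join(c_repr(c) for c in ex.object[ex.start:ex.end]), ex.end


codecs.register_error('backslashescape_backport', backslashescape_backport)

# Decoding: the stray 0xa1 byte is escaped the same way by both handlers.
raw = b'\xc2\xa1\xa1after'
decoded_backport = raw.decode('utf-8', 'backslashescape_backport')
decoded_builtin = raw.decode('utf-8', 'backslashreplace')
print(decoded_backport == decoded_builtin)

# Encoding a code point >= 0x100: both handlers agree.
encoded_backport = '\u2603'.encode('ascii', 'backslashescape_backport')
encoded_builtin = '\u2603'.encode('ascii', 'backslashreplace')
print(encoded_backport == encoded_builtin)

# Encoding a code point < 0x100: the escapes differ (\u00e9 vs \xe9).
low_backport = '\u00e9'.encode('ascii', 'backslashescape_backport')
low_builtin = '\u00e9'.encode('ascii', 'backslashreplace')
print(low_backport, low_builtin)
```

<p>If exact parity with the built-in handler matters, <code>_text_repr</code> would need a third branch for code points below <code>0x100</code>.</p>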