Python cannot encode with surrogateescape

Posted 2024-09-28 19:05:48


I'm running into a problem using the 'surrogateescape' error handler (Unicode surrogate escaping) in Python 3.4:

>>> b'\xCC'.decode('utf-16_be', 'surrogateescape').encode('utf-16_be', 'surrogateescape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-16-be' codec can't encode character '\udccc' in position 0: surrogates not allowed

If I'm not mistaken, the Python documentation says:

'surrogateescape': On decoding, replace byte with individual surrogate code ranging from U+DC80 to U+DCFF. This code will then be turned back into the same byte when the 'surrogateescape' error handler is used when encoding the data.

So the code above should simply give back the original byte sequence (b'\xCC'). Why does it raise an exception instead?
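A minimal sketch that runs the same decode/encode round trip under UTF-8 and UTF-16-BE, assuming CPython 3.4 or later (`roundtrip` is a hypothetical helper name used only for illustration). With UTF-8 the documented behavior holds. With UTF-16-BE the decode step succeeds but the encode step fails. As far as CPython's implementation goes, this appears to be because the UTF-16/UTF-32 encoders only accept a bytes replacement from an error handler if its length is a multiple of the code unit size (2 or 4 bytes), while 'surrogateescape' produces exactly one byte per escaped surrogate, so the encoder rejects it with "surrogates not allowed".

```python
def roundtrip(raw: bytes, encoding: str):
    """Hypothetical helper: decode `raw` and re-encode it with 'surrogateescape'.

    Returns (decoded_text, result), where result is the re-encoded bytes,
    or the UnicodeEncodeError if the encoder rejects the escaped surrogate.
    """
    text = raw.decode(encoding, 'surrogateescape')  # b'\xCC' -> '\udccc' in both cases
    try:
        return text, text.encode(encoding, 'surrogateescape')
    except UnicodeEncodeError as exc:
        return text, exc


for enc in ('utf-8', 'utf-16-be'):
    text, result = roundtrip(b'\xCC', enc)
    print(enc, ascii(text), repr(result))
```

For 'utf-8' this prints the original b'\xcc'; for 'utf-16-be' it prints the same UnicodeEncodeError shown in the traceback above, even though both decodes produced '\udccc'.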

This may be related to my second question:

Changed in version 3.4: The utf-16* and utf-32* encoders no longer allow surrogate code points (U+D800–U+DFFF) to be encoded.

(from https://docs.python.org/3/library/codecs.html#standard-encodings)

As far as I know, some code points cannot be encoded in UTF-16 without surrogate pairs. So what is the reasoning behind this change?
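A likely rationale, based on the Unicode standard rather than on anything stated in this thread: U+D800 to U+DFFF are reserved for UTF-16's own use and are not Unicode scalar values, so no well-formed UTF can encode them as characters. Surrogate pairs do still appear in UTF-16 output, but the encoder builds them from supplementary code points such as U+1F600; the str itself never needs to contain surrogates. The sketch below, assuming Python 3.4 or later (where 'surrogatepass' works with the utf-16* codecs), shows why writing lone surrogates out would be ambiguous.

```python
# A supplementary code point: the strict UTF-16 encoder emits the surrogate pair itself.
emoji = '\U0001F600'
pair_bytes = emoji.encode('utf-16-be')  # 4 bytes: D8 3D DE 00

# Two *lone* surrogate code points in a str (len 2): strict encoding refuses them.
lone = '\ud83d\ude00'
try:
    lone.encode('utf-16-be')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'surrogatepass' writes each surrogate as a raw 16-bit unit anyway...
passed = lone.encode('utf-16-be', 'surrogatepass')

# ...but the bytes are identical to the emoji's, so decoding yields a different str.
decoded = passed.decode('utf-16-be')

print(pair_bytes == passed, len(lone), len(decoded), decoded == emoji, strict_ok)
# → True 2 1 True False
```

Because two distinct strings would map to the same bytes, the encoding would no longer be reversible; refusing surrogates in strict mode keeps UTF-16 output well-formed and round-trippable.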


