UnicodeDecodeError in json.dumps

Python 2.7.5+ (default, Sep 19 2013, 13:48:49)
[GCC 4.8.1] on linux2
>>> import json
>>> json.dumps(['\xd0\xb2', u'\xd0\xb2'], ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)
>>>

1 Answer

User

#1 · Posted 2024-09-25 12:37:49

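AFAIK the reason to serialize to JSON is to store or transmit some information. If you specify ensure_ascii=False, non-ASCII characters are not escaped, which makes little sense here, because you want the data to be encoded and serialized.

Basically, you are trying to get a string of non-encoded characters, which is not possible. In the traceback, ''.join(chunks) mixes a byte str containing non-ASCII bytes with a unicode object, so Python 2 implicitly tries to decode the bytes as ASCII and fails.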

From the official documentation:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is False, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.
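
To make the quoted behaviour concrete, here is a minimal sketch of the two modes on an input that contains only unicode strings. The sample value is my own and not from the question; it should run the same on Python 2.7 and Python 3.

```python
import json

# Illustrative input: a single Cyrillic character as a unicode string.
data = [u"\u0432"]

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# so the result contains ASCII characters only.
escaped = json.dumps(data)

# ensure_ascii=False: the character is kept as-is, so the result is a
# unicode string (on Python 2) that still has to be encoded before writing.
unescaped = json.dumps(data, ensure_ascii=False)

print(repr(escaped))
print(repr(unescaped))
```

Both results load back to the same data; the difference is only in how the non-ASCII character is represented in the serialized text.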

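On the other hand, the fact that you are designing an API does not mean you cannot control the input. An API is, in a sense, a contract: given some input, it returns some output. So you can, and should, always specify what you expect.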

In your case, you could check the elements one by one and convert bytestrings to unicode. Even so, my advice is to require users to pass unicode and not to specify ensure_ascii=False.
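
The element-by-element conversion could look roughly like the sketch below. The helper name to_text is hypothetical, and it assumes the incoming bytestrings are UTF-8, which the question does not state. It is written to run on both Python 2.7 and Python 3.

```python
import json


def to_text(obj, encoding="utf-8"):
    """Recursively decode bytestrings to unicode (hypothetical helper).

    The encoding is an assumption: the caller must know how the
    incoming bytes were encoded.
    """
    if isinstance(obj, bytes):  # on Python 2, bytes is the plain str type
        return obj.decode(encoding)
    if isinstance(obj, (list, tuple)):
        return [to_text(item, encoding) for item in obj]
    if isinstance(obj, dict):
        return dict(
            (to_text(key, encoding), to_text(value, encoding))
            for key, value in obj.items()
        )
    return obj


# Same shape as the input in the question: one bytestring, one unicode string.
data = [b"\xd0\xb2", u"\xd0\xb2"]

# With everything converted to unicode, the default ensure_ascii=True works.
encoded = json.dumps(to_text(data))
print(encoded)
```

Note that the two elements are not the same text: the bytestring decodes to one Cyrillic character, while the unicode literal in the question already holds two Latin-1 characters.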

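For me, the general rules for understanding encodings and avoiding problems are:

Strings inside your code must be unicode.
When importing data, decode it so that it becomes unicode. When exporting, encode it. This requires both sides to agree on the encoding they use; otherwise you just get noise.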
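
As a small illustration of these two rules, the sketch below decodes at the boundary, works with unicode inside, and encodes on the way out. The function names import_text and export_text are hypothetical, and UTF-8 is only an assumed agreement between the two sides.

```python
def import_text(raw, encoding="utf-8"):
    """Decode incoming bytes to unicode at the boundary (hypothetical helper)."""
    return raw.decode(encoding)


def export_text(text, encoding="utf-8"):
    """Encode unicode back to bytes when leaving the program (hypothetical helper)."""
    return text.encode(encoding)


raw = b"\xd0\xb2"  # one Cyrillic character, encoded as UTF-8

# Both sides agree on UTF-8: the text survives the round trip.
text = import_text(raw)
round_trip = export_text(text)

# The sides disagree (the reader assumes Latin-1): two unrelated characters.
noise = import_text(raw, encoding="latin-1")

print(len(text), len(noise))
```

Decoding with the wrong encoding does not raise an error here; it silently yields different text, which is why the agreement has to be explicit.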
