Why does Python sometimes upgrade a string to Unicode and sometimes not?

>>> class A:
...     def __str__(self): return "string"
...     def __unicode__(self): return "unicode"
...
>>> "%s %s" % (u'niño', A())
u'ni\xc3\xb1o unicode'
>>> "%s %s" % (A(), u'niño')
u'string ni\xc3\xb1o'

1 answer

Answer 1 · posted 2024-09-30 14:37:25

The Python Language Reference has the answer:

If format is a Unicode object, or if any of the objects being converted using the %s conversion are Unicode objects, the result will also be a Unicode object.

foo = u'Émilie and Juañ are turncoats.'
bar = "foo is %s" % foo

This works because foo is a unicode object. That triggers the rule quoted above, and the result is a Unicode string.

[code block lost in extraction: a byte-string format applied to foo2]

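In this case, foo2 is an Exception object, which is clearly not a unicode object. So the interpreter tries to convert it to a plain str using the default encoding. Evidently that encoding is ascii, which cannot represent these characters, so the conversion bails out with an exception.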
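The failure itself can be reproduced without the formatting operator. The following is a minimal sketch, assuming the default encoding is ascii as described above. The variable names are illustrative, and the non-ASCII characters are written as escapes so the snippet does not depend on a source-encoding declaration:

```python
# The same text as in the answer, with non-ASCII characters escaped.
text = u"\xc9milie and Jua\xf1 are turncoats."

# Encoding to ASCII is what an implicit unicode -> str conversion amounts to
# when the default encoding is ascii; it fails on the first non-ASCII character.
try:
    text.encode("ascii")
    outcome = "encoded"
except UnicodeEncodeError as exc:
    outcome = "failed: %s" % exc.reason

# An encoding that can represent the characters round-trips without error.
encoded = text.encode("utf-8")
roundtrip = encoded.decode("utf-8")

print(outcome)
print(roundtrip == text)
```

Naming the encoding explicitly, as in the last two statements, avoids relying on the default.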

bar = u"foo2 is %s" % foo2

Here it works again, because the format string is a unicode object. So the interpreter tries to convert foo2 to a unicode object as well, which succeeds.

As for Randall's question: this surprises me too. However, according to the standard (reformatted for readability):

%s converts any Python object using str(). If the object or format provided is a unicode string, the resulting string will also be unicode.

How such a unicode object is to be created is left unclear. So both of these are legal:

Call __str__, decode the result back to a Unicode string, and insert it into the output string
Call __unicode__ and insert the result directly into the output string
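The two readings above can be spelled out as ordinary functions. This is only an illustration of the two strategies, not a description of how the interpreter implements %s. The helper names via_str and via_unicode are hypothetical, and decoding with ascii is an assumption standing in for the default encoding:

```python
class A(object):
    def __str__(self):
        return "string"

    def __unicode__(self):
        return u"unicode"


def via_str(obj, encoding="ascii"):
    """Strategy 1 (hypothetical helper): call __str__, then decode."""
    text = obj.__str__()
    # On Python 2, __str__ returns a byte string that must be decoded.
    # On Python 3, it is already text and is returned unchanged.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    return text


def via_unicode(obj):
    """Strategy 2 (hypothetical helper): call __unicode__ directly."""
    return obj.__unicode__()


a = A()
print(via_str(a))      # text produced from __str__
print(via_unicode(a))  # text produced from __unicode__
```

Both helpers return Unicode text, yet the content differs, which is exactly the discrepancy visible in the two results from the question.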

The mixed behaviour of the Python interpreter is rather hideous indeed. I would consider this a bug in the standard.

Edit: quoting the Python 3.0 changelog, emphasis mine:

Everything you thought you knew about binary data and Unicode has changed.
[...]
As a consequence of this change in philosophy, pretty much all code that uses Unicode, encodings or binary data most likely has to change. The change is for the better, as in the 2.x world there were numerous bugs having to do with mixing encoded and unencoded text.
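A short sketch of what that change looks like in practice under Python 3, where bytes and text are no longer mixed implicitly. The variable names are illustrative only:

```python
# UTF-8 encoded bytes for the word from the question.
data = b"ni\xc3\xb1o"

# Combining text and bytes is rejected instead of being silently coerced.
try:
    "prefix " + data
    mixed = "allowed"
except TypeError:
    mixed = "TypeError"

# The conversion has to be requested explicitly, naming the encoding.
text = data.decode("utf-8")
combined = "prefix " + text

print(mixed)
print(combined == "prefix ni\xf1o")
```

The decoding step that Python 2 performed implicitly, and sometimes unpredictably, is now an explicit call with a stated encoding.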
