Comparing user-input unicode strings in Python 2.7

type árbol: árbol
Encoding utf-8

User Input    Program Input
--------------------------------------------------
Ã¡rbol        árbol          (raw value)
Ã¡rbol        árbol          (unicode(value))
Ã¡rbol        árbol          (value.decode('utf8'))
Ã¡rbol        árbol          (normalize('NFC',value))

User Input                Program Input    (Repr)
--------------------------------------------------
'\xc3\x83\xc2\xa1rbol'    u'\xe1rbol'
u'\xc3\xa1rbol'           u'\xe1rbol'      (unicode(value))
u'\xc3\xa1rbol'           u'\xe1rbol'      (value.decode('utf8'))
u'\xc3\xa1rbol'           u'\xe1rbol'      (normalize('NFC',value))

2 answers

User

#1 · edited 2024-09-30 08:25:38

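Your current approach is not bad, but you should probably use unicodedata.normalize() for the comparison. The documentation for that function explains why this is a good idea. For example, try evaluating the following: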

u'C\u0327' == u'\xc7'

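Spoiler alert: this gives you False, because the left side is the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA), while the right side is the single character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA). Both render as Ç, but they are different code point sequences.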
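
To see the difference rather than take it on faith, the two strings can be inspected code point by code point. This is a small sketch; the helper name `code_points` is made up for illustration, and the literals use escapes so the source file needs no encoding declaration.

```python
import unicodedata

decomposed = u'C\u0327'   # letter C followed by a combining cedilla
precomposed = u'\xc7'     # single precomposed character

def code_points(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [('U+%04X' % ord(ch), unicodedata.name(ch)) for ch in text]

print(code_points(decomposed))
print(code_points(precomposed))
print(decomposed == precomposed)   # False
```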

You can handle this correctly with unicodedata.normalize() by first converting both strings to the same normalized form and only then comparing them.
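
A minimal sketch of that comparison follows. It assumes NFC as the target form (NFD would work equally well as long as both sides use the same one), and `same_text` is a hypothetical helper name, not something from the standard library.

```python
import unicodedata

def same_text(a, b, form='NFC'):
    """Compare two unicode strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

decomposed = u'C\u0327'   # U+0043 U+0327
precomposed = u'\xc7'     # U+00C7

print(decomposed == precomposed)            # False
print(same_text(decomposed, precomposed))   # True
```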

User

#2 · edited 2024-09-30 08:25:38

Can you check the character encoding of your terminal?

import sys
sys.stdin.encoding

If it is UTF-8, then decoding as UTF-8 is fine. Otherwise, you have to decode the raw input with the correct encoding.

For example, raw_input().decode(sys.stdin.encoding). If needed, also check that the result is correct when combined with Unicode normalization.
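
Putting the two steps together might look like the sketch below. The helper name `to_unicode` and the UTF-8 fallback for when sys.stdin.encoding is unavailable (for instance when input is piped) are assumptions for illustration, not part of the original answer.

```python
import sys
import unicodedata

def to_unicode(raw, encoding=None):
    """Decode a byte string from the terminal, then normalize it to NFC."""
    if encoding is None:
        # Assumption: fall back to UTF-8 when stdin reports no encoding.
        encoding = getattr(sys.stdin, 'encoding', None) or 'utf-8'
    if isinstance(raw, bytes):   # in Python 2, bytes is the plain str type
        raw = raw.decode(encoding)
    return unicodedata.normalize('NFC', raw)

expected = u'\xe1rbol'        # the program's own value
typed = b'\xc3\xa1rbol'       # UTF-8 bytes a terminal would send for the same word

print(to_unicode(typed, 'utf-8') == expected)   # True
```

In Python 2.7 the value returned by raw_input() is a byte string, so it can be passed straight to the helper, for example to_unicode(raw_input('type: ')).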
