What encoding looks exactly like ASCII, but has a null byte before every byte?

>>> s = u'\x00Q\x00u\x00i\x00c\x00k'
>>> print s
Quick
>>>
>>> s == 'Quick'
False
>>>
>>> import re
>>> re.search('Quick', s)
>>>
>>> import chardet
>>> chardet.detect(s)
/usr/lib/pymodules/python2.6/chardet/universaldetector.py:69: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if aBuf[:3] == '\xEF\xBB\xBF':
/usr/lib/pymodules/python2.6/chardet/universaldetector.py:72: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif aBuf[:4] == '\xFF\xFE\x00\x00':
/usr/lib/pymodules/python2.6/chardet/universaldetector.py:75: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif aBuf[:4] == '\x00\x00\xFE\xFF':
/usr/lib/pymodules/python2.6/chardet/universaldetector.py:78: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif aBuf[:4] == '\xFE\xFF\x00\x00':
/usr/lib/pymodules/python2.6/chardet/universaldetector.py:81: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif aBuf[:4] == '\x00\x00\xFF\xFE':
/usr/lib/pymodules/python2.6/chardet/universaldetector.py:84: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif aBuf[:2] == '\xFF\xFE':
/usr/lib/pymodules/python2.6/chardet/universaldetector.py:87: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif aBuf[:2] == '\xFE\xFF':
{'confidence': 1.0, 'encoding': 'ascii'}
>>>
>>> chardet.detect(s)
{'confidence': 1.0, 'encoding': 'ascii'}
>>>

2 Answers

User

#1 · Edited 2024-10-01 13:34:44

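You have UTF-16BE without a BOM (byte order mark). As documented, chardet does not attempt to detect UTF-nnxE (that is, UTF-16 or UTF-32 with an explicit LE/BE byte order) when there is no BOM. Decode it explicitly: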

>>> s = '\x00Q\x00u\x00i\x00c\x00k' #### Note: dropping the spurious `u` prefix
>>> s.decode('utf_16be')
u'Quick'
>>>

chardet is also not smart enough to raise a DontBeSilly exception when it is handed unicode input instead of bytes :-)
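Since chardet will not guess the byte order without a BOM, a rough check of where the null bytes fall can stand in for it when the text is known to be mostly ASCII. The following is only a sketch in Python 3 (the session above is Python 2); `guess_utf16_endianness` and its `threshold` parameter are hypothetical names introduced here, not part of chardet, and the heuristic will misfire on text that is not predominantly ASCII.

```python
def guess_utf16_endianness(data: bytes, threshold: float = 0.8):
    """Guess 'utf-16-be' or 'utf-16-le' for BOM-less input, else None.

    Assumption: the text is mostly ASCII, so the high byte of nearly
    every 2-byte code unit is zero and the low byte never is.
    """
    if len(data) < 2 or len(data) % 2:
        return None  # UTF-16 data always has an even number of bytes
    units = len(data) // 2
    even_nulls = data[0::2].count(0)  # null bytes at offsets 0, 2, 4, ...
    odd_nulls = data[1::2].count(0)   # null bytes at offsets 1, 3, 5, ...
    if even_nulls / units >= threshold and odd_nulls == 0:
        return 'utf-16-be'  # high (zero) byte comes first
    if odd_nulls / units >= threshold and even_nulls == 0:
        return 'utf-16-le'  # high (zero) byte comes second
    return None


raw = b'\x00Q\x00u\x00i\x00c\x00k'
encoding = guess_utf16_endianness(raw)
text = raw.decode(encoding) if encoding else None
print(encoding, text)
```

If the source of the data is known, passing the codec name explicitly is still the more reliable choice; the heuristic is only a fallback for untagged input.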

User

#2 · Edited 2024-10-01 13:34:44

UTF-16 big-endian
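To see why the bytes in the question are UTF-16 big-endian, it may help to produce them from the other direction. This is a small Python 3 sketch using only the standard `codecs` module; the variable names are illustrative. The last two lines assume the situation from the question, where the null bytes ended up as code points inside a text string rather than in a byte string.

```python
import codecs

# Encoding 'Quick' as big-endian UTF-16 puts a zero high byte before each ASCII byte.
data = 'Quick'.encode('utf-16-be')

# The explicit '-be' codec writes no BOM. Prepending one lets the generic
# 'utf-16' codec work out the byte order by itself.
with_bom = codecs.BOM_UTF16_BE + data
decoded = with_bom.decode('utf-16')

# A text string that already contains U+0000 code points can be turned back
# into the raw bytes with latin-1 (one byte per code point), then decoded properly.
mistaken = '\x00Q\x00u\x00i\x00c\x00k'
recovered = mistaken.encode('latin-1').decode('utf-16-be')

print(data, decoded, recovered)
```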
