How to determine whether a string is unicode-escaped

# -*- coding: utf-8 -*-
str_escaped = '"A\u0026B"'
str_unicode = '"Война́ и миръ"'

arr_all_strings = [str_escaped, str_unicode]

def is_escaped_unicode(str):
    #how do I determine if this is escaped unicode?
    pass

for str in arr_all_strings:
    if is_escaped_unicode(str):
        str = str.decode("unicode-escape")
    print str

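3 Answers

User

#1 · edited 2024-10-03 23:25:24

Here is a crude approach. Try decoding the string as unicode-escape; if it contained escape sequences, the resulting string will be shorter than the original.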

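# -*- coding: utf-8 -*-
str_escaped = '"A\u0026B"'
str_unicode = '"Война́ и миръ"'
arr_all_strings = [str_escaped, str_unicode]

def decoder(s):
    # Python 2: s is a byte string; decoding turns each escape into one character.
    y = s.decode('unicode-escape')
    # A shorter result means escapes were present; otherwise treat s as UTF-8.
    return y if len(y) < len(s) else s.decode('utf8')

for s in arr_all_strings:
    print s, decoder(s)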

Output

"A\u0026B" "A&B"
"Война́ и миръ" "Война́ и миръ"

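But seriously, if you can migrate to Python 3, you will save yourself a lot of pain. If you cannot migrate right away, you may find this article helpful: Pragmatic Unicode, written by the experienced Ned Batchelder.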
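For readers already on Python 3, the same length heuristic might look roughly like the sketch below. It assumes the input arrives as `bytes`; the type hints and the UTF-8 fallback are assumptions carried over from the Python 2 snippet, not something the answer specifies for Python 3.

```python
def decoder(raw: bytes) -> str:
    """Guess whether raw holds unicode-escape sequences, using the length heuristic."""
    unescaped = raw.decode("unicode-escape")
    # An escape such as \u0026 (six bytes) collapses to a single character,
    # so a shorter result suggests that escapes were present.
    if len(unescaped) < len(raw):
        return unescaped
    # Otherwise assume the bytes are plain UTF-8 text.
    return raw.decode("utf-8")


str_escaped = b'"A\\u0026B"'
str_unicode = '"Война́ и миръ"'.encode("utf-8")

for raw in (str_escaped, str_unicode):
    print(raw, decoder(raw))
```

This remains a guess: bytes that mix escape sequences with non-ASCII UTF-8 would be mangled, because the unicode-escape codec reads every non-escape byte as Latin-1.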

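User

#2 · edited 2024-10-03 23:25:24

You can't.

There is no way to tell whether '"A\u0026B"' originally came from text that was encoded, whether the data is simply the bytes '"A\u0026B"', or whether we arrived there from some other encoding.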

How do ... you know whether or not to run .decode("unicode-escape")

You have to know whether someone previously called text.encode('unicode-escape'). The bytes themselves cannot tell you.
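A small Python 3 sketch of that ambiguity, using illustrative values that are not from the original question: two different texts, encoded in two different ways, end up as the same bytes.

```python
# Two different texts (illustrative values only).
cyrillic_text = "\u0451"   # one character, CYRILLIC SMALL LETTER IO
literal_text = "\\u0451"   # six ASCII characters: backslash, u, 0, 4, 5, 1

from_escape = cyrillic_text.encode("unicode-escape")
from_ascii = literal_text.encode("ascii")

# Both routes yield exactly the same bytes, so the bytes alone
# cannot reveal which text they started from.
print(from_escape == from_ascii)
```

Anyone who receives only those bytes has to be told, out of band, which of the two encodings produced them.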

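You can certainly guess, either by looking for \u or \U escape sequences or by wrapping the decode in try/except and seeing what happens, but I don't recommend going down this route.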
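For illustration only, such a guess might be sketched in Python 3 as follows. The names `ESCAPE_RE`, `looks_escaped` and `guess_decode` are hypothetical, and the UTF-8 fallback is an assumption; as the answer says, this is a heuristic and can misfire on text that merely contains a literal backslash followed by `u` and four hex digits.

```python
import re

# Hypothetical pattern: a backslash followed by uXXXX or UXXXXXXXX (hex digits).
ESCAPE_RE = re.compile(rb"\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}")


def looks_escaped(raw: bytes) -> bool:
    """Return True if raw contains something shaped like a \\u or \\U escape."""
    return ESCAPE_RE.search(raw) is not None


def guess_decode(raw: bytes) -> str:
    """Decode as unicode-escape when an escape is spotted, else as UTF-8."""
    if looks_escaped(raw):
        try:
            return raw.decode("unicode-escape")
        except UnicodeDecodeError:
            # Malformed escapes elsewhere in the data: fall through to UTF-8.
            pass
    return raw.decode("utf-8")


print(guess_decode(b'"A\\u0026B"'))
print(guess_decode('"Война́ и миръ"'.encode("utf-8")))
```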

If you come across a bytestring in your application and you do not already know its encoding, your problem lies elsewhere and should be fixed there.

User

#3 · edited 2024-10-03 23:25:24

# -*- coding: utf-8 -*-
str_escaped = '"A\u0026B"'
str_unicode = '"Война́ и миръ"'

arr_all_strings = [str_escaped, str_unicode]

def is_ascii(s):
    # True when every character is in the 7-bit ASCII range.
    return all(ord(c) < 128 for c in s)

def is_escaped_unicode(s):
    # Escaped unicode is pure ASCII, so use that as the test.
    return is_ascii(s)

for s in arr_all_strings:
    if is_escaped_unicode(s):
        s = s.decode("unicode-escape")
    print s

The code above works for your case.

Explanation:

All characters in str_escaped are in the ASCII range.
The characters in str_unicode are not all in the ASCII range.
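A Python 3 sketch of the same ASCII test, assuming `bytes` input, shows what the check can and cannot distinguish. The `b'"plain text"'` sample is an illustrative addition, not part of the original answer.

```python
def is_ascii(raw: bytes) -> bool:
    # Iterating over bytes in Python 3 yields integers, so no ord() is needed.
    return all(b < 128 for b in raw)


print(is_ascii(b'"A\\u0026B"'))                      # escaped: ASCII only
print(is_ascii(b'"plain text"'))                     # no escapes: also ASCII only
print(is_ascii('"Война́ и миръ"'.encode("utf-8")))    # UTF-8 Cyrillic: not ASCII
```

Being ASCII-only is therefore a necessary sign of escaped text rather than proof of it, since ordinary ASCII strings without any escapes pass the test as well.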
