Normalizing Unicode in data

1 answer

User

#1 · Posted on 2024-10-04 05:25:51

The file contains the six characters \u0029. By contrast, '\u0029' typed in code is a single Unicode code point written as an escape code:

>>> print('\u00e9')   # A single character escape code
é
>>> print(r'\u0039')  # A six-character string using raw string notation.
\u0039                # Escape codes are ignored and characters are literal.
>>> print('\\u0039')  # A six-character string using an escaped backslash
\u0039                # to indicate a literal backslash.
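Comparing lengths makes the difference concrete. This sketch reuses \u00e9 from the first example; the variable names are only illustrative.

```python
# A real escape code versus the literal six-character text a file would contain.
escaped = '\u00e9'    # the parser turns this into one code point, é
literal = r'\u00e9'   # raw string: backslash, 'u', '0', '0', 'e', '9'

print(len(escaped), len(literal))      # 1 6
print(literal == '\\u00e9')            # True: both spell a literal backslash
print([hex(ord(c)) for c in escaped])  # ['0xe9']
```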

To convert the six-character string into a single character, use:

>>> r'\u00e9'.encode('ascii').decode('unicode-escape')
'é'

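The 'ascii' encoding step is needed to turn the Unicode string of ASCII characters into a byte string, because in Python 3 only byte strings can be decoded. Python 2 can skip it, because there a Unicode string is implicitly encoded back to ASCII when needed.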
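The same round trip can be wrapped in a small helper. The name unescape is hypothetical, not part of any library. Note that the 'unicode-escape' codec also interprets other backslash escapes such as \n and \t, not only \uXXXX, and that the 'ascii' step raises UnicodeEncodeError if the input already contains non-ASCII characters.

```python
def unescape(text: str) -> str:
    """Turn backslash escape sequences such as \\u00e9 into the characters they name.

    Hypothetical helper. Raises UnicodeEncodeError if text already contains
    non-ASCII characters, because the 'ascii' step cannot encode them.
    """
    return text.encode('ascii').decode('unicode-escape')

# In Python 3, str has no .decode() method, hence the round trip through bytes.
print(hasattr(str, 'decode'))          # False
print(unescape(r'\u00e9'))             # é
print(unescape(r'caf\u00e9 \u0029'))   # café )
```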
You can also decode the escapes while reading the file (assuming Python 3):

with open('unicode.txt',encoding='unicode-escape') as f:
    data = f.read()

On Python 2, use import io and io.open instead.
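A self-contained check of the file approach, assuming Python 3: a temporary file stands in for unicode.txt and is read once as plain ASCII and once with encoding='unicode-escape'. One caveat: the 'unicode-escape' codec treats every byte that is not part of an escape as Latin-1, so it suits files that are pure ASCII apart from the escapes. A UTF-8 "é" stored directly in the file would come back as "Ã©".

```python
import os
import tempfile

# Write the literal characters caf\u00e9 plus a newline, standing in for unicode.txt.
with tempfile.NamedTemporaryFile('w', encoding='ascii', suffix='.txt', delete=False) as tmp:
    tmp.write(r'caf\u00e9' + '\n')
    path = tmp.name

try:
    # Plain ASCII reading keeps the escape as six separate characters.
    with open(path, encoding='ascii') as f:
        raw = f.read()
    # The 'unicode-escape' codec decodes the escape while reading.
    with open(path, encoding='unicode-escape') as f:
        data = f.read()
finally:
    os.remove(path)

print(repr(raw))    # 'caf\\u00e9\n'
print(repr(data))   # 'café\n'
```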
