Decoding a UTF-8 URL in Python

User

Answer 1 · edited 2024-10-01 11:22:17

Update: if the output file is a YAML document, you can ignore the \u0163 in it. Unicode escapes are valid in YAML documents.

#!/usr/bin/env python3
import json

# json produces a subset of yaml
print(json.dumps('pe toţi mai')) # -> "pe to\u0163i mai"
print(json.dumps('pe toţi mai', ensure_ascii=False)) # -> "pe toţi mai"

Note: there is no \u in the last example. Both lines represent the same Python string.
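To check that claim, here is a minimal sketch (the variable names are illustrative, not from the original answer) that parses both JSON forms back and compares the results:

```python
import json

s = 'pe toţi mai'
escaped = json.dumps(s)                        # ASCII-only output, uses \u0163
unescaped = json.dumps(s, ensure_ascii=False)  # keeps ţ as a literal character

# The two JSON texts differ, but they decode to the same Python string.
print(escaped != unescaped)
print(json.loads(escaped) == json.loads(unescaped) == s)
```

Either form is safe to store; the choice only affects how readable the file is in a text editor.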

yaml.dump() has a similar option: allow_unicode. Set it to True to avoid Unicode escapes.

The URL is correct. You don't need to do anything special to it.
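As a rough Python 3 sketch of what that looks like in practice, assuming the percent-encoded string used elsewhere in this thread and a hypothetical output path (a temporary directory here), you can unquote the URL and write the result with an explicit UTF-8 encoding:

```python
import os
import tempfile
from urllib.parse import unquote

url = "pe%20to%C5%A3i%20mai"
text = unquote(url)  # percent-escapes are decoded as UTF-8 by default
print(text)

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'some_file')
with open(path, 'w', encoding='utf-8') as file:
    file.write(text)  # stored as UTF-8 bytes, no \u escapes

# Inspect the raw bytes that ended up on disk.
with open(path, 'rb') as file:
    raw = file.read()
print(raw)
```

With an explicit encoding='utf-8' the file contains the two-byte UTF-8 sequence for ţ rather than a backslash escape.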

The \u0163 sequence may have been introduced by a character-encoding error handler:

with open('some_other_file', 'wb') as file: # write bytes
    file.write(text.encode('ascii', 'backslashreplace')) # -> pe to\u0163i mai

Or:

with open('another', 'w', encoding='ascii', errors='backslashreplace') as file:
    file.write(text) # -> pe to\u0163i mai
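To see how the choice of error handler changes what lands in an ASCII-only file, the following sketch (not part of the original answer) encodes the same string with several of the standard handlers:

```python
text = 'pe toţi mai'

# Each handler deals with the non-ASCII character ţ (U+0163) differently.
results = {}
for handler in ('backslashreplace', 'xmlcharrefreplace', 'replace', 'ignore'):
    results[handler] = text.encode('ascii', handler)
    print(handler, results[handler])

# The default handler, 'strict', raises instead of substituting.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict:', exc.reason)
```

Only backslashreplace produces the \u0163 form, which is why it is a likely source of the escapes.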

More examples:

# introduce some more \u escapes
b = r"pe to\u0163i mai ţţţ".encode('ascii', 'backslashreplace') # bytes
print(b.decode('ascii')) # -> pe to\u0163i mai \u0163\u0163\u0163
# remove unicode escapes
print(b.decode('unicode-escape')) # -> pe toţi mai ţţţ

User

Answer 2 · edited 2024-10-01 11:22:17

Try decoding with unicode_escape.

For example (Python 2 syntax):

>>> print "pe to\u0163i mai".decode('unicode_escape')
pe toţi mai
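In Python 3, str has no decode method, so a comparable sketch has to go through bytes first. This assumes the input contains only ASCII characters plus literal \u escapes, because the unicode_escape codec treats any other bytes as Latin-1:

```python
# A str holding a literal backslash-u sequence, e.g. as read from a file.
s = r"pe to\u0163i mai"

# Encode to ASCII bytes, then let unicode_escape interpret the \u escapes.
decoded = s.encode('ascii').decode('unicode_escape')
print(decoded)
```

If the input may already contain real non-ASCII characters, encode with 'backslashreplace' first, as in the first answer.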

User

Answer 3 · edited 2024-10-01 11:22:17

Python 3

Calling urllib.parse.unquote already returns a Unicode string:

>>> import urllib.parse
>>> urllib.parse.unquote("pe%20to%C5%A3i%20mai")
'pe toţi mai'

If you don't get this result, there must be a bug in your code. Please post your code.
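One possible bug, offered only as an illustration since the asker's code is not shown, is unquoting with the wrong codec. The sketch below compares the default UTF-8 decoding with a deliberately wrong Latin-1 decoding:

```python
from urllib.parse import unquote

quoted = "pe%20to%C5%A3i%20mai"

good = unquote(quoted)                     # UTF-8 is the default encoding
bad = unquote(quoted, encoding='latin-1')  # wrong codec: each byte becomes one character

print(good)
print(bad)
print(len(good), len(bad))
```

The Latin-1 result is one character longer, because the two UTF-8 bytes of ţ are decoded as two separate characters.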

Python 2

Use decode to get a Unicode string from the bytestring that urllib2.unquote returns, for example with .decode('utf-8').

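Remember that when you write a Unicode string to a file, you must encode it again. You can write the file as UTF-8, but you can also choose a different encoding if you need to. You must also remember to use the same encoding when reading it back from the file. You may find the codecs module useful for specifying the encoding when reading and writing files.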

>>> import urllib2, codecs
>>> s = urllib2.unquote("pe%20to%C5%A3i%20mai").decode('utf-8')

>>> # Write the string to a file.
>>> with codecs.open('test.txt', 'w', 'utf-8') as f:
...     f.write(s)

>>> # Read the string back from the file.
>>> with codecs.open('test.txt', 'r', 'utf-8') as f:
...     s2 = f.read()
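For comparison, a Python 3 sketch of the same round trip does not need codecs, since the built-in open accepts an encoding argument. The temporary directory is an assumption made here so the example does not overwrite a real test.txt:

```python
import os
import tempfile
from urllib.parse import unquote

s = unquote("pe%20to%C5%A3i%20mai")
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Write the string to a file, encoding it as UTF-8.
with open(path, 'w', encoding='utf-8') as f:
    f.write(s)

# Read it back using the same encoding.
with open(path, 'r', encoding='utf-8') as f:
    s2 = f.read()

print(s2 == s)
```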

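One thing that may be confusing is that in the interactive interpreter, Unicode strings are sometimes displayed using \uxxxx notation instead of the actual characters: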

>>> s
u'pe to\u0163i mai'
>>> print s
pe toţi mai

This does not mean the string is "wrong". It is just how the interpreter works.
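The same distinction can be reproduced in Python 3, where repr keeps printable non-ASCII characters and the built-in ascii() gives the escaped form. A small sketch:

```python
s = 'pe to\u0163i mai'

print(s)         # the actual characters
print(repr(s))   # Python 3 repr keeps printable non-ASCII characters
print(ascii(s))  # escapes non-ASCII characters, similar to the Python 2 repr
print(len(s))    # the escape in the source is a single character in the string
```

In every case the string object is the same; only its displayed representation differs.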
