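How to fix UTF-8 characters stored as ASCII

2 answers

User

Answer 1 · edited 2024-09-30 04:33:31

If the file appears to contain "JosÃ©", the file viewer is reading and displaying the data incorrectly. The data on disk is UTF-8, but it is being decoded with the wrong encoding. Example: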

import locale

# Correctly written
with open('file.txt','w',encoding='utf8') as f:
    f.write('José')

# The default encoding for open()
print(locale.getpreferredencoding(False))

# Incorrectly opened
with open('file.txt') as f:
    data = f.read()
    print(data)
    # What I think you are requesting as a fix.
    # Re-encode with the incorrect encoding, then decode correctly.
    print(data.encode('cp1252').decode('utf8'))

# Correctly opened
with open('file.txt',encoding='utf8') as f:
    print(f.read())

Output (on a system whose default encoding is cp1252):

cp1252
JosÃ©
José
José
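
To make the mechanism concrete, here is a minimal sketch of the same round trip without a file. The variable names are only illustrative, and cp1252 is assumed to be the wrong encoding, as in the output above.

```python
original = 'José'

# 'é' is stored as two bytes in UTF-8: 0xC3 0xA9.
utf8_bytes = original.encode('utf-8')

# Decoding those two bytes as cp1252 yields two characters, 'Ã' and '©'.
garbled = utf8_bytes.decode('cp1252')

# Reversing the wrong decode recovers the original bytes,
# which can then be decoded correctly as UTF-8.
repaired = garbled.encode('cp1252').decode('utf-8')

print(utf8_bytes)
print(garbled)
print(repaired)
```

The repair only works because the wrong decode did not lose any bytes. If the garbled text has been edited or passed through a lossy conversion since then, the round trip may fail.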

User

Answer 2 · edited 2024-09-30 04:33:31

If you are using Python 3, you can use the bytes() constructor to do the following:

test = "JosÃ©"
fixed = bytes(test, 'iso-8859-1').decode('utf-8')
# fixed will now contain the string José
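
If you do not know whether the text was mis-decoded as cp1252 or as iso-8859-1, a small helper can try both. This is only a sketch: fix_mojibake is a hypothetical name, not a standard library function, and it assumes the text is UTF-8 that was wrongly decoded with one of those two encodings.

```python
def fix_mojibake(text):
    """Try to undo a wrong cp1252 or iso-8859-1 decode of UTF-8 data.

    Hypothetical helper for illustration. Returns the input unchanged
    when neither round trip produces valid UTF-8.
    """
    for wrong_encoding in ('cp1252', 'iso-8859-1'):
        try:
            return text.encode(wrong_encoding).decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Either the text has characters this encoding cannot
            # represent, or the resulting bytes are not valid UTF-8.
            continue
    return text


print(fix_mojibake('JosÃ©'))  # garbled input
print(fix_mojibake('José'))   # already correct, returned unchanged
```

Text that is already correct is normally left alone, because a lone byte such as 0xE9 is not valid UTF-8 and the decode step raises an error. Fixing the encoding where the file is opened is still the more reliable solution.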
