Cannot convert from windows-1252 to UTF-8

Posted 2024-06-28 19:59:03


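I know that questions about UTF-8 encoding and decoding have been asked many times, but I still can't find an answer. I have a CSV file in windows-1252 and I want to convert it to UTF-8 with the following script: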

import chardet
from bs4 import UnicodeDammit

# Declare the variables
defaultencoding = 'utf-8'
filename = '19-01-2017+06-00-00.csv'

# Open the file in binary mode and read the raw bytes
# (decode() and the detectors both need bytes, not text)
with open(filename, "rb") as file_obj:
    content = file_obj.read()

# Check the initial encoding using both UnicodeDammit and chardet
dammit = UnicodeDammit(content)
print(dammit.original_encoding)
print(chardet.detect(content)['encoding'])

# Decode from windows-1252, then encode as UTF-8
content_decoded = content.decode('windows-1252')
content_encoded = content_decoded.encode(defaultencoding)

# Write the result to a temporary file (binary mode, no newline translation)
with open('tmp.txt', "wb") as file_obj:
    file_obj.write(content_encoded)

# Read the converted file back as raw bytes
with open('tmp.txt', "rb") as file_obj:
    content = file_obj.read()

# Check whether it is really UTF-8 using both UnicodeDammit and chardet
dammit = UnicodeDammit(content)
print(dammit.original_encoding)
print(chardet.detect(content)['encoding'])

Output:

^{pr2}$

Expected output:

windows-1252
windows-1252
utf-8
utf-8

I used both chardet and UnicodeDammit because I found that one of them alone did not always guess the encoding correctly.
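
Because both detectors only guess, a check that does not rely on guessing may help here: strictly decoding the bytes as UTF-8 either succeeds or raises an error. The following is a minimal sketch under the assumption of Python 3; the helper name `is_valid_utf8` and the sample text are hypothetical and not part of the original script.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without any error."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: text containing non-ASCII characters, stored as windows-1252
sample = "café €".encode("windows-1252")

# Same conversion as in the script: decode from windows-1252, encode as UTF-8
converted = sample.decode("windows-1252").encode("utf-8")

print(is_valid_utf8(sample))     # False: the windows-1252 bytes are not valid UTF-8
print(is_valid_utf8(converted))  # True: the re-encoded bytes are valid UTF-8
```

Note that a file containing only ASCII characters is byte-for-byte identical in windows-1252 and UTF-8, so this check (and any detector) cannot tell the two apart in that case.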

Why can't I get the file encoded as UTF-8?
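
For comparison, the conversion itself can be sketched without any detection step by letting `open()` handle the decoding and encoding. This assumes Python 3 and that the source file really is windows-1252; the function name `convert_to_utf8`, the file names, and the sample data are hypothetical and only serve as an illustration.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="windows-1252"):
    """Re-encode a text file from src_encoding to UTF-8."""
    # newline="" disables newline translation so line endings stay unchanged
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demonstration on a throwaway file with hypothetical CSV content
text = "name;price\r\ncafé;3 €\r\n"
with tempfile.TemporaryDirectory() as tmp_dir:
    src_file = os.path.join(tmp_dir, "input.csv")
    dst_file = os.path.join(tmp_dir, "output.csv")

    # Create the windows-1252 input file
    with open(src_file, "wb") as f:
        f.write(text.encode("windows-1252"))

    convert_to_utf8(src_file, dst_file)

    # Read the result as raw bytes to inspect the actual encoding on disk
    with open(dst_file, "rb") as f:
        result = f.read()

# Decoding strictly as UTF-8 succeeds only if the output is valid UTF-8
print(result.decode("utf-8") == text)
```

Reading the output back in binary mode and decoding it strictly gives a definite answer about the bytes on disk, which a statistical detector cannot guarantee.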


