Python 2.7: checking whether a file is UTF-8 encoded

Posted 2024-06-24 13:48:32


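My current solution is simply to read all the bytes of a file and try to decode them; if any exception is raised, I treat the file as not correctly encoded. Is there a more elegant way? Thanks.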

utfbytes.decode('utf-8')

Best regards, Lin


Tags: file, encoding, bytes, approach, solution, decoding, utf, decode
1 answer
User
#1 · Posted 2024-06-24 13:48:32

No. Quoting from this answer:

Correctly detecting the encoding all times is impossible.

(From chardet FAQ:)

However, some encodings are optimized for specific languages, and languages are not random. Some character sequences pop up all the time, while other sequences make no sense. A person fluent in English who opens a newspaper and finds “txzqJv 2!dasd0a QqdKjvz” will instantly recognize that that isn't English (even though it is composed entirely of English letters). By studying lots of “typical” text, a computer algorithm can simulate this kind of fluency and make an educated guess about a text's language.

However, some libraries do exist that make a best effort to guess the encoding.
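
For the narrower question of whether a file is valid UTF-8 (rather than guessing an unknown encoding), the decode-and-catch approach from the question is probably still the practical route. Below is a minimal sketch of it, assuming the file may be too large to read in one go. The function name `is_valid_utf8` and the `chunk_size` parameter are made up for illustration. Note that a `True` result only means the bytes are well-formed UTF-8: a pure ASCII file passes too, and so might a file in another encoding by coincidence.

```python
import codecs
import io


def is_valid_utf8(path, chunk_size=64 * 1024):
    """Return True if the file at `path` decodes as strict UTF-8.

    Hypothetical helper: validates only, it does not guess encodings.
    """
    # An incremental decoder keeps state between calls, so a multi-byte
    # character split across a chunk boundary is not reported as an error.
    decoder = codecs.getincrementaldecoder('utf-8')(errors='strict')
    try:
        with io.open(path, 'rb') as f:
            # Read fixed-size chunks until read() returns an empty bytes object.
            for chunk in iter(lambda: f.read(chunk_size), b''):
                decoder.decode(chunk)
        # final=True raises if the file ends in the middle of a character.
        decoder.decode(b'', final=True)
    except UnicodeDecodeError:
        return False
    return True
```

This should run on Python 2.7 as well as Python 3, since it relies only on `codecs.getincrementaldecoder` and `io.open` from the standard library. Reading in chunks keeps memory use bounded; for small files, a plain `data.decode('utf-8')` inside `try`/`except UnicodeDecodeError` does the same job.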
