Unicode (UTF-8) reading and writing to files in Python

3 answers

User

Answer 1 · edited 2024-06-28 14:39:47

Nowadays, in Python 3, all you need is open(Filename, 'r', encoding='utf-8')

[Edited 2016-02-10 to clarify, as requested]

Python 3 added the encoding parameter to its open function. Here is the documentation for open: https://docs.python.org/3/library/functions.html#open

open(file, mode='r', buffering=-1, 
      encoding=None, errors=None, newline=None, 
      closefd=True, opener=None)

Encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

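So, by adding encoding='utf-8' as an argument to open, the file is read and written as UTF-8. UTF-8 is also Python 3's default for source files and for str.encode()/bytes.decode(), but, as the quote above says, not for open() itself, whose default depends on the platform.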
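A minimal Python 3 sketch of this advice, assuming a throwaway file in a temporary directory (the name capitan.txt is hypothetical). It writes and reads the text with an explicit encoding='utf-8', so the result does not depend on the platform default, and then reads the raw bytes to show what is actually on disk:

```python
import locale
import os
import tempfile

# The encoding open() would use if encoding= were omitted (platform dependent).
print("platform default:", locale.getpreferredencoding(False))

text = "Capit\u00e1n"  # "Capitán"; no newline, to avoid platform newline translation

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "capitan.txt")  # hypothetical file name

    # Text mode + explicit encoding: str is encoded to UTF-8 bytes on write.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Text mode + explicit encoding: UTF-8 bytes are decoded back to str on read.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

    # Binary mode: no decoding, so we see the bytes exactly as stored.
    with open(path, "rb") as f:
        raw = f.read()

print(ascii(round_trip))  # 'Capit\xe1n'
print(raw)                # b'Capit\xc3\xa1n'
```

ascii() is used for printing so the example also works on a console that cannot display "á".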

User

Answer 2 · edited 2024-06-28 14:39:47

In the notation

u'Capit\xe1n\n'

the "\xe1" represents just one character (the code point U+00E1, "á"). The "\x" tells you that "e1" is hexadecimal. When you write

Capit\xc3\xa1n

into your file, you literally have "\xc3" in it. Those are 4 bytes, and your code reads all of them. You can see this when you display them:

>>> open('f2').read()
'Capit\\xc3\\xa1n\n'

You can see that each backslash is escaped by another backslash. So "\xc3" takes four bytes in your string: "\", "x", "c" and "3".
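A small Python 3 check of this difference, assuming the escapes were typed into the file literally: the literal text Capit\xc3\xa1n is 14 characters long, while "Capitán" really saved as UTF-8 is only 8 bytes.

```python
# What ends up in the file when the escapes are typed literally.
literal = "Capit\\xc3\\xa1n"  # each "\\" in this source is ONE backslash character

# What ends up in the file when "Capitán" is really saved as UTF-8.
real = "Capit\u00e1n".encode("utf-8")

print(len(literal), list(literal[5:9]))  # 14 ['\\', 'x', 'c', '3']
print(len(real), real)                    # 8 b'Capit\xc3\xa1n'
```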

Edit:

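As others have pointed out in their answers, you should just type the characters in your editor; the editor should then handle the conversion to UTF-8 when it saves the file.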

If you actually have a string in this format, you can use the string_escape codec (Python 2 only) to decode it into a normal string:

In [15]: print 'Capit\\xc3\\xa1n\n'.decode('string_escape')
Capitán

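The result is a string encoded in UTF-8, where the accented character is represented by the two bytes that were written as \\xc3\\xa1 in the original string. If you want a unicode string, you have to decode it again with UTF-8.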
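Python 3 no longer has the string_escape codec. A common substitute, sketched here under the assumption that the escaped text is plain ASCII, goes through the unicode_escape codec and Latin-1 (which maps code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF), then does the final UTF-8 decode described above:

```python
escaped = "Capit\\xc3\\xa1n"  # literal backslash escapes, as read from the file

# 1. Interpret the \xNN escapes. unicode_escape decodes *bytes*, so encode first.
#    Result is the str 'Capit\xc3\xa1n' (code points U+00C3 and U+00A1).
interpreted = escaped.encode("latin-1").decode("unicode_escape")

# 2. Map those code points back onto the raw bytes they stand for.
utf8_bytes = interpreted.encode("latin-1")  # b'Capit\xc3\xa1n'

# 3. Decode the UTF-8 bytes to get real text.
text = utf8_bytes.decode("utf-8")
print(ascii(text))  # 'Capit\xe1n'
```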

Edit: your file does not contain UTF-8. To see what it would actually look like:

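s = u'Capit\xe1n\n'
sutf8 = s.encode('UTF-8')           # the UTF-8 bytes 'Capit\xc3\xa1n\n'
with open('utf-8.out', 'wb') as f:  # binary mode: write the bytes unchanged (works in Python 2 and 3)
    f.write(sutf8)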

Compare the content of the file utf-8.out with the content of the file you saved with your editor.
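In Python 3 a similar comparison can be scripted. This sketch (the temporary file names are hypothetical) writes the same text once via encode('UTF-8') in binary mode and once through a text-mode file opened with encoding='utf-8', then checks that the bytes on disk are identical:

```python
import os
import tempfile

s = "Capit\xe1n"  # same text as u'Capit\xe1n'; newline omitted to avoid platform newline translation

with tempfile.TemporaryDirectory() as tmp:
    encoded_path = os.path.join(tmp, "encoded.out")    # hypothetical name
    textmode_path = os.path.join(tmp, "textmode.out")  # hypothetical name

    # Approach 1: encode explicitly, then write the bytes in binary mode.
    with open(encoded_path, "wb") as f:
        f.write(s.encode("UTF-8"))

    # Approach 2: let the text-mode file object do the encoding.
    with open(textmode_path, "w", encoding="utf-8") as f:
        f.write(s)

    # Read both back as raw bytes for comparison.
    with open(encoded_path, "rb") as f:
        bytes_a = f.read()
    with open(textmode_path, "rb") as f:
        bytes_b = f.read()

print(bytes_a == bytes_b, bytes_a)  # True b'Capit\xc3\xa1n'
```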

User

Answer 3 · edited 2024-06-28 14:39:47

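I find it easier to specify the encoding when opening the file than to mess with the encode and decode methods. The io module (added in Python 2.6) provides an io.open function, which has an encoding parameter.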

Use the open method from the io module.

>>> import io
>>> f = io.open("test", mode="r", encoding="utf-8")

Then, after calling f's read() function, a decoded Unicode object is returned.

>>> f.read()
u'Capit\xe1n\n\n'

Note that in Python 3, io.open is an alias for the built-in open function. The built-in open function supports the encoding argument only in Python 3, not in Python 2.
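A brief Python 3 check of this: io.open and the built-in open are the same object, so code written with io.open(..., encoding=...), which also runs on Python 2.6+, keeps working unchanged. The file name "test" below is a hypothetical stand-in created in a temporary directory.

```python
import io
import os
import tempfile

# In Python 3 the built-in open() *is* io.open().
print(io.open is open)  # True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test")  # hypothetical file name

    # io.open with an explicit encoding: the same call works on Python 2.6+ and 3.
    with io.open(path, mode="w", encoding="utf-8") as f:
        f.write(u"Capit\xe1n")

    with io.open(path, mode="r", encoding="utf-8") as f:
        content = f.read()  # already-decoded text (unicode in Python 2, str in Python 3)

print(ascii(content))  # 'Capit\xe1n'
```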

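Edit: previously this answer recommended the codecs module. The codecs module can cause problems when read() and readline() calls are mixed, so this answer now recommends the io module instead.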

Use the open method from the codecs module.

>>> import codecs
>>> f = codecs.open("test", "r", "utf-8")

Then, after calling f's read() function, a decoded Unicode object is returned.

>>> f.read()
u'Capit\xe1n\n\n'

If you know the encoding of the file, using the codecs package is going to be much less confusing.

See http://docs.python.org/library/codecs.html#codecs.open
