How do I re-encode an mbox file as UTF-8 in Python?

Posted 2024-05-20 16:25:59


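I have exported a batch of Gmail messages and want to parse them with Python to extract insights. After the export, however, I noticed a strange encoding in the mbox files: for example, the character "é" appears as =E9, and the curly quotation marks (“ and ”) appear as =E2=80=9C and =E2=80=9D. My emails often contain many non-Latin scripts, so decoding these files to UTF-8 is very important to me. I also frequently receive messages with emoji, which carry emotional information that I need to preserve.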

I found out that this encoding is called Quoted-Printable and tried the quopri Python module, but without success.
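To make the encoding concrete, here is a minimal sketch of how the sequences quoted above map back to characters. It assumes the escaped bytes are UTF-8, which holds for =E2=80=9C and =E2=80=9D; the lone =E9 is a single byte that only decodes as "é" under a Latin-1 style charset. The variable names are illustrative only.

```python
import quopri

# Quoted-Printable writes each non-ASCII byte as "=" followed by two hex digits.
encoded = b"=E2=80=9Cquoted=E2=80=9D"

# decodestring only restores the raw bytes; it knows nothing about charsets.
raw = quopri.decodestring(encoded)

# Assumption: the bytes are UTF-8 (true for these curly quotes).
text = raw.decode("utf-8")

# =E9 is one byte, so it has to be decoded with a single-byte charset instead.
latin_text = quopri.decodestring(b"caf=E9").decode("latin-1")

print(text, latin_text)
```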

Here is my simplified code:

import os
import quopri
from pathlib import Path

for filename in os.listdir(directory):
    if filename.endswith(".mbox"): 
        input_filename =  Path(os.path.join(directory,filename))
        output_filename = Path(os.path.join(directory,filename+'_utf-8'))

        with open(input_filename, 'rb'):
            quopri.decode(input_filename, output_filename)

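However, when I run this, the last line raises the following error: AttributeError: 'WindowsPath' object has no attribute 'read'. I don't understand why this error occurs, since the paths I defined point to files.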


1 answer

Answer 1 · Posted 2024-05-20 16:25:59

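quopri.decode expects open file objects, not paths: it calls read() on its first argument, which a Path object does not have. In your code the with statement opens the file but never binds it to a name, so the Path objects are what get passed in. Bind both files to names with "as" in the with statement, opening the output in binary write mode, and pass those names instead: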

with input_filename.open('rb') as infile, output_filename.open('wb') as outfile:
    quopri.decode(infile, outfile)
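Putting this together with the loop from the question, a minimal sketch might look like the following. The function name decode_mbox_files and its directory parameter are hypothetical, introduced here only for illustration; the output naming follows the question's scheme of appending "_utf-8" to the file name.

```python
import quopri
from pathlib import Path


def decode_mbox_files(directory):
    """Undo Quoted-Printable encoding for every .mbox file in `directory`.

    Hypothetical helper, not part of the original post. Returns the list
    of output paths that were written.
    """
    written = []
    for input_path in sorted(Path(directory).glob("*.mbox")):
        # Same naming scheme as the question: "name.mbox" -> "name.mbox_utf-8".
        output_path = input_path.with_name(input_path.name + "_utf-8")
        # quopri.decode reads from and writes to binary file objects.
        with input_path.open("rb") as infile, output_path.open("wb") as outfile:
            quopri.decode(infile, outfile)
        written.append(output_path)
    return written
```

Note that quopri only reverses the =XX escapes. The resulting bytes stay in whatever charset each message originally used, so a part written in a single-byte charset (as the lone =E9 for "é" suggests) would still need to be transcoded before the output is valid UTF-8 throughout.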
