How do I convert a PDF to HTML with Python pdfminer?

from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import HTMLConverter, TextConverter
from pdfminer.layout import LAParams
import os
import contextlib
import tempfile
rsrcmgr = PDFResourceManager()
laparams = LAParams()
converter = HTMLConverter if format == 'html' else TextConverter
out_file = "A:\folder"
in_file = "A:\folder\pyhtml.html"
pdf_filename = 'insurance.pdf'
device = converter(rsrcmgr, out_file, codec='utf-8', laparams=laparams)
PDFPage.get_pages(rsrcmgr, device, in_file, pagenos=[1], maxpages=1)
with contextlib.closing(tempfile.NamedTemporaryFile(mode='r', suffix='.xml')) as xmlin:
    cmd = 'pdftohtml -xml -nodrm -zoom 1.5 -enc UTF-8 -noframes "%s" "%s"' % (
        pdf_filename, xmlin.name.rpartition('.')[0])
    os.system(cmd + " >/dev/null 2>&1")
    result = xmlin.read().decode('utf-8')

Traceback (most recent call last):
  File "a:\folder\new - Copy.py", line 14, in <module>
    device = converter(rsrcmgr, out_file, codec='utf-8', laparams=laparams)
AttributeError: 'str' object has no attribute 'write'

1 Answer

User

#1 · Posted 2024-10-02 18:23:39

AttributeError: 'str' object has no attribute 'write'

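The converter tries to call .write on its second argument, so that argument has to be a writable file handle, not a str. You can get one with with open(...), which also closes the file for you when the block ends. Replace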

in_file = "A:\folder\pyhtml.html"
device = converter(rsrcmgr, out_file, codec='utf-8', laparams=laparams)

with

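in_file = r"A:\folder\pyhtml.html"  # raw string, so "\f" is not read as a form feed
# Binary mode: with codec='utf-8', HTMLConverter in pdfminer.six writes encoded bytes.
with open(in_file, "wb") as out_file:
    device = converter(rsrcmgr, out_file, codec='utf-8', laparams=laparams)
    # Keep the page-processing code inside this block, while out_file is still open.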
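The difference between a path string and a file handle can be checked without pdfminer at all. The snippet below is a minimal standard-library sketch: it reproduces the AttributeError from the traceback, then writes through a handle returned by open. The temporary directory and the sample HTML string are made up for the demonstration.

```python
import os
import tempfile

# A path string has no .write method, which is what the traceback reports.
try:
    "pyhtml.html".write("<p>hello</p>")
except AttributeError as exc:
    error_message = str(exc)

# A file handle returned by open() does; "with" closes it when the block ends.
demo_path = os.path.join(tempfile.mkdtemp(), "pyhtml.html")  # hypothetical demo location
with open(demo_path, "wb") as out_file:
    out_file.write("<p>hello</p>".encode("utf-8"))

# Read the file back to confirm the bytes were written.
with open(demo_path, "rb") as check:
    written = check.read()

print(error_message)
print(out_file.closed)
print(written)
```

The file is opened in binary mode here because the bytes are encoded by hand, which mirrors what a converter created with codec='utf-8' does before writing.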

If you want to learn more about open, see the Built-in Functions documentation.
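Fixing the file handle only gets past the AttributeError; the pages still have to be fed through an interpreter before any HTML is written. The following is a sketch of one way to do the whole conversion, assuming the pdfminer.six package on Python 3. The function name pdf_to_html and its parameters are hypothetical and not part of the question or of pdfminer.

```python
from pathlib import Path


def pdf_to_html(pdf_path, html_path, max_pages=0):
    """Render pdf_path as HTML into html_path (assumes pdfminer.six)."""
    # Third-party dependency (pip install pdfminer.six), imported here so a
    # missing package is reported when the function is first called.
    from pdfminer.converter import HTMLConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager()
    # Both files are opened in binary mode: the PDF is read as bytes, and
    # with codec="utf-8" the converter writes encoded bytes.
    with open(pdf_path, "rb") as pdf_file, open(html_path, "wb") as html_file:
        device = HTMLConverter(rsrcmgr, html_file, codec="utf-8",
                               laparams=LAParams())
        try:
            interpreter = PDFPageInterpreter(rsrcmgr, device)
            # get_pages takes the open PDF file object; maxpages=0 means all pages.
            for page in PDFPage.get_pages(pdf_file, maxpages=max_pages):
                interpreter.process_page(page)
        finally:
            device.close()  # writes the closing HTML while the file is still open
    return Path(html_path)
```

With the names from the question, a call would look like pdf_to_html('insurance.pdf', r'A:\folder\pyhtml.html', max_pages=1). Note that the pagenos argument of PDFPage.get_pages is zero-based, so pagenos=[1] in the original code selects the second page, not the first.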
