Reading PDF metadata with PyPDF2

Posted 2024-10-03 17:23:06


When extracting PDF metadata, every value in the response comes back as an IndirectObject reference instead of the actual text.

I tried both PyPDF2 and pdfminer.six. With PyPDF2:

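from PyPDF2 import PdfFileReader  # PyPDF2 < 3.0 API; 3.x replaces it with PdfReader(...).metadata

with open(path, 'rb') as f:
    pdf = PdfFileReader(f)
    info = pdf.getDocumentInfo()
    print(info)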

This prints:

{'/Title': IndirectObject(38, 0), '/Author': IndirectObject(40, 0), '/Subject': IndirectObject(41, 0), '/Producer': IndirectObject(39, 0), '/Creator': IndirectObject(42, 0), '/CreationDate': IndirectObject(43, 0), '/ModDate': IndirectObject(43, 0)}

So I tried pdfrw, and that worked:

>>> from pdfrw import PdfReader
>>> PdfReader(path).Info
