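<p>I have a PDF file that contains tables, text, and some images. I want to extract the tables, wherever they appear in the PDF.</p>
<p>Right now I find the tables by going through the pages manually. Once I find one, I take that page and save it to another PDF file.</p>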
<pre><code>import PyPDF2

PDFfilename = "Sammamish.pdf"  # path to the source PDF
NewPDFfilename = "allTables.pdf"  # path where the new PDF is written

with open(PDFfilename, "rb") as inputStream:
    pfr = PyPDF2.PdfFileReader(inputStream)  # PdfFileReader object
    pg4 = pfr.getPage(126)  # page indices are 0-based, so this is page 127
    writer = PyPDF2.PdfFileWriter()  # PdfFileWriter object
    writer.addPage(pg4)  # add the page
    # the source file has to stay open until the page has been written
    with open(NewPDFfilename, "wb") as outputStream:
        writer.write(outputStream)  # write pages to the new PDF
</code></pre>
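<p>For several pages the same manual approach can be put in a loop. The following is only a sketch: it assumes a PyPDF2 release before 3.0 (the same legacy <code>PdfFileReader</code>/<code>PdfFileWriter</code> API as above), the helper names <code>to_zero_based</code> and <code>copy_pages</code> are made up for illustration, and the page numbers still have to be found by hand. It does not detect tables.</p>

```python
def to_zero_based(page_numbers, num_pages):
    """Turn 1-based page numbers into sorted, unique 0-based indices.

    Numbers outside 1..num_pages are dropped.
    """
    return sorted({n - 1 for n in page_numbers if 1 <= n <= num_pages})


def copy_pages(src_path, dst_path, page_numbers):
    """Copy the listed pages (1-based) from src_path into a new PDF at dst_path."""
    # Imported here so that to_zero_based can be used without PyPDF2 installed.
    # Assumes the legacy PyPDF2 API (releases before 3.0).
    import PyPDF2

    with open(src_path, "rb") as src:
        reader = PyPDF2.PdfFileReader(src)
        writer = PyPDF2.PdfFileWriter()
        for index in to_zero_based(page_numbers, reader.getNumPages()):
            writer.addPage(reader.getPage(index))
        # Write while the source file is still open.
        with open(dst_path, "wb") as dst:
            writer.write(dst)


# Example call; the page numbers here are hypothetical and were picked by hand:
# copy_pages("Sammamish.pdf", "allTables.pdf", [127, 130, 131])
```

<p>This still copies whole pages, including any surrounding text and images, so it does not solve the actual problem.</p>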
<p>My goal is to extract the tables from the entire PDF document.</p>
<p><strong><a href="https://i.stack.imgur.com/0kWSg.png" rel="noreferrer"><img src="https://i.stack.imgur.com/0kWSg.png" alt="Please have a look at the sample image of a page in PDF"/></a></strong></p>