<p>OK, I finally figured out a solution. It's simple: you can iterate over <code>LTFigure</code> objects the same way you iterate over <code>LTTextBox</code> objects.</p>
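<pre class="lang-py prettyprint-override"><code>from pdfminer.layout import LTChar, LTFigure, LTTextBox, LTTextLine

# interpreter, device and page come from the usual PDFMiner setup (not shown here)
interpreter.process_page(page)
layout = device.get_result()
for lobj in layout:
    if isinstance(lobj, LTTextBox):
        # Text boxes are made up of text lines
        for element in lobj:
            if isinstance(element, LTTextLine):
                text = element.get_text()
                print(text)
    elif isinstance(lobj, LTFigure):
        # Figures can hold individual characters directly
        for element in lobj:
            if isinstance(element, LTChar):
                text = element.get_text()
                print(text)
</code></pre>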
<p>Note that the correct approach (one that makes sure the parser reads everything in the document) is to iterate over the <code>pdfminer</code> objects recursively, as shown here: <a href="https://stackoverflow.com/questions/25248140/how-does-one-obtain-the-location-of-text-in-a-pdf-with-pdfminer">How does one obtain the location of text in a PDF with PDFMiner?</a></p>
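<p>For illustration, here is a minimal sketch of that recursive traversal. It is not the code from the linked answer. It relies on duck typing and assumes that text-bearing layout objects expose <code>get_text()</code> while containers such as the page layout and <code>LTFigure</code> are iterable. The names <code>iter_text</code>, <code>FakeChar</code> and <code>FakeFigure</code> are hypothetical; the two stand-in classes exist only so the sketch runs without a PDF.</p>

```python
from typing import Any, Iterator


def iter_text(lobj: Any) -> Iterator[str]:
    """Yield text from a layout object and from everything nested inside it."""
    get_text = getattr(lobj, "get_text", None)
    if callable(get_text):
        # Assumption: text-bearing objects report their own text via get_text()
        yield get_text()
    elif hasattr(lobj, "__iter__"):
        # Assumption: containers are iterable, so walk their children recursively
        for child in lobj:
            yield from iter_text(child)
    # Anything else (no text, not iterable) is skipped


# Hypothetical stand-ins for layout objects, used only to exercise the sketch
class FakeChar:
    def __init__(self, ch: str) -> None:
        self._ch = ch

    def get_text(self) -> str:
        return self._ch


class FakeFigure(list):
    """An iterable container with no get_text(), like a figure holding characters."""


nested = FakeFigure([FakeChar("H"), FakeFigure([FakeChar("i")]), object()])
result = "".join(iter_text(nested))
print(result)
```

<p>With a real document, the same function could be called on the object returned by <code>device.get_result()</code>, for example <code>"".join(iter_text(layout))</code>. Because it stops at the first object that offers <code>get_text()</code>, a text box is returned as one string, while characters sitting directly inside a figure come back one by one.</p>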