<p>Use <code>subprocess</code> to call the <code>pdftotext</code> program from the Xpdf tools. You can find MS Windows builds of these tools at <a href="https://www.xpdfreader.com/download.html" rel="nofollow noreferrer">https://www.xpdfreader.com/download.html</a>. Download the "Xpdf command line tools" package.</p>
<p>I use it like this (Python 3.7):</p>
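<pre><code>import subprocess as sp


def pdftotext(path):
    """
    Generate a text rendering of a PDF file as a single string.
    """
    # '-layout' keeps the physical page layout; '-' writes the text to stdout.
    args = ['pdftotext', '-layout', path, '-']
    # check=True raises CalledProcessError on a non-zero exit status;
    # text=True (Python 3.7+) decodes stdout to str.
    cp = sp.run(
        args, stdout=sp.PIPE, stderr=sp.DEVNULL,
        check=True, text=True
    )
    return cp.stdout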
</code></pre>
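<p>If you would rather get a list of lines and avoid an exception when the tool is missing or fails, something along these lines might work. It is only a sketch: the name <code>pdf_lines</code> and the <code>exe</code> parameter are made up for illustration, and returning <code>None</code> on failure is just one possible convention.</p>

```python
import shutil
import subprocess as sp


def pdf_lines(path, exe="pdftotext"):
    """
    Return the text of a PDF as a list of lines, or None if extraction fails.

    `pdf_lines` and `exe` are hypothetical names used for this sketch.
    """
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(exe) is None:
        return None
    args = [exe, "-layout", path, "-"]
    try:
        cp = sp.run(
            args, stdout=sp.PIPE, stderr=sp.DEVNULL,
            check=True, text=True
        )
    except sp.CalledProcessError:
        # Raised by check=True on a non-zero exit status,
        # for example when the PDF cannot be opened.
        return None
    return cp.stdout.splitlines()
```

<p>Call it as <code>lines = pdf_lines('report.pdf')</code> (the file name is a placeholder) and check for <code>None</code> before using the result.</p>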