<ul>
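<li>I suggest using tabula to extract the tables.</li>
<li>Pass the PDF as an argument to the tabula API, and it returns the tables as DataFrames.</li>
<li>Each table in the PDF is returned as its own DataFrame.</li>
<li>The tables come back as a list of DataFrames, so you can pick out and work with whichever DataFrame you need.</li>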
</ul>
<p>Here is my code for extracting tables from a PDF:</p>
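<pre><code>import os
import tabula  # provided by the tabula-py package

file = "filename.pdf"
path = os.path.join("enter your directory path here", file)
# With multiple_tables=True, read_pdf returns a list of DataFrames,
# one per table found on the requested page(s).
dfs = tabula.read_pdf(path, pages="1", multiple_tables=True)
for df in dfs:
    print(df)
</code></pre>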
<p>For more details, see my <a href="https://github.com/masterhimanshupoddar/extracting-multiple-tables-from-pdf-using-Tabula/blob/master/pdf-table-extractor.py" rel="noreferrer">repo</a>.</p>
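<p>Since the result is a list of DataFrames, a common next step is to drop empty tables and pick the one you need. The following is only a minimal sketch of that idea and is not taken from the linked repo: the helper names <code>select_tables</code> and <code>describe_tables</code> are hypothetical, and the sample frames are made-up stand-ins for what <code>tabula.read_pdf</code> would return, so the snippet runs without a PDF.</p>

```python
import pandas as pd


def select_tables(tables, min_rows=1):
    """Keep only the DataFrames that have at least `min_rows` rows."""
    return [t for t in tables if isinstance(t, pd.DataFrame) and len(t) >= min_rows]


def describe_tables(tables):
    """Return one (index, rows, columns) tuple per DataFrame in the list."""
    return [(i, t.shape[0], t.shape[1]) for i, t in enumerate(tables)]


# Stand-in for the list returned by
# tabula.read_pdf(path, pages="1", multiple_tables=True).
# These frames are made-up sample data, not real PDF output.
tables = [
    pd.DataFrame({"item": ["a", "b"], "price": [1.5, 2.0]}),
    pd.DataFrame(columns=["item", "price"]),  # an empty table
]

kept = select_tables(tables)
for index, rows, cols in describe_tables(kept):
    print(f"table {index}: {rows} rows x {cols} columns")

# Each entry is an ordinary DataFrame, so the usual pandas operations apply.
first = kept[0]
print(first["price"].sum())
```

<p>In real use, you would replace the sample <code>tables</code> list with the list returned by <code>tabula.read_pdf</code>.</p>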