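<p>I suggest using tabula to extract the table. Pass the PDF path as an argument to the tabula API and it returns the tables as dataframes: each table in the PDF comes back as its own dataframe.
Here is the code I use to extract tables from a PDF.</p>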
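<pre><code># The tables are returned as a list of DataFrames; working with them requires pandas
import os
import pandas as pd
import tabula

file = "filename.pdf"
# os.path.join inserts the path separator between the directory and the file name
path = os.path.join('enter your directory path here', file)
# With multiple_tables=True, each table found on page 1 becomes one DataFrame in the list
tables = tabula.read_pdf(path, pages='1', multiple_tables=True)
for table in tables:
    print(table)
</code></pre>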
<p>See my <a href="https://github.com/masterhimanshupoddar/extracting-multiple-tables-from-pdf-using-Tabula/blob/master/pdf-table-extractor.py" rel="nofollow noreferrer">repo</a> for more details.</p>
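<p>Since <code>read_pdf</code> gives back a list, a common next step is to write each table to its own file. The following is only a minimal sketch of that idea, not code from the repo: <code>save_tables</code>, <code>extract_and_save</code>, and the <code>table_N.csv</code> naming are hypothetical names chosen for illustration, and it assumes the list items behave like pandas DataFrames (they expose <code>.empty</code> and <code>.to_csv</code>).</p>

```python
from pathlib import Path


def save_tables(tables, out_dir, stem="table"):
    """Write each non-empty DataFrame in `tables` to its own CSV file.

    Hypothetical helper for illustration. Returns the paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, table in enumerate(tables, start=1):
        if table.empty:  # DataFrame.empty is True when there are no rows or no columns
            continue
        target = out_dir / f"{stem}_{i}.csv"
        table.to_csv(target, index=False)  # index=False drops the row-number column
        written.append(target)
    return written


def extract_and_save(pdf_path, out_dir, pages="1"):
    """Read tables from a PDF with tabula-py and save them as CSV files."""
    # Imported here so save_tables stays usable without tabula-py (and Java) installed.
    import tabula

    tables = tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True)
    return save_tables(tables, out_dir)
```

<p>Calling <code>extract_and_save("filename.pdf", "output")</code> would then leave one CSV per extracted table in the <code>output</code> directory. The file number follows the table's position in the returned list, so skipped empty tables leave gaps in the numbering.</p>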