<p>So, for scraping the PDF, I suggest the following:</p>
<pre><code>import requests
import io
import PyPDF2
# Download the PDF
URL = 'https://www.federalreserve.gov/monetarypolicy/files/monetary20200129a1.pdf'
pdf_bytes = requests.get(URL).content
# PDF Reader expects a file-like object
pdf_stream = io.BytesIO(pdf_bytes)
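# Newer PyPDF2 releases use PdfReader / pages / extract_text;
# older ones used PdfFileReader / getPage / extractText
reader = PyPDF2.PdfReader(pdf_stream)
# Read the first page
page = reader.pages[0]
page_content = page.extract_text()
# Print as UTF-8 bytes to avoid console encoding errors
print(page_content.encode('utf-8'))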
</code></pre>
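<p>The snippet above does not check whether the server actually returned a PDF; an error page or redirect would only fail later inside the parser. A minimal sketch of a sanity check on the downloaded bytes follows. The helper name <code>looks_like_pdf</code> and the 1024-byte search window are assumptions for illustration, not part of <code>requests</code> or PyPDF2:</p>

```python
# Every PDF file carries a "%PDF-" header marker near its start.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check: is the PDF header marker near the start of the payload?

    Hypothetical helper. The 1024-byte window is a lenient assumption that
    tolerates a few stray bytes before the header; it does not validate
    the rest of the file.
    """
    return PDF_MAGIC in data[:1024]


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.7\n1 0 obj\n"))         # a PDF-style header
    print(looks_like_pdf(b"<html><body>404</body></html>"))  # an HTML error page
```

<p>In the download code, this could be called on <code>pdf_bytes</code> before wrapping it in <code>io.BytesIO</code>, raising an error if it returns <code>False</code>.</p>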
<p>Also, it may be worth taking a look at <a href="https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file">How to extract text from a PDF file?</a></p>