No text returned when using PdfFileReader

Posted 2024-09-30 00:31:04


Suppose I want to extract text from a PDF file, for example: https://www.lyxoretf.nl/pdfDocuments/Factsheets/RFACT_FR0010377028_EN_20190131_NLD.pdf?pfdrid_c=false&uid=4cc6aef9-9e75-46d7-9416-65cd7b2b5dd6&download=null

import io
import requests
from PyPDF2 import PdfFileReader

url = 'https://www.lyxoretf.nl/pdfDocuments/Factsheets/RFACT_FR0010377028_EN_20190131_NLD.pdf?pfdrid_c=false&uid=4cc6aef9-9e75-46d7-9416-65cd7b2b5dd6&download=null'

r = requests.get(url)
f = io.BytesIO(r.content)

reader = PdfFileReader(f)
contents = reader.getPage(0).extractText().split('\n')

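Unfortunately, the code suggested in the related links does not return the text in this file.

Is there a way to extract text from this kind of file?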


1 Answer

User

#1 · Posted 2024-09-30 00:31:04

import fitz  # PyMuPDF: pip install PyMuPDF

# Path to a local copy of the PDF; save the file somewhere on your machine first
path = r'\Factsheets_RFACT_FR0010377028_EN_20190131_NLD.pdf'
text = ""
doc = fitz.open(path)
for page in doc:
    # get_text() is the current method name; older PyMuPDF releases called it getText()
    text += page.get_text()
