Unable to get text from a PDF file using PyPDF2

2 answers

User

Answer #1 · edited 2024-09-29 21:49:31

The following is quoted from the documentation (https://pythonhosted.org/PyPDF2/PageObject.html):

extractText() Locate all text drawing commands, in the order they are provided in the content stream, and extract the text. This works well for some PDF files, but poorly for others, depending on the generator used. This will be refined in the future. Do not rely on the order of text coming out of this function, as it will change if this function is made more sophisticated. Returns: a unicode string object.

So how well this function works seems to depend on the PDF itself, in particular on the tool that generated it.
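To see whether a given file is one of the "poor" cases, it can help to extract each page and check what comes back. Below is a minimal sketch, not a definitive recipe. It uses the legacy PyPDF2 names that match the quoted documentation (`PdfFileReader`, `getPage`, `extractText`); newer PyPDF2 releases renamed these to `PdfReader`, `pages` and `extract_text`. The helper `looks_usable` and its `min_ratio` threshold are hypothetical, just one rough way to flag empty or garbled output.

```python
def extract_pages(path):
    """Return the extracted text of each page, in page order."""
    # Imported here so looks_usable() below can be used without PyPDF2 installed.
    import PyPDF2

    with open(path, "rb") as fh:
        # Legacy PyPDF2 1.x API, as in the quoted documentation.
        reader = PyPDF2.PdfFileReader(fh)
        return [reader.getPage(i).extractText() for i in range(reader.numPages)]


def looks_usable(text, min_ratio=0.8):
    """Rough heuristic (an assumption, not part of PyPDF2).

    Treats the text as usable when it is non-empty and at least
    min_ratio of its characters are printable or whitespace.
    """
    stripped = text.strip()
    if not stripped:
        return False
    readable = sum(1 for ch in stripped if ch.isprintable() or ch.isspace())
    return readable / len(stripped) >= min_ratio
```

Something like `[i for i, t in enumerate(extract_pages("some.pdf")) if not looks_usable(t)]` would then list the pages that probably need a different tool (`some.pdf` is a placeholder file name).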

User

Answer #2 · edited 2024-09-29 21:49:31

On Windows 7 with Python 3.6, I ran into the problem that PyPDF2 did not decode some PDF files correctly. My solution was pdfminer.six:

pip install pdfminer.six

To extract text from a PDF, you can use a function similar to the one in this answer: https://stackoverflow.com/a/42154976/9524424

Works well for me...
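For reference, the pdfminer.six route from answer #2 can be sketched roughly as follows. This is not the function from the linked answer; it assumes the `extract_text` helper in `pdfminer.high_level`, which recent pdfminer.six releases provide. The `first_nonempty` helper is hypothetical: it tries several extractors in order and keeps the first one that returns real text, which is one way to fall back when PyPDF2 returns nothing useful.

```python
def pdfminer_text(path):
    """Extract all text from a PDF with pdfminer.six."""
    # Requires: pip install pdfminer.six
    # Imported here so first_nonempty() can be used without pdfminer installed.
    from pdfminer.high_level import extract_text

    return extract_text(path)


def first_nonempty(path, extractors):
    """Hypothetical helper: return the first non-blank extraction result.

    extractors is a sequence of callables taking a file path and
    returning a string. Returns None if every extractor fails or
    yields only whitespace.
    """
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:
            # Deliberately broad: any failure means "try the next tool".
            continue
        if text and text.strip():
            return text
    return None
```

Usage would be something like `first_nonempty("some.pdf", [pdfminer_text])`, optionally with a PyPDF2-based extractor listed first (`some.pdf` is a placeholder file name).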
