I get subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', ... when running the tabula Python library
Command:
df = tabula.read_pdf(filepath, pages=5, guess=True, multiple_tables=True, stream=True, java_options="-Dfile.encoding=UTF8")
Error message:
File "C:\Users\himsoni\AppData\Local\Programs\Python\Python37\lib\site-packages\tabula\io.py", line 85, in _run
check=True,
File "C:\Users\himsoni\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 487, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\himsoni\\AppData\\Local\\Programs\\Python\\Python37\\lib\\site-packages\\tabula\\tabula-1.0.3-jar-with-dependencies.jar', '--pages', '1', '--stream', '--guess', '--format', 'JSON', 'C:\\Users\\himsoni\\Desktop\\PDF_extraction\\black_white_format\\black_white_format\\PDF_Split_JPEGs\\blackwhite_Test.pdf']' returned non-zero exit status 1.
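tabula-py launches tabula-java as a separate "java -jar ..." process, so CalledProcessError only reports the exit status; the actual Java exception is written to that child process's stderr. A minimal sketch for surfacing it, assuming you copy the argument list from the error message and re-run it yourself. rerun_and_show_stderr is a hypothetical helper name, not part of tabula-py.

```python
import subprocess
import sys
from typing import List, Tuple


def rerun_and_show_stderr(args: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit status, stderr text).

    Hypothetical helper: unlike subprocess.run(..., check=True), it does not
    raise on a non-zero exit, so the child's error output can be inspected.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Replace this demo command with the list from the CalledProcessError
    # message, e.g. ['java', '-Dfile.encoding=UTF8', '-jar', '<tabula jar>', ...]
    cmd = [sys.executable, "-c", "import sys; sys.exit('demo failure')"]
    code, err = rerun_and_show_stderr(cmd)
    print("exit status:", code)
    print(err)
```

With the real java command, the printed stderr should contain the Java stack trace that explains the non-zero exit status.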
import tabula; tabula.environment_info()
Python version:
3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]
Java version:
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) Client VM (build 25.231-b11, mixed mode, sharing)
tabula-py version: 2.0.1
platform: Windows-10-10.0.17763-SP0
uname:
uname_result(system='Windows', node='himsoni', release='10', version='10.0.17763', machine='AMD64', processor='Intel64 Family 6 Model 142 Stepping 10, GenuineIntel')
linux_distribution: ('', '', '')
mac_ver: ('', ('', '', ''), '')
Python and Java versions:
Python 3.7.4
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) Client VM (build 25.231-b11, mixed mode)
Does the java -h command work? Yes
Is the java command included in PATH? Yes
OS and version: Windows 10
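The PATH check above can also be confirmed from Python itself. shutil.which returns the first matching executable found on PATH (or None), which also shows which Java installation is picked up when several are present. find_on_path is only an illustrative wrapper name.

```python
import shutil
from typing import Optional


def find_on_path(command: str = "java") -> Optional[str]:
    """Return the full path of `command` if it is found on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_on_path("java")
    if location is None:
        print("java was not found on PATH")
    else:
        print("java resolves to:", location)
```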
Code:
import tabula
filepath = "C:\\Users\\himsoni\\Desktop\\PDF_extraction\\black_white_format\\black_white_format\\PDF_Split_JPEGs\\blackwhite.pdf"
df = tabula.read_pdf(filepath, pages=5, guess=True, multiple_tables=True, stream=True, java_options="-Dfile.encoding=UTF8")
print(df)
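Doubling every backslash in a Windows path is easy to get wrong: a single "\P" only works because Python leaves unrecognized escapes in place (with a warning in Python 3.6 and later). A raw string avoids the problem. A minimal sketch using the path from the code above; RAW_PATH and ESCAPED_PATH are illustrative names.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, no doubling needed.
RAW_PATH = r"C:\Users\himsoni\Desktop\PDF_extraction\black_white_format\black_white_format\PDF_Split_JPEGs\blackwhite.pdf"

# Equivalent escaped form with every backslash doubled.
ESCAPED_PATH = "C:\\Users\\himsoni\\Desktop\\PDF_extraction\\black_white_format\\black_white_format\\PDF_Split_JPEGs\\blackwhite.pdf"

# PureWindowsPath parses Windows-style paths on any OS, which is handy
# for sanity-checking the file name and extension.
pdf_path = PureWindowsPath(RAW_PATH)
print(pdf_path.name, pdf_path.suffix)
```

On Windows you could additionally check pathlib.Path(RAW_PATH).exists() before passing the string to tabula.read_pdf.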
Expected output: the tables from the specified page.
Answer: Your PDF contains the following font descriptor object:
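According to the PDF specification, /ItalicAngle must be a number, and -17.-21823 is not a valid number representation. PDF parsers that do not silently repair such values under the hood will therefore most likely fail to read your file. PDFBox does fail on it.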
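The answer hinges on PDF number syntax (ISO 32000-1, section 7.3.3): an optional sign followed by digits with at most one period, which may be leading, trailing, or embedded. A rough diagnostic sketch that checks tokens against that grammar and scans a file's raw bytes for invalid /ItalicAngle values. The helper names are hypothetical, not part of tabula or PDFBox, and the byte scan assumes the font descriptor is stored uncompressed; objects inside compressed object streams will not be found this way.

```python
import re
import sys
from typing import List

# PDF numeric object syntax: optional sign, then digits with at most one
# period (e.g. 123, +17, 34.5, 4., -.002).
_PDF_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")

# Value following an /ItalicAngle key in an uncompressed PDF body; the token
# stops at whitespace or a PDF delimiter character.
_ITALIC_ANGLE = re.compile(rb"/ItalicAngle\s+([^\s/<>\[\]()]+)")


def is_valid_pdf_number(token: str) -> bool:
    """True if `token` is a syntactically valid PDF integer or real."""
    return _PDF_NUMBER.fullmatch(token) is not None


def find_bad_italic_angles(pdf_bytes: bytes) -> List[str]:
    """Return /ItalicAngle values that are not valid PDF numbers.

    Only sees objects stored as plain text; font descriptors inside
    compressed object streams are not inspected.
    """
    bad = []
    for match in _ITALIC_ANGLE.finditer(pdf_bytes):
        value = match.group(1).decode("latin-1")
        if not is_valid_pdf_number(value):
            bad.append(value)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_italic_angle.py blackwhite.pdf
    with open(sys.argv[1], "rb") as fh:
        print(find_bad_italic_angles(fh.read()))
```

An empty list does not prove the file is clean, only that no malformed value was visible in the uncompressed bytes.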
PS: This answer was provided by the tabulapdf/tabula-java development team.