I want to extract table information from OCR data - Q&A - Python中文网


Posted 2024-09-30 12:29:25



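I want to extract table information from OCR data. I have both the original and its OCR text. I tried pytesseract, but could not find a practical implementation.

Here is an image: https://drive.google.com/open?id=1CGJwbmf5snoXvwlQAsRAxIRRixbT_Q8l

I tried: https://github.com/WZBSocialScienceCenter/pdftabextract

That approach did not work for me at all.

I want to recover the tabular structure of this table from the OCR data so that I can process it further.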


1 answer

User

#1 · Posted 2024-09-30 12:29:25

pdftabextract is not an OCR tool. It requires scanned pages that already carry OCR information, i.e. a "sandwich PDF" that contains both the scanned images and the recognized text. You need software such as Tesseract or ABBYY FineReader for the OCR step.
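One way to produce such a sandwich PDF from a scanned image is pytesseract's `image_to_pdf_or_hocr`. The following is only a minimal sketch, assuming the `pytesseract` package and the Tesseract binary are installed. The function name `make_sandwich_pdf` and the file names are hypothetical, and the `ocr` parameter is an illustrative hook so the function can be exercised without Tesseract.

```python
from pathlib import Path


def make_sandwich_pdf(image_path, pdf_path, ocr=None):
    """Write a searchable PDF (page image plus recognized text) for one scan."""
    if ocr is None:
        # Imported lazily so this module still loads where Tesseract is absent.
        import pytesseract

        def ocr(path):
            # Returns the PDF as bytes; pytesseract accepts a file path here.
            return pytesseract.image_to_pdf_or_hocr(path, extension="pdf")

    pdf_bytes = ocr(str(image_path))
    Path(pdf_path).write_bytes(pdf_bytes)
    return len(pdf_bytes)  # number of bytes written
```

A call such as `make_sandwich_pdf("scan.png", "scan.pdf")` (hypothetical file names) would write the searchable PDF next to the scan.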

Try Tesseract; it is relatively easy to get working.
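As a rough starting point for getting table rows out of Tesseract, the word boxes returned by `pytesseract.image_to_data` can be grouped by vertical position. This is a sketch under stated assumptions, not a complete table extractor: the helper names `ocr_words` and `group_into_rows` and the `row_tolerance` value are hypothetical, and the page is assumed to be upright (not skewed).

```python
def ocr_words(image_path):
    """Run Tesseract and return one dict per recognized word with its box."""
    # Lazy import: needs the pytesseract package and the Tesseract binary.
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(image_path, output_type=Output.DICT)
    words = []
    for i, text in enumerate(data["text"]):
        if text.strip():  # skip the empty entries Tesseract emits for layout levels
            words.append({
                "text": text,
                "left": data["left"][i],
                "top": data["top"][i],
            })
    return words


def group_into_rows(words, row_tolerance=10):
    """Group word boxes into rows by `top`, then order each row left to right."""
    rows = []
    for word in sorted(words, key=lambda w: w["top"]):
        # A word joins the current row if it is within the tolerance
        # (in pixels) of that row's first word; otherwise it starts a new row.
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w["text"] for w in sorted(row, key=lambda w: w["left"])]
            for row in rows]
```

Calling `group_into_rows(ocr_words("table.png"))` (hypothetical file name) yields a list of rows of words. Splitting rows into actual cells would additionally require clustering the `left` coordinates into columns, which this sketch does not attempt.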
