tesserocr does not recognize text

from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('test3.png')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    # Detect text lines and get one (image, box, block_id, paragraph_id) tuple per line
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # Restrict recognition to the current text line
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        result = (u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
                  "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box)
        print(result)

2 answers

User

Answer 1 · edited 2024-06-06 17:04:03

The code below produces the correct OCR result, but without the x, y, w, h, and confidence information.

import tesserocr
from PIL import Image

print(tesserocr.tesseract_version())  # print the tesseract-ocr version

image = Image.open('SO_5TextLines.png')

text = tesserocr.image_to_text(image)  # run OCR over the whole image
for line in text.splitlines():  # print the recognized text line by line
    print(line)

Output:

(output listing not preserved)
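If you also need the coordinates and confidence, one possible route is to recognize the whole image once and then walk the result iterator at text-line level, instead of calling SetRectangle() per box. This is only a sketch: the function names `bounds_to_box` and `ocr_lines_with_boxes` are made up for illustration, and it has not been run against the image from the question.

```python
def bounds_to_box(left, top, right, bottom):
    """Convert (left, top, right, bottom) to the x/y/w/h dict shape used in the question."""
    return {'x': left, 'y': top, 'w': right - left, 'h': bottom - top}


def ocr_lines_with_boxes(image_path):
    """Return one (text, confidence, box) tuple per recognized text line."""
    # Imported lazily so bounds_to_box() stays usable without tesserocr installed.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    with PyTessBaseAPI() as api:
        api.SetImage(Image.open(image_path))
        api.Recognize()  # run OCR once over the whole image
        level = RIL.TEXTLINE
        for line in iterate_level(api.GetIterator(), level):
            bounds = line.BoundingBox(level)  # (left, top, right, bottom) or None
            if bounds is None:
                continue
            results.append((line.GetUTF8Text(level),
                            line.Confidence(level),
                            bounds_to_box(*bounds)))
    return results
```

Calling `ocr_lines_with_boxes('SO_5TextLines.png')` and printing each tuple should give text together with its box and confidence, assuming the whole-image pass recognizes every line as image_to_text() does.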

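Running your code on OSX Sierra gives the same result as yours, with line 4 missing. The problem seems to be caused by api.SetRectangle(); you can modify your code to print boxes for further inspection. The sample code is based only on the sample text image you provided, so it needs to be tested with more images to verify that it works in general.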
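For the inspection step, something like the following might help. It prints each detected box and, as an experiment, grows the box by a few pixels before passing it to SetRectangle(). The padding idea is my own assumption (very tight crops sometimes recognize poorly), not something confirmed for this image, and `pad_box`, `ocr_padded_lines`, and the default margin of 5 are hypothetical choices.

```python
def pad_box(box, margin, image_width, image_height):
    """Grow an x/y/w/h box by `margin` pixels per side, clamped to the image bounds."""
    left = max(box['x'] - margin, 0)
    top = max(box['y'] - margin, 0)
    right = min(box['x'] + box['w'] + margin, image_width)
    bottom = min(box['y'] + box['h'] + margin, image_height)
    return {'x': left, 'y': top, 'w': right - left, 'h': bottom - top}


def ocr_padded_lines(image_path, margin=5):
    """Print every detected box, its padded version, and the OCR result for it."""
    # Imported lazily so pad_box() stays usable without tesserocr installed.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL

    image = Image.open(image_path)
    width, height = image.size
    with PyTessBaseAPI() as api:
        api.SetImage(image)
        boxes = api.GetComponentImages(RIL.TEXTLINE, True)
        for i, (_, box, _, _) in enumerate(boxes):
            padded = pad_box(box, margin, width, height)
            print('Box[{}]: detected={} padded={}'.format(i, box, padded))
            api.SetRectangle(padded['x'], padded['y'], padded['w'], padded['h'])
            text = api.GetUTF8Text()
            conf = api.MeanTextConf()
            print('  confidence: {}, text: {!r}'.format(conf, text))
```

If the missing line shows up with a margin but not without one, that would point at the crop being too tight rather than at line detection.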

Hope this helps.

User

Answer 2 · edited 2024-06-06 17:04:03

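With the default Tesseract 4.00.00alpha and oem 3 mode, the text is recognized correctly. The result is as follows.

If you are still using v3.x, I suggest upgrading Tesseract, along with your tesserocr, to {}.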

EDIT:
To upgrade tesserocr so that it supports v4.00.00alpha, see the "Is any plan to porting tesseract 4.0 (alpha)" issue page. It contains guidelines for making it work.
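To check which Tesseract your tesserocr build is actually linked against before and after upgrading, a rough sketch along these lines could be used. `tesseract_major_version` and `open_api_with_default_oem` are hypothetical helper names, and the `oem` keyword is assumed to be available in your tesserocr version (older builds may not accept it).

```python
import re


def tesseract_major_version(version_text):
    """Extract the major version from text such as 'tesseract 4.00.00alpha'."""
    match = re.search(r'tesseract\s+v?(\d+)', version_text)
    return int(match.group(1)) if match else None


def open_api_with_default_oem():
    """Report the linked Tesseract version and open an API handle with OEM.DEFAULT."""
    # Imported lazily so the parser above stays usable without tesserocr installed.
    import tesserocr
    from tesserocr import PyTessBaseAPI, OEM

    major = tesseract_major_version(tesserocr.tesseract_version())
    print('Linked Tesseract major version: {}'.format(major))
    # OEM.DEFAULT corresponds to "oem 3"; assumes this tesserocr accepts the oem keyword.
    return PyTessBaseAPI(oem=OEM.DEFAULT)
```

The handle returned by `open_api_with_default_oem()` should be closed with `End()` or used in a `with` block, the same way as in the question's code.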
