<p>There is an example in the contrib folder that uses the tesseract C-API through ctypes, although it seems to be a bit outdated: <a href="https://github.com/tesseract-ocr/tesseract/blob/a3ba11b030345d32829b1e8355afea5419978d82/contrib/tesseract-c_api-demo.py" rel="nofollow" title="contrib/tesseract-c_api-demo.py">contrib/tesseract-c_api-demo.py</a></p>
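<p>You need to set <code>restype</code> and <code>argtypes</code> for some of the functions. Also, don't forget to call an init function on the handle. The example below works for me: it reads English text from a file named <code>test.bmp</code> into the <code>text</code> variable.</p>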
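<p>To see what declaring <code>argtypes</code> and <code>restype</code> buys you, here is a minimal sketch that does not need tesseract at all. It assumes a POSIX system, where <code>CDLL(None)</code> exposes the C library, and uses libc's <code>strlen</code> purely as a stand-in. By default ctypes treats a return value as a C <code>int</code>; once the declarations are in place, ctypes converts the result as <code>size_t</code> and checks every argument, so a Python <code>str</code> is rejected where a <code>const char*</code> is expected. That is why <code>lang</code>, <code>filename</code> and <code>TESSDATA_PREFIX</code> are <code>bytes</code> literals.</p>

```python
import ctypes
from ctypes import ArgumentError, c_char_p, c_size_t

# Assumption: a POSIX system, where CDLL(None) exposes the C library's symbols
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [c_char_p]
libc.strlen.restype = c_size_t

length = libc.strlen(b"test.bmp")  # bytes are passed as const char*
print(length)  # 8

try:
    libc.strlen("test.bmp")  # a str does not match c_char_p in Python 3
    rejected = False
except ArgumentError:
    rejected = True
print(rejected)  # True
```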
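<pre><code>import sys
from ctypes import *
from ctypes.util import find_library

lang = b"eng"
filename = b"test.bmp"
TESSDATA_PREFIX = b"/usr/local/Cellar/tesseract/3.04.01_1/share/tessdata"

# find_library() takes the bare name ("tesseract"), not "libtesseract.dylib"
path = find_library("tesseract")
if path is None:
    sys.exit("Could not find libtesseract.")
tesseract = CDLL(path)

# Opaque handle types: their layout is never accessed from Python
class TessBaseAPI(Structure):
    pass

class TessResultRenderer(Structure):
    pass

tesseract.TessBaseAPICreate.restype = POINTER(TessBaseAPI)
tesseract.TessBaseAPIDelete.argtypes = [POINTER(TessBaseAPI)]
tesseract.TessBaseAPIDelete.restype = None
# TessBaseAPIInit3 returns an int: 0 on success, non-zero on failure
tesseract.TessBaseAPIInit3.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p]
tesseract.TessBaseAPIInit3.restype = c_int
# TessBaseAPIProcessPages returns a C BOOL, which is an int
tesseract.TessBaseAPIProcessPages.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p, c_int, POINTER(TessResultRenderer)]
tesseract.TessBaseAPIProcessPages.restype = c_int
# The returned char* belongs to the caller and must be freed with TessDeleteText,
# so keep the raw pointer instead of letting c_char_p copy it and drop it
tesseract.TessBaseAPIGetUTF8Text.argtypes = [POINTER(TessBaseAPI)]
tesseract.TessBaseAPIGetUTF8Text.restype = POINTER(c_char)
tesseract.TessDeleteText.argtypes = [POINTER(c_char)]
tesseract.TessDeleteText.restype = None

api = tesseract.TessBaseAPICreate()
rc = tesseract.TessBaseAPIInit3(api, TESSDATA_PREFIX, lang)
if rc:
    tesseract.TessBaseAPIDelete(api)
    print("Could not initialize tesseract.")
    sys.exit(3)

# Recognize the image: no retry config, no timeout (0), no output renderer
success = tesseract.TessBaseAPIProcessPages(api, filename, None, 0, None)
if success:
    text_ptr = tesseract.TessBaseAPIGetUTF8Text(api)
    if text_ptr:
        text = string_at(text_ptr).decode("utf-8")
        tesseract.TessDeleteText(text_ptr)
        print("=" * 78)
        print(text.strip())
        print("=" * 78)

tesseract.TessBaseAPIDelete(api)
</code></pre>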
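<p>One detail worth knowing about <code>TessBaseAPIGetUTF8Text</code>: the string it returns is allocated by tesseract, and the caller is expected to release it with <code>TessDeleteText</code>. With <code>restype = c_char_p</code>, ctypes copies the text into a <code>bytes</code> object and discards the pointer, so the buffer can never be freed. The sketch below shows the ownership pattern with libc's <code>strdup</code> and <code>free</code> standing in for those two functions (assuming a POSIX system for <code>CDLL(None)</code>); <code>take_c_string</code> is a hypothetical helper name.</p>

```python
from ctypes import CDLL, POINTER, c_char, c_char_p, c_void_p, string_at

# Assumption: POSIX libc via CDLL(None). strdup/free stand in for
# TessBaseAPIGetUTF8Text/TessDeleteText, which follow the same ownership rule.
libc = CDLL(None)

libc.strdup.argtypes = [c_char_p]
libc.strdup.restype = POINTER(c_char)  # keep the raw pointer, not a bytes copy
libc.free.argtypes = [c_void_p]
libc.free.restype = None


def take_c_string(ptr, release):
    """Copy a caller-owned NUL-terminated char* into bytes, then release it."""
    if not ptr:  # NULL pointer: nothing to copy or free
        return None
    try:
        return string_at(ptr)  # reads up to the terminating NUL
    finally:
        release(ptr)  # hand the buffer back to the allocator that created it


text = take_c_string(libc.strdup(b"Hello, tesseract"), libc.free)
print(text.decode("utf-8"))  # Hello, tesseract
```

<p>With tesseract loaded as in the answer, the same helper would be called as <code>take_c_string(tesseract.TessBaseAPIGetUTF8Text(api), tesseract.TessDeleteText)</code>, after declaring <code>restype = POINTER(c_char)</code> for the former and <code>argtypes = [POINTER(c_char)]</code> for the latter.</p>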
<p>The output looks like this:</p>
^{pr2}$
<p><em>Edit: replaced the use of <code>c_void_p</code> with opaque types, as suggested by eryksun. Thanks!</em></p>
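<p>The practical benefit of opaque types over <code>c_void_p</code> is type checking: a <code>c_void_p</code> parameter accepts any pointer, while a <code>POINTER(TessBaseAPI)</code> parameter rejects pointers to other structure types. The sketch below demonstrates this without loading tesseract. <code>fake_c_function</code> is a hypothetical Python stand-in, wrapped with <code>CFUNCTYPE</code> so that ctypes applies the same argument conversion it would apply to a real C function.</p>

```python
from ctypes import CFUNCTYPE, POINTER, ArgumentError, Structure, c_int, c_void_p


# Opaque handle types: no _fields_, their contents are never touched
class TessBaseAPI(Structure):
    pass


class TessResultRenderer(Structure):
    pass


def fake_c_function(handle):
    """Hypothetical stand-in for a C function taking a handle; returns 0."""
    return 0


# The same callable behind two prototypes: one typed, one using c_void_p
typed = CFUNCTYPE(c_int, POINTER(TessBaseAPI))(fake_c_function)
untyped = CFUNCTYPE(c_int, c_void_p)(fake_c_function)

api = POINTER(TessBaseAPI)()              # NULL handle of the right type
renderer = POINTER(TessResultRenderer)()  # NULL handle of the wrong type

print(typed(api))         # 0
print(untyped(renderer))  # 0: c_void_p accepts any pointer, so the mix-up goes unnoticed

try:
    typed(renderer)  # wrong pointer type is rejected before the call
    caught = False
except ArgumentError:
    caught = True
print(caught)  # True
```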