PyTesseract does not recognize decimal points



This is not really a duplicate of <a href="https://stackoverflow.com/questions/57480469/how-to-extract-decimal-in-image-with-pytesseract">How to extract decimal in image with Pytesseract</a>: those answers did not solve my problem, and my use case is different.

I am using PyTesseract to recognize text in table cells. When a drug dose contains a decimal point, the OCR fails to recognize the <code>.</code>, although it is accurate for everything else. I am using <code>tesseract v5.0.0-alpha.20200328</code> on Windows 10.

My pre-processing consists of upscaling by 400% with cubic interpolation, converting to black and white, dilation and erosion, a morphological close, and blurring. I have tried every reasonable combination of these steps (and each one on its own), but nothing gets the <code>.</code> recognized. I have also tried various <code>--psm</code> values and a character whitelist. I believe the font is <code>Segoe UI</code>.

Before processing: <a href="https://i.stack.imgur.com/S87rd.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/S87rd.png" alt="pre pre-processed"/></a>

After processing: <a href="https://i.stack.imgur.com/OFjoL.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/OFjoL.png" alt="enter image description here"/></a>

PyTesseract output: <code>25mg »p</code>

Processing code:

<pre><code>import cv2
import numpy as np
import pytesseract

image = cv2.imread('01.png')

# Upscale 4x with cubic interpolation, then convert to grayscale
upscaled_image = cv2.resize(image, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
bw_image = cv2.cvtColor(upscaled_image, cv2.COLOR_BGR2GRAY)

# Dilate, then erode, with a 2x2 kernel
kernel = np.ones((2, 2), np.uint8)
dilated_image = cv2.dilate(bw_image, kernel, iterations=1)
eroded_image = cv2.erode(dilated_image, kernel, iterations=1)

# Fixed threshold, then a morphological close with a 3x3 rectangle
thresh = cv2.threshold(eroded_image, 205, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
morph_image = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# Bilateral filter, then Otsu threshold
blur_image = cv2.threshold(cv2.bilateralFilter(morph_image, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
final_image = blur_image

text = pytesseract.image_to_string(final_image, lang='eng', config='--psm 10')
</code></pre>
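A minimal sketch of how a <code>--psm</code> value and a character whitelist can be combined into a single PyTesseract config string. The helper names <code>build_tesseract_config</code> and <code>ocr_dose</code>, the whitelist characters, and the choice of <code>--psm 7</code> are illustrative assumptions, not values taken from the question. <code>tessedit_char_whitelist</code> is the Tesseract variable used for whitelisting. This only shows the shape of the configuration; it is not claimed to make the decimal point appear.

```python
def build_tesseract_config(psm: int, whitelist: str) -> str:
    """Build a Tesseract config string with a page segmentation mode
    and a character whitelist (hypothetical helper, for illustration)."""
    # pytesseract splits the config string on whitespace, so a space
    # inside the whitelist would be parsed as a separate argument.
    if any(ch.isspace() for ch in whitelist):
        raise ValueError("whitelist must not contain whitespace")
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def ocr_dose(image, psm: int = 7, whitelist: str = "0123456789.mg") -> str:
    """Run OCR on an already pre-processed image (hypothetical helper).

    The defaults are assumptions: digits, a period, and the letters of 'mg'.
    """
    # Imported lazily so the config helper works without pytesseract installed.
    import pytesseract

    config = build_tesseract_config(psm, whitelist)
    return pytesseract.image_to_string(image, lang="eng", config=config)


if __name__ == "__main__":
    print(build_tesseract_config(7, "0123456789.mg"))
```

With the processing code above, this would be called as <code>ocr_dose(final_image)</code>.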
