Bounding box detection for characters/digits

1 answer

User

Reply #1 · Posted 2024-05-18 19:24:00

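The problem with your solution is probably the very poor quality of the input image. There is almost no contrast between the characters and the background. The blob detection algorithm from cvlib likely cannot tell the character blobs apart from the background, so it produces a useless binary mask. Let's try solving this with plain OpenCV instead.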

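I propose the following steps:

Apply an adaptive threshold to get a reasonably good binary mask
Use an area filter to clean the speckle noise out of the binary mask
Use morphology to improve the quality of the binary image
Get the external contour of each character and fit a bounding rectangle to each character blob
Crop each character using the previously computed bounding boxes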

Let's look at the code:

# importing cv2 & numpy:
import numpy as np
import cv2

# Set image path
path = "C:/opencvImages/"
fileName = "mrrm9.png"

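# Read input image:
inputImage = cv2.imread(path + fileName)
# cv2.imread returns None (instead of raising) when the file can't be read:
if inputImage is None:
    raise FileNotFoundError(path + fileName)
inputCopy = inputImage.copy()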

# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

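Nothing much to discuss here: just read the BGR image and convert it to grayscale. Now let's apply an adaptive threshold using the Gaussian method. This is the tricky part, because the parameters have to be tuned by hand depending on the quality of the input. Instead of using one global threshold, the method computes a separate threshold for every pixel from its windowSize × windowSize neighborhood (a Gaussian-weighted mean when using ADAPTIVE_THRESH_GAUSSIAN_C), which lets it separate foreground from background even under uneven lighting. The constant windowConstant (OpenCV's C parameter) is subtracted from that local mean to fine-tune the output; a negative value such as -1 therefore raises the threshold slightly: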

# Set the adaptive thresholding (Gaussian) parameters:
# windowSize is the neighborhood size and must be odd (3, 5, 7, ...)
windowSize = 31
windowConstant = -1
# Apply the threshold:
binaryImage = cv2.adaptiveThreshold(grayscaleImage, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, windowSize, windowConstant)

This gives a nice binary image.
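To make the role of windowSize and windowConstant more concrete, here is a minimal NumPy sketch of the same idea. It is not OpenCV's implementation: it uses a plain box mean (the ADAPTIVE_THRESH_MEAN_C rule) instead of Gaussian weights, and edge padding chosen for simplicity, so results won't be bit-exact. The function name adaptive_threshold_mean is just for illustration.

```python
import numpy as np


def adaptive_threshold_mean(gray, window_size, c):
    """Simplified box-mean adaptive threshold (ADAPTIVE_THRESH_MEAN_C + THRESH_BINARY rule).

    A pixel becomes 255 if it is brighter than (local mean - c), else 0.
    """
    if window_size % 2 == 0 or window_size < 3:
        raise ValueError("window_size must be an odd number >= 3")
    r = window_size // 2
    k = window_size
    img = gray.astype(np.float64)
    h, w = img.shape
    # Replicate the border pixels so every pixel has a full k x k window
    padded = np.pad(img, r, mode="edge")
    # Integral image with a leading row/column of zeros
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Sum of every k x k window from four lookups into the integral image
    window_sum = (integral[k:k + h, k:k + w] - integral[0:h, k:k + w]
                  - integral[k:k + h, 0:w] + integral[0:h, 0:w])
    local_mean = window_sum / (k * k)
    # Same rule as OpenCV: the constant is subtracted from the local mean
    return np.where(img > local_mean - c, 255, 0).astype(np.uint8)
```

On a perfectly flat patch every pixel equals its local mean, so with c = -1 the patch goes black while with c = +1 it goes white. That is why a small negative constant helps suppress flat, low-contrast background.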

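As you can see, there is some speckle noise in the image. Let's apply an area filter to remove it. The noise blobs are smaller than the blobs we are interested in, so we can easily filter them out by area, like this: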
# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
    cv2.connectedComponentsWithStats(binaryImage, connectivity=4)

# Set the minimum pixels for the area filter:
minArea = 20

# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0):
remainingComponentLabels = [i for i in range(1, componentsNumber)
                            if componentStats[i, cv2.CC_STAT_AREA] >= minArea]

# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels:
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype("uint8")
This gives the filtered image.
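If it helps to see what the labeling plus area filter actually does, here is a minimal pure-Python/NumPy sketch: a breadth-first flood fill that labels each blob, counts its pixels (the same quantity OpenCV reports as CC_STAT_AREA) and keeps only blobs with at least min_area pixels. The name area_filter is hypothetical, and this is far slower than cv2.connectedComponentsWithStats; it is only meant to illustrate the logic.

```python
from collections import deque

import numpy as np


def area_filter(binary, min_area, connectivity=4):
    """Keep only foreground blobs with at least min_area pixels."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    fg = binary > 0
    h, w = fg.shape
    visited = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=np.uint8)
    for sy in range(h):
        for sx in range(w):
            if not fg[sy, sx] or visited[sy, sx]:
                continue
            # Flood-fill one component with BFS, collecting its pixels
            visited[sy, sx] = True
            queue = deque([(sy, sx)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # The component's area is simply its pixel count
            if len(pixels) >= min_area:
                ys, xs = zip(*pixels)
                out[list(ys), list(xs)] = 255
    return out
```

Note the effect of connectivity: with 4-connectivity, two pixels that touch only at a corner belong to different components, so a thin diagonal stroke can be split into small pieces that the area filter then removes. Using connectivity=8 keeps such strokes together.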
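We can improve the quality of the image with some morphology. Some characters appear to be broken (look at the first 3, which is split into two separate blobs). We can apply a closing operation to join them: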
# Set kernel (structuring element) size:
kernelSize = 3
# Set operation iterations:
opIterations = 1
# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel,
                                iterations=opIterations, borderType=cv2.BORDER_REFLECT101)
This gives the "closed" image.
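Closing is just a dilation followed by an erosion with the same structuring element. The sketch below spells that out in NumPy for a square kernel, using numpy's "reflect" padding, which mirrors the image without repeating the edge pixel (the same idea as cv2.BORDER_REFLECT101). The helper names are made up for illustration; in practice cv2.morphologyEx is the way to go.

```python
import numpy as np


def _sliding_extreme(img, k, reducer):
    """Apply reducer (np.max = dilation, np.min = erosion) over every k x k neighborhood."""
    r = k // 2
    padded = np.pad(img, r, mode="reflect")  # mirror padding, like BORDER_REFLECT101
    h, w = img.shape
    # Stack every shifted view of the neighborhood, then reduce across the stack
    views = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return reducer(np.stack(views), axis=0)


def close_binary(img, k=3, iterations=1):
    """Morphological closing with a k x k square kernel: dilate, then erode."""
    out = img.copy()
    for _ in range(iterations):
        out = _sliding_extreme(out, k, np.max)  # dilation grows the blobs
    for _ in range(iterations):
        out = _sliding_extreme(out, k, np.min)  # erosion shrinks them back
    return out
```

Because dilation first grows each blob by k // 2 pixels on every side, gaps up to about kernelSize - 1 pixels wide get filled and are not reopened by the erosion. With kernelSize = 3 that bridges 1 to 2 pixel breaks like the one in the first 3, while wider gaps stay open.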
Now you need to get the bounding boxes of each character. Let's detect the outer contour of each blob and fit a nice rectangle around it:
# Get each bounding box
# Find the big contours/blobs on the filtered image
# (OpenCV 4.x returns two values here; OpenCV 3.x returns three: image, contours, hierarchy):
contours, hierarchy = cv2.findContours(closingImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)

contours_poly = [None] * len(contours)
# The Bounding Rectangles will be stored here:
boundRect = []

# Alright, just look for the outer bounding boxes
# (with RETR_CCOMP, an outer contour has no parent: hierarchy[0][i][3] == -1):
for i, c in enumerate(contours):
    if hierarchy[0][i][3] == -1:
        contours_poly[i] = cv2.approxPolyDP(c, 3, True)
        boundRect.append(cv2.boundingRect(contours_poly[i]))

# Draw the bounding boxes on the (copied) input image:
color = (0, 255, 0)
for x, y, w, h in boundRect:
    cv2.rectangle(inputCopy, (x, y), (x + w, y + h), color, 2)
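The last for loop is largely optional. It takes each bounding rectangle from the list and draws it on the input image, so you can check every individual rectangle.
You can draw the rectangles on the binary image in the same way to inspect them there.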
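As an alternative, the labeling step already knows each blob's box: cv2.connectedComponentsWithStats reports it in the CC_STAT_LEFT, CC_STAT_TOP, CC_STAT_WIDTH and CC_STAT_HEIGHT columns. The NumPy sketch below (hypothetical helper label_bounding_boxes) computes the same kind of (x, y, w, h) tuples directly from a label image, just to show what those numbers mean. Working from the pixels also sidesteps cv2.approxPolyDP, which keeps only a subset of the contour points and can therefore make a box slightly tighter than the blob.

```python
import numpy as np


def label_bounding_boxes(labels):
    """Return {label: (x, y, w, h)} for every non-zero label in a label image."""
    boxes = {}
    for lab in np.unique(labels):
        if lab == 0:  # label 0 is the background, as in OpenCV's labeling output
            continue
        ys, xs = np.nonzero(labels == lab)
        x, y = int(xs.min()), int(ys.min())
        # Extents are inclusive, hence the +1 (same convention as cv2.boundingRect)
        boxes[int(lab)] = (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)
    return boxes
```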
Additionally, if you want to crop each character using the bounding boxes we just obtained, do it like this:
# Crop the characters:
for i, (x, y, w, h) in enumerate(boundRect):
    # Crop the roi of each bounding rectangle:
    croppedImg = closingImage[y:y + h, x:x + w]
    cv2.imshow("Cropped Character: " + str(i), croppedImg)
    cv2.waitKey(0)
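Two small things are worth adding if you feed the crops to a recognizer: cv2.findContours is not guaranteed to return the contours in left-to-right reading order, and a tight crop with no margin can hurt recognition. The sketch below (hypothetical helper crop_characters, with an assumed default margin of 2 pixels) sorts the boxes by x and adds a margin clamped to the image borders.

```python
import numpy as np


def crop_characters(binary, boxes, pad=2):
    """Crop each (x, y, w, h) box in left-to-right order with a clamped margin."""
    img_h, img_w = binary.shape[:2]
    crops = []
    # Sort by the left edge so the crops follow the reading order of a single text line
    for x, y, w, h in sorted(boxes, key=lambda b: b[0]):
        # Grow the box by `pad` pixels, but never past the image borders
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, img_w), min(y + h + pad, img_h)
        crops.append(binary[y0:y1, x0:x1])
    return crops
```

Sorting by x only works for a single row of characters, as in this image; multi-line text would need grouping by y first.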
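That's how you get the individual bounding boxes. Now, you are probably trying to pass these images to an OCR. I tried passing the filtered binary image (after the closing operation) to pyocr (the OCR I'm using) and got this output string: 31197402
This is the code I used to run the OCR on the closed image: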
# Set the OCR libraries:
from PIL import Image
import pyocr
import pyocr.builders

# Set pyocr tools:
tools = pyocr.get_available_tools()
# The tools are returned in the recommended order of usage
tool = tools[0]

# Set OCR language:
langs = tool.get_available_languages()
lang = langs[0]

# Get string from image
# (closingImage.png is the closed image, inverted and saved to disk beforehand):
txt = tool.image_to_string(
    Image.open(path + "closingImage.png"),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)
print("Text is:" + txt)
Note that the OCR expects black characters on a white background, so you have to invert the image first.
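Here is a minimal sketch of that preparation step, plus an optional cleanup of the result, assuming you only expect digits. prepare_for_ocr and digits_only are hypothetical helpers, and the 10-pixel white margin is an assumed default: a white border around the text is commonly recommended for Tesseract-based OCR, but tune it for your images.

```python
import re

import numpy as np


def prepare_for_ocr(binary, border=10):
    """Turn a white-on-black uint8 mask into black-on-white text with a white margin."""
    # For uint8 images, 255 - x gives the same result as cv2.bitwise_not
    inverted = 255 - binary.astype(np.uint8)
    return np.pad(inverted, border, mode="constant", constant_values=255)


def digits_only(text):
    """Drop everything except ASCII digits (spaces, newlines, stray symbols)."""
    return re.sub(r"[^0-9]", "", text)
```

You could save prepare_for_ocr(closingImage) with cv2.imwrite before running the OCR code above, then pass the returned string through digits_only. Keep in mind that this only removes extra characters; it cannot fix a digit the OCR misread.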
