Sorting contours based on precedence in Python, OpenCV

Posted 2023-04-17


【Title】: Sorting contours based on precedence in Python, OpenCV [duplicate] 【Posted】: 2020-12-15 04:54:27 【Question】:

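I am trying to sort contours in the order they appear, left-to-right and top-to-bottom, just like you would write anything: starting from the top left, and then whichever comes next.

This is what I have achieved so far, and how: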

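import os

import cv2
from PIL import Image


def get_contour_precedence(contour, cols):
    # bucket the top-left y into bands of tolerance_factor pixels, then order by x
    tolerance_factor = 61
    origin = cv2.boundingRect(contour)
    return ((origin[1] // tolerance_factor) * tolerance_factor) * cols + origin[0]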


image = cv2.imread("C:/Users/XXXX/PycharmProjects/OCR/raw_dataset/23.png", 0)

ret, thresh1 = cv2.threshold(image, 130, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, h = cv2.findContours(thresh1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort the contours top-to-bottom, then left-to-right, using the precedence key
# (sorted() also works when findContours returns the contours as a tuple)
contours = sorted(contours, key=lambda x: get_contour_precedence(x, thresh1.shape[1]))

# initialize the list of contour bounding boxes and associated
# characters that we'll be OCR'ing
chars = []
inc = 0
# loop over the contours
for c in contours:
    inc += 1

    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)

    label = str(inc)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x - 2, y - 2),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    print('x=', x)
    print('y=', y)
    print('x+w=', x + w)
    print('y+h=', y + h)
    crop_img = image[y + 2:y + h - 1, x + 2:x + w - 1]
    name = os.path.join("bounding boxes", 'Image_%d.png' % (
        inc))
    cv2.imshow("cropped", crop_img)
    print(name)
    crop_img = Image.fromarray(crop_img)
    crop_img.save(name)
    cv2.waitKey(0)

cv2.imshow('mat', image)
cv2.waitKey(0)

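Input image 1:

Output image 1:

Input image 2:

Output image 2:

Input image 3:

Output image 3:

As you can see, the 1, 2, 3, 4 numbering is not what I expected in each image, as shown in image number 3.

How do I adjust this to make it work, or even write a custom function?

NOTE: I have provided multiple images of the same input in my question. The content is the same, but the text varies slightly between them, so the tolerance factor does not work for every one of them. Adjusting it manually would not be a good idea.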

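【Comments】:

Split the text lines first. This should be fairly easy, since you have fully black rows between the text lines. Then, for each line, you can easily sort left to right.

@Miki How do I sort the text lines, given that the contours are not sorted correctly each time?

Check ***.com/a/48268334/5008845

@Miki Will it still work if a . or a - comes in between?

You are expecting a lot from the responses. You might get more interest if you 1) edit your question to make the code more readable, 2) show the original input images, 3) show an example of what you expect the output to look like. 4) Will all your images have horizontal text, or might some be at an angle?

【Answer 1】: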

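Here is my take on the problem. I'll give you the general gist of it, and then my implementation in C++. The main idea is that I want to process the image from left to right, top to bottom. I'll process each blob (or contour) as I find it; however, I need a couple of intermediate steps to achieve a successful (ordered) segmentation.

Vertical sort using rows

The first step is to sort the blobs by rows. This means that each row holds a set of (unordered) horizontal blobs. That's OK. The first step is to compute some kind of vertical ordering, and if we process each row from top to bottom, we achieve just that.

After the blobs are (vertically) sorted by rows, I can check their centroids (or centers of mass) and sort them horizontally. The idea is that I will process row by row and, for each row, sort the blob centroids. Let's see an example of what I'm trying to achieve here.

This is your input image:

This is what I call the Row Mask:

This last image contains white areas that each represent a "row". Each row has a number (e.g., Row1, Row2, etc.) and each row holds a set of blobs (or characters, in this case). By processing each row, from top to bottom, you are already sorting the blobs on the vertical axis.

If I number each row from top to bottom, I get this image:

The Row Mask is a way of creating "rows of blobs", and this mask can be computed morphologically. Check out the 2 images overlaid to get a better view of the processing order:

What we are trying to do here is, first, a vertical ordering (blue arrow), and then we take care of the horizontal ordering (red arrow). You can see that by processing each row we can (possibly) overcome the sorting problem!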
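For readers working in Python, the row-first idea can be sketched without any image processing, working directly on bounding boxes in the (x, y, w, h) form that cv2.boundingRect returns. This is only a minimal sketch, not the implementation described in this answer: instead of a morphological Row Mask it groups boxes whose vertical extents overlap, and the function name sort_reading_order and the sample boxes are made up for illustration.

```python
def sort_reading_order(boxes):
    """Order (x, y, w, h) boxes top-to-bottom, then left-to-right.

    Assumption: boxes on the same text line overlap vertically, and
    boxes on different lines do not.
    """
    rows = []  # each row: [top, bottom, list_of_boxes]
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        # the box joins the last row if it starts above that row's bottom edge
        if rows and y < rows[-1][1]:
            rows[-1][1] = max(rows[-1][1], y + h)
            rows[-1][2].append(box)
        else:
            rows.append([y, y + h, [box]])

    ordered = []
    for _, _, row_boxes in rows:
        # horizontal order inside a row: sort by the box center x
        ordered.extend(sorted(row_boxes, key=lambda b: b[0] + b[2] / 2))
    return ordered


# hypothetical boxes: two text lines, deliberately shuffled
boxes = [(120, 14, 20, 30), (10, 10, 20, 30), (60, 70, 20, 30),
         (65, 8, 20, 34), (12, 72, 20, 28)]
for number, box in enumerate(sort_reading_order(boxes), start=1):
    print(number, box)
```

Because rows are derived from the boxes themselves, no tolerance constant is needed, but the sketch would merge two text lines if a tall character from one line reached into the next.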

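Horizontal sort using centroids

Let's now see how we can sort the blobs horizontally. If we create a simpler image, with a width equal to the input image and a height equal to the number of rows in our Row Mask, we can simply overlay the horizontal coordinate (x coordinate) of each blob centroid. Check out this example:

This is a Row Table. Each row of the table corresponds to one row found in the Row Mask, and the table is also read from top to bottom. The width of the table is the same as the width of your input image and corresponds spatially to the horizontal axis. Each square is a pixel in your input image, mapped to the Row Table using only the horizontal coordinate (our simplification of rows is pretty straightforward). The actual value of each pixel in the Row Table is a label, identifying each of the blobs on your input image. Note that the labels are not ordered!

So, for instance, this table shows that in row 1 (you already know what row 1 is: the first white area on the Row Mask), at position (1,4), there is blob number 3. At position (1,6) there is blob number 2, and so on. What's cool (I think) about this table is that you can loop through it and, for every value different from 0, horizontal ordering becomes trivial. This is the Row Table, now ordered from left to right:

Mapping blob information with centroids

We are going to use the blob centroids to map the information between our two representations (Row Mask / Row Table). Suppose you already have both "helper" images and you process each blob (or contour) on the input image one at a time. For example, you have this as a start:

Alright, there's a blob here. How can we map it to the Row Mask and to the Row Table? Using its centroid. If we compute the centroid (shown in the figure as the green dot), we can construct a dictionary of centroids and labels. For example, for this blob, the centroid is located at (271,193). OK, let's assign label = 1. So we now have this dictionary:

Now, we find the row in which this blob is placed by looking up the same centroid on the Row Mask. Something like this: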

rowNumber = rowMask.at<uchar>( 193, 271 ); //at( row = y, column = x )

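This operation should return rowNumber = 3. Nice! We know in which row our blob is placed, so it is now vertically ordered. Now, let's store its horizontal coordinate in the Row Table: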

rowTable.at<uchar>( rowNumber - 1, 271 ) = 1; //( zero-based row, centroid x ) = label

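Now, rowTable holds (in its rows and columns) the label of the processed blob. The Row Table should look like this:

The table is a lot wider because its horizontal dimension has to be the same as that of your input image. In this image, label 1 is placed in Column 271, Row 3. If this were the only blob on your image, the blobs would already be sorted. But what happens if you add another blob in, say, Column 2, Row 1? That's why you need to traverse this table again after you have processed all the blobs: to correct their labels.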
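The mapping can be sketched in Python with plain lists standing in for the two helper images. This is a rough illustration under stated assumptions, not the C++ implementation that follows: row_bounds is a hypothetical stand-in for reading the labelled Row Mask at the centroid, and the labels and centroids are made-up sample data (apart from the (271,193) centroid used as the example above).

```python
def order_labels(centroids, row_bounds, width):
    """Return blob labels in reading order using a row table.

    centroids:  {label: (cx, cy)} with arbitrary (unordered) labels >= 1
    row_bounds: [(top, bottom), ...] one entry per row of the row mask,
                listed top to bottom (stand-in for the labelled Row Mask)
    width:      width of the input image in pixels
    """
    # row table: one line per text row, one cell per image column, 0 = empty
    row_table = [[0] * width for _ in row_bounds]

    for label, (cx, cy) in centroids.items():
        # "read the row mask" at the centroid: find the row containing cy
        row = next(i for i, (top, bottom) in enumerate(row_bounds)
                   if top <= cy < bottom)
        row_table[row][cx] = label  # only the x coordinate is kept

    # scanning the table row by row, left to right, yields the ordering
    return [cell for line in row_table for cell in line if cell != 0]


# hypothetical data: labels were assigned in arbitrary order
centroids = {1: (271, 193), 2: (40, 30), 3: (15, 28), 4: (90, 190)}
row_bounds = [(10, 50), (90, 130), (170, 210)]
print(order_labels(centroids, row_bounds, width=300))  # [3, 2, 4, 1]
```

One limitation worth keeping in mind: two blobs in the same row whose centroids share the same x coordinate would overwrite each other in the table.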

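Implementation in C++

Alright, hopefully the algorithm is a little clearer now (if not, just ask, my man). I'll try to implement these ideas in OpenCV using C++. First, I need a binary image of your input. Computing it is trivial using Otsu's thresholding method: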

//Read the input image:
std::string imageName = "C://opencvImages//yFX3M.png";
cv::Mat testImage = cv::imread( imageName );

//Compute grayscale image
cv::Mat grayImage;
cv::cvtColor( testImage, grayImage, cv::COLOR_BGR2GRAY ); //cv::imread loads channels in BGR order

//Get binary image via Otsu:
cv::Mat binImage;
cv::threshold( grayImage, binImage, 0, 255, cv::THRESH_OTSU );

//Invert image:
binImage = 255 - binImage;

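This is the resulting binary image, nothing fancy, just what we need to begin working:

The first step is to get the Row Mask. This can be achieved using morphology. Just apply a dilation + erosion with a VERY big horizontal structuring element. The idea is that you want to turn those blobs into rectangles, "fusing" them together horizontally: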

//Create a hard copy of the binary mask:
cv::Mat rowMask = binImage.clone();

//horizontal dilation + erosion:
int horizontalSize = 100; // a very big horizontal structuring element
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(horizontalSize,1) );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_DILATE, SE, cv::Point(-1,-1), 2 );
cv::morphologyEx( rowMask, rowMask, cv::MORPH_ERODE, SE, cv::Point(-1,-1), 1 );

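This results in the following Row Mask:

That's very cool. Now that we have our Row Mask, we must number the rows, OK? There are a lot of ways to do this, but right now I'm interested in the simplest one: loop through this image and look at every single pixel. If a pixel is white, use a Flood Fill operation to label that portion of the image as a unique blob (or row, in this case). This can be done as follows: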

//Label the row mask:
int rowCount = 0; //This will count our rows

//Loop thru the mask:
for( int y = 0; y < rowMask.rows; y++ ){
    for( int x = 0; x < rowMask.cols; x++ ){
        //Get the current pixel:
        uchar currentPixel = rowMask.at<uchar>( y, x );
        //If the pixel is white, this is an unlabeled blob:
        if ( currentPixel == 255 ) {
            //Create new label (different from zero):
            rowCount++;
            //Flood fill on this point:
            cv::floodFill( rowMask, cv::Point( x, y ), rowCount, (cv::Rect*)0, cv::Scalar(), 0 );
        }
    }
}

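This process labels all the rows from 1 to r. That's what we wanted. If you check out the image you'll only faintly see the rows; that is because our labels correspond to very low grayscale intensity values.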
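For readers porting this step to Python, the labelling loop can be sketched on a small list-of-lists mask. It is a plain-Python stand-in (a stack-based, 4-connected flood fill) rather than a call to OpenCV's flood fill, and label_rows and the tiny mask are illustrative names and data only.

```python
def label_rows(mask):
    """Label each white (255) region of a row mask with 1, 2, 3, ... in place.

    Scanning top to bottom means rows are numbered in vertical order.
    Returns the number of rows found.
    """
    height, width = len(mask), len(mask[0])
    row_count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 255:
                continue  # background or already labelled
            row_count += 1
            stack = [(y, x)]
            mask[y][x] = row_count
            while stack:  # 4-connected flood fill
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] == 255:
                        mask[ny][nx] = row_count
                        stack.append((ny, nx))
    return row_count


mask = [
    [0,   0,   0,   0],
    [255, 255, 255, 0],
    [255, 255, 255, 0],
    [0,   0,   0,   0],
    [0,   255, 255, 255],
]
print(label_rows(mask))  # 2
print(mask)
```

Since 255 doubles as the "unlabelled" marker, this scheme (like the 8-bit mask in the C++ code) can only distinguish up to 254 rows.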

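OK, now let's prepare the Row Table. This "table" really is just another image, remember: the same width as the input and a height equal to the number of rows you counted on the Row Mask: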

//create rows image:
cv::Mat rowTable = cv::Mat::zeros( cv::Size(binImage.cols, rowCount), CV_8UC1 );
//Just for convenience:
rowTable = 255 - rowTable;

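Here, I just inverted the final image for convenience, because I want to actually see how the table gets populated with (very low intensity) pixels and be sure that everything is working as intended.

Now comes the fun part. We have both images (or data containers) prepared. We need to process each blob independently. The idea is that you have to extract each blob/contour/character from the binary image, compute its centroid and assign it a new label. Again, there are a lot of ways to do this. Here, I'm using the following approach:

I'll loop through the binary mask. I'll get the current biggest blob from this binary input. I'll compute its centroid and store its data in every container needed, and then I'll delete that blob from the mask. I'll repeat the process until no blobs are left. This is my way of doing it, mainly because I already have functions written for that. This is the approach: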

//Prepare a couple of dictionaries for data storing:
std::map< int, cv::Point > blobMap; //holds label, gives centroid
std::map< int, cv::Rect > boundingBoxMap; //holds label, gives bounding box

First, two dictionaries. One receives a blob label and returns the centroid. The other receives the same label and returns the bounding box.

//Extract each individual blob:
cv::Mat bobFilterInput = binImage.clone();

//The new blob label:
int blobLabel = 0;

//Some control variables:
bool extractBlobs = true; //Controls loop
int currentBlob = 0; //Counter of blobs

while ( extractBlobs ){

    //Get the biggest blob:
    cv::Mat biggestBlob = findBiggestBlob( bobFilterInput );

    //Compute the centroid/center of mass:
    cv::Moments momentStructure = cv::moments( biggestBlob, true );
    float cx = momentStructure.m10 / momentStructure.m00;
    float cy = momentStructure.m01 / momentStructure.m00;

    //Centroid point:
    cv::Point blobCentroid;
    blobCentroid.x = cx;
    blobCentroid.y = cy;

    //Compute bounding box:
    boundingBox boxData;
    computeBoundingBox( biggestBlob, boxData );

    //Convert boundingBox data into opencv rect data:
    cv::Rect cropBox = boundingBox2Rect( boxData );

    //Label blob:
    blobLabel++;
    blobMap.emplace( blobLabel, blobCentroid );
    boundingBoxMap.emplace( blobLabel, cropBox );

    //Get the row for this centroid
    int blobRow = rowMask.at<uchar>( cy, cx );
    blobRow--;

    //Place centroid on rowed image:
    rowTable.at<uchar>( blobRow, cx ) = blobLabel;

    //Resume blob flow control:
    cv::Mat blobDifference = bobFilterInput - biggestBlob;
    //How many pixels are left on the new mask?
    int pixelsLeft = cv::countNonZero( blobDifference );
    bobFilterInput = blobDifference;

    //Done extracting blobs?
    if ( pixelsLeft <= 0 ){
        extractBlobs = false;
    }

    //Increment blob counter:
    currentBlob++;
}

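Check out this nice animation of how the processing goes through each blob, processes it and deletes it until nothing is left:

Now, some notes on the snippet above. I have some helper functions: findBiggestBlob, computeBoundingBox and boundingBox2Rect. They find the biggest blob in a binary image, compute its bounding box as a custom structure, and convert that custom structure into OpenCV's Rect structure, respectively. Those are the operations these functions carry out.

The "meat" of the snippet is this: once you have an isolated blob, compute its centroid (I actually compute the center of mass from the image moments). Generate a new label. Store this label and the centroid in a dictionary, in my case the blobMap dictionary. Additionally, compute the bounding box and store it in another dictionary, boundingBoxMap: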

//Label blob:
blobLabel++;
blobMap.emplace( blobLabel, blobCentroid );
boundingBoxMap.emplace( blobLabel, cropBox );

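Now, using the centroid data, fetch the row that corresponds to that blob. Once you have the row, store the label in your Row Table at that row and the centroid's x coordinate: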

//Get the row for this centroid
int blobRow = rowMask.at<uchar>( cy, cx );
blobRow--;

//Place centroid on rowed image:
rowTable.at<uchar>( blobRow, cx ) = blobLabel;

Excellent. At this point you have the Row Table ready. Let's loop through it and actually, finally, order those damn blobs:

int blobCounter = 1; //The ORDERED label, starting at 1
for( int y = 0; y < rowTable.rows; y++ ){
    for( int x = 0; x < rowTable.cols; x++ ){
        //Get current label:
        uchar currentLabel = rowTable.at<uchar>( y, x );
        //Is it a valid label?
        if ( currentLabel != 255 ){
            //Get the bounding box for this label:
            cv::Rect currentBoundingBox = boundingBoxMap[ currentLabel ];
            cv::rectangle( testImage, currentBoundingBox, cv::Scalar(0,255,0), 2, 8, 0 );
            //The blob counter to string:
            std::string counterString = std::to_string( blobCounter );
            cv::putText( testImage, counterString, cv::Point( currentBoundingBox.x, currentBoundingBox.y-1 ),
                         cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255,0,0), 1, cv::LINE_8, false );
            blobCounter++; //Increment the blob/label
        }
    }
}

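Nothing fancy, just a regular nested for loop going through each pixel of the Row Table. If the pixel is different from white, use the label to retrieve the bounding box, and replace the label with an increasing number. To display the result, I just draw the bounding boxes and the new labels on the original image.

Check out the ordered processing in this animation:

Very cool. Here's a bonus animation of the Row Table getting populated with horizontal coordinates:

【Comments】:

This is a nice alternative, but when characters are too close together, say in 22-2020, it fails to find the right blobs. I can accept that as human error that does not need to be part of the OCR evaluation. Thank you for the detailed answer. I will let you know once I have worked through your code for my Python implementation. ...Also, what is rowMask.rows and how would I implement it in Python?

rowMask.rows is just the number of rows of the matrix rowMask; same with cols. And yes, if you run into any problems porting it to Python or have any other questions, let me know!

@Jim Vaghela Also, I can't see the problem with the 22-2020 line... could you elaborate? Maybe I can find an alternative solution!

@JimitVaghela Ah! For the first part, the one with the "overlapping" blobs, the binary mask may be "fusing" the blobs together, giving you one big blob and one centroid instead of two blobs and two centroids. A light erosion should help with that. For the other part (the for loop), feel free to join this chat so we can discuss the conversion further.

【Answer 2】:

I would rather use the centroid, or at least the bounding box center, than the top-left corner of the contour.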

def get_contour_precedence(contour, cols):
    tolerance_factor = 4
    # boundingRect returns (x, y, w, h); the box center is (x + w/2, y + h/2)
    x, y, w, h = cv2.boundingRect(contour)
    center_x = x + w / 2
    center_y = y + h / 2
    return ((center_y // tolerance_factor) * tolerance_factor) * cols + center_x

But it may be hard to find a tolerance value that works in all cases.
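Why a fixed tolerance is fragile can be seen from the key itself: the y coordinate is floored into bands of height tolerance_factor, so two boxes on the same text line that fall on either side of a band boundary get very different keys. A small sketch with made-up boxes (precedence mirrors the formula from the question, with its tolerance of 61, but takes an (x, y, w, h) tuple directly so that it runs without OpenCV; the image width of 500 is an assumption):

```python
def precedence(box, cols, tolerance_factor=61):
    """Same formula as the question's key, applied to an (x, y, w, h) box."""
    x, y, w, h = box
    return ((y // tolerance_factor) * tolerance_factor) * cols + x


cols = 500  # hypothetical image width
# two characters on the same text line, 3 px apart vertically
left_char = (10, 62, 20, 30)    # y = 62 falls in the band starting at 61
right_char = (200, 59, 20, 30)  # y = 59 falls in the band starting at 0

print(precedence(left_char, cols))   # 30510
print(precedence(right_char, cols))  # 200
# the right-hand character sorts first although it is further right
print(sorted([left_char, right_char], key=lambda b: precedence(b, cols)))
```

Changing the tolerance only moves the band boundaries, so some other pair of boxes can end up straddling one.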

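【Comments】:

Is there any way to overcome this? Even a small change in the text leads to errors when the same tolerance value is used. Would resizing the thresholded image work better?

Then it looks like a clustering problem; you should take a look at this post: ***.com/questions/32428520/…

【Answer 3】:

I would even say that using image moments gives a better estimate of the center point of a polygon than the "normal" coordinate center point of the rectangle, so the function could be: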

def get_contour_precedence(contour, cols):
    tolerance_factor = 61
    M = cv2.moments(contour)
    # calculate x,y coordinate of centroid
    if M["m00"] != 0:
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
    else:
        # set values as what you need in the situation
        cX, cY = 0, 0
    return ((cY // tolerance_factor) * tolerance_factor) * cols + cX

It is quite mathematical. You can find an explanation of what image moments are here
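One thing to watch in the function above: m00 is zero for degenerate contours such as a single point or a straight line, because the contour then encloses no area, and returning (0, 0) gives such a contour a key of 0, which sorts it to the very front. One possible alternative, sketched here as an assumption rather than as this answer's method, is to fall back to the bounding-box center. The helper name centroid_or_box_center and the sample values are hypothetical; the moments argument only needs the "m00", "m10" and "m01" keys that cv2.moments provides.

```python
def centroid_or_box_center(moments, box):
    """Centroid from a moments dict, falling back to the bounding-box center.

    moments: mapping with "m00", "m10", "m01" (keys provided by cv2.moments)
    box:     (x, y, w, h) as returned by cv2.boundingRect
    """
    if moments["m00"] != 0:
        return (moments["m10"] / moments["m00"],
                moments["m01"] / moments["m00"])
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


# hypothetical values: a normal blob and a degenerate (zero-area) contour
print(centroid_or_box_center({"m00": 50.0, "m10": 1000.0, "m01": 500.0}, (15, 5, 10, 10)))
print(centroid_or_box_center({"m00": 0.0, "m10": 0.0, "m01": 0.0}, (40, 8, 1, 6)))
```

Whether such tiny contours should be kept at all, or filtered out as noise before sorting, depends on the data.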

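Maybe you should think about getting rid of this tolerance factor altogether by using a general clustering algorithm such as k-means to cluster your centers into rows and columns. OpenCV has a k-means implementation, which you can find here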
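As a lighter-weight stand-in for k-means (which needs the number of clusters up front), the center y values can be clustered in one dimension by splitting wherever the gap between consecutive centers is large. This is only a sketch, under the assumption that centers on different text lines are further apart than half the median character height; cluster_rows, the 0.5 factor and the sample boxes are illustrative choices and are not part of this answer.

```python
from statistics import median


def cluster_rows(boxes, gap_factor=0.5):
    """Group (x, y, w, h) boxes into rows by gaps between their center y values."""
    if not boxes:
        return []
    # threshold derived from the data instead of a hand-tuned constant
    max_gap = gap_factor * median(h for _, _, _, h in boxes)
    by_center = sorted(boxes, key=lambda b: b[1] + b[3] / 2)

    rows = [[by_center[0]]]
    for prev, cur in zip(by_center, by_center[1:]):
        gap = (cur[1] + cur[3] / 2) - (prev[1] + prev[3] / 2)
        if gap > max_gap:
            rows.append([])  # large jump in y: a new text line starts
        rows[-1].append(cur)

    # left-to-right inside each row
    return [sorted(row, key=lambda b: b[0]) for row in rows]


# hypothetical boxes: two text lines with slightly uneven baselines
boxes = [(200, 59, 20, 30), (10, 62, 20, 30), (100, 60, 20, 32),
         (150, 121, 20, 30), (20, 118, 20, 30)]
for row in cluster_rows(boxes):
    print(row)
```

Small marks such as a period or a hyphen have centers well away from the middle of the line, so they may need separate handling with this approach.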

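I don't know exactly what your goal is, but another idea would be to split each line into a region of interest (ROI) for further processing; afterwards you can easily order the letters by the X value of each contour and the line number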

import cv2
import numpy as np

## (1) read
img = cv2.imread("yFX3M.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

## (2) threshold
th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU)

## (3) minAreaRect on the nozeros
pts = cv2.findNonZero(threshed)
ret = cv2.minAreaRect(pts)

(cx,cy), (w,h), ang = ret
if w>h:
    w,h = h,w

## (4) Find rotated matrix, do rotation
M = cv2.getRotationMatrix2D((cx,cy), ang, 1.0)
rotated = cv2.warpAffine(threshed, M, (img.shape[1], img.shape[0]))

## (5) find and draw the upper and lower boundary of each lines
hist = cv2.reduce(rotated,1, cv2.REDUCE_AVG).reshape(-1)

th = 2
H,W = img.shape[:2]
#   (6) using histogramm with threshold
uppers = [y for y in range(H-1) if hist[y]<=th and hist[y+1]>th]
lowers = [y for y in range(H-1) if hist[y]>th and hist[y+1]<=th]

rotated = cv2.cvtColor(rotated, cv2.COLOR_GRAY2BGR)
for y in uppers:
    cv2.line(rotated, (0,y), (W, y), (255,0,0), 1)

for y in lowers:
    cv2.line(rotated, (0,y), (W, y), (0,255,0), 1)
cv2.imshow('pic', rotated)

# (7) iterate over all line ROIs (zip avoids an IndexError if the lists differ in length)
for i, (top, bottom) in enumerate(zip(uppers, lowers)):
    print('line=', i)
    roi = rotated[top:bottom, 0:W]
    cv2.imshow('line', roi)
    cv2.waitKey(0)
    # here again compute the threshold and contours for this line

I found this code in an old post here

【Comments】:

I ended up getting this error:

Traceback (most recent call last):
  File "C:/XXX/eva_module/dataset_gen.py", line 29, in <module>
    contours.sort(key=lambda x: get_contour_precedence(x, thresh1.shape[1]))
  File "C:/Users/XXX/PycharmProjects/OCR/eva_module/dataset_gen.py", line 29, in <lambda>
    contours.sort(key=lambda x: get_contour_precedence(x, thresh1.shape[1]))
  File "C:/Users/XXX/PycharmProjects/OCR/eva_module/dataset_gen.py", line 16, in get_contour_precedence
    cX = int(M["m10"] / M["m00"])
ZeroDivisionError: float division by zero

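I forgot about that possibility and have hopefully fixed it now. I'm not sure when the moment m00 is actually zero; it would mean that the shape has zero area and should probably be excluded anyway?

【Answer 4】:

Here is one way to do it in Python/OpenCV: process by rows first and then by characters.

Read the input
Convert to grayscale
Threshold and invert
Use a long horizontal kernel and apply morphology close to form rows
Get the contours of the rows and their bounding boxes
Save the row boxes and sort them on Y
Loop over each sorted row box and extract the row from the thresholded image
Get the contours of each character in the row and save the bounding boxes of the characters
Sort the contours of a given row on X
Draw the bounding boxes on the input and the index number as text on the image
Increment the index
Save the results

Input: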

import cv2
import numpy as np

# read input image
img = cv2.imread('vision78.png')

# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# otsu threshold
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1]
thresh = 255 - thresh 

# apply morphology close to form rows
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (51,1))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# find contours and bounding boxes of rows
rows_img = img.copy()
boxes_img = img.copy()
rowboxes = []
rowcontours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rowcontours = rowcontours[0] if len(rowcontours) == 2 else rowcontours[1]
index = 1
for rowcntr in rowcontours:
    xr,yr,wr,hr = cv2.boundingRect(rowcntr)
    cv2.rectangle(rows_img, (xr, yr), (xr+wr, yr+hr), (0, 0, 255), 1)
    rowboxes.append((xr,yr,wr,hr))

# sort rowboxes on y coordinate
def takeSecond(elem):
    return elem[1]
rowboxes.sort(key=takeSecond)
    
# loop over each row    
for rowbox in rowboxes:
    # crop the image for a given row
    xr = rowbox[0]
    yr = rowbox[1]
    wr = rowbox[2]
    hr = rowbox[3]  
    row = thresh[yr:yr+hr, xr:xr+wr]
    bboxes = []
    # find contours of each character in the row
    contours = cv2.findContours(row, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    for cntr in contours:
        x,y,w,h = cv2.boundingRect(cntr)
        bboxes.append((x+xr,y+yr,w,h))
    # sort bboxes on x coordinate
    def takeFirst(elem):
        return elem[0]
    bboxes.sort(key=takeFirst)
    # draw sorted boxes
    for box in bboxes:
        xb = box[0]
        yb = box[1]
        wb = box[2]
        hb = box[3]
        cv2.rectangle(boxes_img, (xb, yb), (xb+wb, yb+hb), (0, 0, 255), 1)
        cv2.putText(boxes_img, str(index), (xb,yb), cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.75, (0,255,0), 1)
        index = index + 1
    
# save result
cv2.imwrite("vision78_thresh.jpg", thresh)
cv2.imwrite("vision78_morph.jpg", morph)
cv2.imwrite("vision78_rows.jpg", rows_img)
cv2.imwrite("vision78_boxes.jpg", boxes_img)

# show images
cv2.imshow("thresh", thresh)
cv2.imshow("morph", morph)
cv2.imshow("rows_img", rows_img)
cv2.imshow("boxes_img", boxes_img)
cv2.waitKey(0)

Thresholded image:

Morphology image of the rows:

Row contours image:

Character contours image:
