OCR图像预处理-文字背景分离方法

Posted 2021-11-08 BoostingIsm

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了OCR图像预处理-文字背景分离方法相关的知识，希望对你有一定的参考价值。

1. 摘要

在OCR处理文档时，经常会遇到自然拍照场景中由于光照强度不一、拍摄角度不同、相机成像元件差异，因此会导致拍摄的图片与扫描文档存在较大区别。为使文档文字内容更加鲜明，便于后续特征提取，本文目标是对文档在自然拍照场景下文字与文字背景分离。本文介绍并基于opencv实现了三种文字背景分离方法，最大最小滤波、全局阈值、局部自适应阈值方法。通过实验验证得到局部自适应阈值更适合自然照场景图片文字背景分离的结论。

2. 方法

主要介绍最大最小滤波、全局滤波（阈值、大津法）、局部自适应阈值方法。

2.1 最大最小值滤波

首先要排序周围像素和中心像素值，然后将中心像素值与最小和最大像素值比较，如果比最小值小，则替换中心像素为最小值，如果中心像素比最大值大，则替换中心像素为最大值。代码可参考链接：python实现图像阴影去除_lowl的博客-CSDN博客：

def max_filter(image,filter_size):
    # padding操作，在最大滤波中需要在原图像周围填充（filter_size//2）个小的数字，一般取-1
    # 先生成一个全为-1的矩阵，大小和padding后的图像相同
    empty_image = np.full((image.shape[0] + (filter_size // 2) * 2, image.shape[1] + (filter_size // 2) * 2), -1)
    # 将原图像填充进矩阵
    empty_image[(filter_size // 2):empty_image.shape[0] - (filter_size // 2),
    (filter_size // 2):empty_image.shape[1] - (filter_size // 2)] = image.copy()
    # 创建结果矩阵，和原图像大小相同
    result = np.full((image.shape[0], image.shape[1]), -1)
    # 遍历原图像中的每个像素点，对于点，选取其周围（filter_size*filter_size）个像素中的最大值，作为结果矩阵中的对应位置值
    for h in range(filter_size // 2, empty_image.shape[0]-filter_size // 2):
        for w in range(filter_size // 2, empty_image.shape[1]-filter_size // 2):
            filter = empty_image[h - (filter_size // 2):h + (filter_size // 2) + 1,
                     w - (filter_size // 2):w + (filter_size // 2) + 1]
            result[h-filter_size // 2, w-filter_size // 2] = np.amax(filter)
    return result
def min_filter(image,filter_size):
    # padding操作，在最大滤波中需要在原图像周围填充（filter_size//2）个大的数字，一般取大于255的
    # 先生成一个全为-1的矩阵，大小和padding后的图像相同
    empty_image = np.full((image.shape[0] + (filter_size // 2) * 2, image.shape[1] + (filter_size // 2) * 2), 400)
    # 将原图像填充进矩阵
    empty_image[(filter_size // 2):empty_image.shape[0] - (filter_size // 2),
    (filter_size // 2):empty_image.shape[1] - (filter_size // 2)] = image.copy()
    # 创建结果矩阵，和原图像大小相同
    result = np.full((image.shape[0], image.shape[1]), 400)
    # 遍历原图像中的每个像素点，对于点，选取其周围（filter_size*filter_size）个像素中的最小值，作为结果矩阵中的对应位置值
    for h in range(filter_size // 2, empty_image.shape[0]-filter_size // 2):
        for w in range(filter_size // 2, empty_image.shape[1]-filter_size // 2):
            filter = empty_image[h - (filter_size // 2):h + (filter_size // 2) + 1,
                     w - (filter_size // 2):w + (filter_size // 2) + 1]
            result[h-filter_size // 2, w-filter_size // 2] = np.amin(filter)
    return result
def remove_shadow(image_path):
    image = cv2.imread(image_path, 0)
    max_result=max_filter(image,30)
    min_result=min_filter(max_result,30)
    result=image-min_result
    result=cv2.normalize(result, None, 0, 255, norm_type=cv2.NORM_MINMAX)
    return result

if __name__ == '__main__':
    #方法：最大最小值滤波
    gray = remove_shadow('xxx.jpg')
    cv2.imwrite("./xxx_out1.jpg",gray)

2.2 全局阈值法

全局阈值主要有阈值法和大津法。阈值法由于图像灰度值受到很多因素影响，不同图像阈值不同，分离效果极大程度受到人工干预，不做过多介绍。大津法针对每幅图像直方图自动计算全局阈值。在opencv有代码可以直接调用，代码参考如下：

    #方法：大津法
    gray = cv2.imread('xxx.jpg',0)
    _, gray = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
    cv2.imwrite("./xxx_out2.jpg",gray)

2.3 局部自适应阈值法

局部自适应阈值考虑了图像在不同区域具有不同照明条件时，应进行自适应阈值处理。该算法为同一图像的不同区域获得不同的阈值，并且它为具有不同照明的图像提供了更好的结果。在opencv中代码可以直接调用，代码参考如下：

    #方法：局部最适应滤波
    gray = cv2.imread('xxx.jpg', 0)
    th = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 10)
    th = cv2.boxFilter(th,-1,(1,1),normalize=False)
    cv2.imwrite("./xxx_out3.jpg", th)

在局部自适应阈值方法中，上述参数较适合大多数1080P的图片处理。

3.实验

三种方法对比效果如下：

最大最小值滤波法不能完全去除背景，需要后续的阈值二值化操作，但单靠最大最小值滤波不能完全实现文字背景分离效果。大津法在图像有阴影的情况下提取效果差，局部自适应阈值法在文字背景分离任务中，适用场景更加丰富。

4. 总结

最大最小值滤波法适合只去除背景不二值化提取文本的情况。

大津法适合光照渐变差异小、无阴影的文档，这些文档图片在图像直方图中表现为有双峰特征（背景、文字分别为峰值）。

局部自适应阈值法在图片梯度变化剧烈，如手机拍摄电脑时产生的莫尔条纹，处理得到的效果较差，适用于拍照场景图片转扫描印刷图片等此类功能需求。

以上是关于OCR图像预处理-文字背景分离方法的主要内容，如果未能解决你的问题，请参考以下文章