使用 OpenCV 的图像处理从图像中删除背景文本和噪声

Posted

技术标签:

【中文标题】使用 OpenCV 的图像处理从图像中删除背景文本和噪声【英文标题】:Remove background text and noise from an image using image processing with OpenCV 【发布时间】:2020-05-25 11:17:48 【问题描述】:

我有这些图片

我想删除背景中的文本。只有captcha characters 应该保留(即K6PwKA、YabVzu)。任务是稍后使用 tesseract 识别这些字符。

这是我尝试过的方法,但准确性并不高。

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"
img = cv2.imread("untitled.png")
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_filtered = cv2.inRange(gray_image, 0, 75)
cv2.imwrite("cleaned.png", gray_filtered)

我该如何改进?

注意: 我尝试了所有关于这个问题的建议,但没有一个对我有用。

编辑: 根据 Elias 的说法,我尝试使用 Photoshop 找到验证码文本的颜色,方法是将其转换为介于 [100, 105] 之间的灰度。然后我根据这个范围对图像进行阈值处理。但是我得到的结果并没有从 tesseract 中得到令人满意的结果。

gray_filtered = cv2.inRange(gray_image, 100, 105)
cv2.imwrite("cleaned.png", gray_filtered)
gray_inv = ~gray_filtered
cv2.imwrite("cleaned.png", gray_inv)
data = pytesseract.image_to_string(gray_inv, lang='eng')

输出:

'KEP wKA'

结果:

编辑 2:

def get_text(img_name):
    lower = (100, 100, 100)
    upper = (104, 104, 104) 
    img = cv2.imread(img_name)
    img_rgb_inrange = cv2.inRange(img, lower, upper)
    neg_rgb_image = ~img_rgb_inrange
    cv2.imwrite('neg_img_rgb_inrange.png', neg_rgb_image)
    data = pytesseract.image_to_string(neg_rgb_image, lang='eng')
    return data

给:

文本为

GXuMuUZ

有什么办法可以缓和一点

【问题讨论】:

我认为这种行为是不道德的。试图破坏 CAPTCHA 保护表明对服务器所有者缺乏尊重,无论他们这样做是为了保护他们的带宽还是他们的业务 它是一个大学成绩网站,别担心 【参考方案1】:

这里有两种可能的方法和一种纠正扭曲文本的方法:

方法一:形态学运算+轮廓滤波

    获取二进制图像。Load image,grayscale,然后是Otsu's threshold。

    去除文字轮廓。cv2.getStructuringElement()创建一个矩形内核,然后执行morphological operations去除噪音。

    过滤并去除小噪音。 Find contours 并使用contour area 过滤以去除小颗粒。我们通过用cv2.drawContours()填充轮廓来有效地去除噪声

    执行 OCR。 我们反转图像,然后应用轻微的 Gaussian blur。然后,我们使用 Pytesseract 和 --psm 6 配置选项进行 OCR,将图像视为单个文本块。查看Tesseract improve quality 了解其他改进检测的方法,查看Pytesseract configuration options 了解其他设置。


输入图像->二进制->变形打开

轮廓区域过滤->反转->应用模糊得到结果

OCR 的结果

YabVzu

代码

import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold
image = cv2.imread('2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

# Find contours and remove small noise
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 50:
        cv2.drawContours(opening, [c], -1, 0, -1)

# Invert and apply slight Gaussian blur
result = 255 - opening
result = cv2.GaussianBlur(result, (3,3), 0)

# Perform OCR
data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()     

方法二:颜色分割

观察到要提取的所需文本与图像中的噪声具有可区分的对比度,我们可以使用颜色阈值来隔离文本。这个想法是转换为 HSV 格式然后颜色阈值以获得使用较低/较高颜色范围的掩码。从我们是否使用相同的过程到 Pytesseract 进行 OCR。


输入图像-&gt;掩码-&gt;结果

代码

import cv2
import pytesseract
import numpy as np

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('2.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)

# Invert image and OCR
invert = 255 - mask
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.imshow('invert', invert)
cv2.waitKey()

纠正扭曲的文字

OCR 在图像水平时效果最佳。为了确保文本是 OCR 的理想格式,我们可以执行透视变换。在去除所有噪声以隔离文本后,我们可以执行变形关闭以将单个文本轮廓组合成单个轮廓。从这里我们可以使用cv2.minAreaRect 找到旋转的边界框,然后使用imutils.perspective.four_point_transform 执行four point perspective transform。继续清洁面膜,结果如下:

Mask -&gt; Morph close -&gt; 检测到旋转的边界框-&gt; 结果

与其他图像一起输出

更新代码以包含透视变换

import cv2
import pytesseract
import numpy as np
from imutils.perspective import four_point_transform

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)

# Morph close to connect individual text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=3)

# Find rotated bounding box then perspective transform
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rect = cv2.minAreaRect(cnts[0])
box = cv2.boxPoints(rect)
box = np.int0(box)
cv2.drawContours(image,[box],0,(36,255,12),2)
warped = four_point_transform(255 - mask, box.reshape(4, 2))

# OCR
data = pytesseract.image_to_string(warped, lang='eng', config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('warped', warped)
cv2.imshow('image', image)
cv2.waitKey()

注意:颜色阈值范围是使用此 HSV 阈值脚本确定的

import cv2
import numpy as np

def nothing(x):
    pass

# Load image
image = cv2.imread('2.png')

# Create a window
cv2.namedWindow('image')

# Create trackbars for color change
# Hue is from 0-179 for Opencv
cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
cv2.createTrackbar('VMax', 'image', 0, 255, nothing)

# Set default value for Max HSV trackbars
cv2.setTrackbarPos('HMax', 'image', 179)
cv2.setTrackbarPos('SMax', 'image', 255)
cv2.setTrackbarPos('VMax', 'image', 255)

# Initialize HSV min/max values
hMin = sMin = vMin = hMax = sMax = vMax = 0
phMin = psMin = pvMin = phMax = psMax = pvMax = 0

while(1):
    # Get current positions of all trackbars
    hMin = cv2.getTrackbarPos('HMin', 'image')
    sMin = cv2.getTrackbarPos('SMin', 'image')
    vMin = cv2.getTrackbarPos('VMin', 'image')
    hMax = cv2.getTrackbarPos('HMax', 'image')
    sMax = cv2.getTrackbarPos('SMax', 'image')
    vMax = cv2.getTrackbarPos('VMax', 'image')

    # Set minimum and maximum HSV values to display
    lower = np.array([hMin, sMin, vMin])
    upper = np.array([hMax, sMax, vMax])

    # Convert to HSV format and color threshold
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)
    result = cv2.bitwise_and(image, image, mask=mask)

    # Print if there is a change in HSV value
    if((phMin != hMin) | (psMin != sMin) | (pvMin != vMin) | (phMax != hMax) | (psMax != sMax) | (pvMax != vMax) ):
        print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)" % (hMin , sMin , vMin, hMax, sMax , vMax))
        phMin = hMin
        psMin = sMin
        pvMin = vMin
        phMax = hMax
        psMax = sMax
        pvMax = vMax

    # Display result image
    cv2.imshow('image', result)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()

【讨论】:

请帮我解决这个问题question。【参考方案2】:

您的代码会产生比这更好的结果。在这里,我根据直方图CDF 值和阈值设置upperblowerb 值的阈值。按ESC 按钮获取下一张图片。

此代码过于复杂,需要以各种方式进行优化。可以重新排序代码以跳过某些步骤。我保留了它,因为某些部分可能会帮助其他部分。一些现有的噪声可以通过保持区域高于某个阈值的轮廓来去除。欢迎对其他降噪方法提出任何建议。

可以在此处找到用于获取 4 个角点以进行透视变换的类似更简单的代码,

Accurate corners detection?

代码说明:

原图 中值滤波器(去噪和 ROI 识别) OTSU 阈值 反转图像 使用反转黑白图像作为遮罩,以保留原始图像的大部分 ROI 部分 最大轮廓发现的扩张

通过在原图中绘制矩形和角点来标记ROI

拉直 ROI 并提取它

中值滤波器 OTSU 阈值 为蒙版反转图像 屏蔽直图像以进一步去除文本中的大部分噪点 In Range 与上述直方图 cdf 中的 lowerb 和 upperb 值一起使用,以进一步降低噪声 也许在这一步腐蚀图像会产生一些可接受的结果。取而代之的是,该图像再次被放大并用作掩码,以从透视变换图像中获得较少噪声的 ROI。

代码:

## Press ESC button to get next image

import cv2
import cv2 as cv
import numpy as np


frame = cv2.imread('extra/c1.png')
#frame = cv2.imread('extra/c2.png')


## keeping a copy of original
print(frame.shape)
original_frame = frame.copy()
original_frame2 = frame.copy()


## Show the original image
winName = 'Original'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)



## Apply median blur
frame = cv2.medianBlur(frame,9)


## Show the original image
winName = 'Median Blur'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


#kernel = np.ones((5,5),np.uint8)
#frame = cv2.dilate(frame,kernel,iterations = 1)



# Otsu's thresholding
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
ret2,thresh_n = cv.threshold(frame,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
frame = thresh_n


## Show the original image
winName = 'Otsu Thresholding'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)




## invert color
frame = cv2.bitwise_not(frame)

## Show the original image
winName = 'Invert Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


## Dilate image
kernel = np.ones((5,5),np.uint8)
frame = cv2.dilate(frame,kernel,iterations = 1)


##
## Show the original image
winName = 'SUB'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
img_gray = cv2.cvtColor(original_frame, cv2.COLOR_BGR2GRAY)
cv.imshow(winName, img_gray & frame)
cv.waitKey(0)


## Show the original image
winName = 'Dilate Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


## Get largest contour from contours
contours, hierarchy = cv2.findContours(frame, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)


## Get minimum area rectangle and corner points
rect = cv2.minAreaRect(max(contours, key = cv2.contourArea))
print(rect)
box = cv2.boxPoints(rect)
print(box)


## Sorted points by x and y
## Not used in this code
print(sorted(box , key=lambda k: [k[0], k[1]]))



## draw anchor points on corner
frame = original_frame.copy()
z = 6
for b in box:
    cv2.circle(frame, tuple(b), z, 255, -1)


## show original image with corners
box2 = np.int0(box)
cv2.drawContours(frame,[box2],0,(0,0,255), 2)
cv2.imshow('Detected Corners',frame)
cv2.waitKey(0)
cv2.destroyAllWindows()



## https://***.com/questions/11627362/how-to-straighten-a-rotated-rectangle-area-of-an-image-using-opencv-in-python
def subimage(image, center, theta, width, height):
   shape = ( image.shape[1], image.shape[0] ) # cv2.warpAffine expects shape in (length, height)

   matrix = cv2.getRotationMatrix2D( center=center, angle=theta, scale=1 )
   image = cv2.warpAffine( src=image, M=matrix, dsize=shape )

   x = int(center[0] - width / 2)
   y = int(center[1] - height / 2)

   image = image[ y:y+height, x:x+width ]

   return image



## Show the original image
winName = 'Dilate Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)


## use the calculated rectangle attributes to rotate and extract it
frame = subimage(original_frame, center=rect[0], theta=int(rect[2]), width=int(rect[1][0]), height=int(rect[1][1]))
original_frame = frame.copy()
cv.imshow(winName, frame)
cv.waitKey(0)

perspective_transformed_image = frame.copy()



## Apply median blur
frame = cv2.medianBlur(frame,11)


## Show the original image
winName = 'Median Blur'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


#kernel = np.ones((5,5),np.uint8)
#frame = cv2.dilate(frame,kernel,iterations = 1)



# Otsu's thresholding
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
ret2,thresh_n = cv.threshold(frame,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
frame = thresh_n


## Show the original image
winName = 'Otsu Thresholding'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)



## invert color
frame = cv2.bitwise_not(frame)

## Show the original image
winName = 'Invert Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


## Dilate image
kernel = np.ones((5,5),np.uint8)
frame = cv2.dilate(frame,kernel,iterations = 1)

##
## Show the original image
winName = 'SUB'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
img_gray = cv2.cvtColor(original_frame, cv2.COLOR_BGR2GRAY)
frame = img_gray & frame
frame[np.where(frame==0)] = 255
cv.imshow(winName, frame)
cv.waitKey(0)





hist,bins = np.histogram(frame.flatten(),256,[0,256])

cdf = hist.cumsum()
cdf_normalized = cdf * hist.max()/ cdf.max()
print(cdf)
print(cdf_normalized)
hist_image = frame.copy()




## two decresing range algorithm
low_index = -1
for i in range(0, 256):
   if cdf[i] > 0:
      low_index = i
      break
print(low_index)

tol = 0
tol_limit = 20
broken_index = -1
past_val = cdf[low_index] - cdf[low_index + 1]
for i in range(low_index + 1, 255):
   cur_val = cdf[i] - cdf[i+1]
   if tol > tol_limit:
      broken_index = i
      break
   if cur_val < past_val:
      tol += 1
   past_val = cur_val

print(broken_index)




##
lower = min(frame.flatten())
upper = max(frame.flatten())
print(min(frame.flatten()))
print(max(frame.flatten()))

#img_rgb_inrange = cv2.inRange(frame_HSV, np.array([lower,lower,lower]), np.array([upper,upper,upper]))
img_rgb_inrange = cv2.inRange(frame, (low_index), (broken_index))
neg_rgb_image = ~img_rgb_inrange
## Show the original image
winName = 'Final'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, neg_rgb_image)
cv.waitKey(0)


kernel = np.ones((3,3),np.uint8)
frame = cv2.erode(neg_rgb_image,kernel,iterations = 1)
winName = 'Final Dilate'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)


##
winName = 'Final Subtracted'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
img2 = np.zeros_like(perspective_transformed_image)
img2[:,:,0] = frame
img2[:,:,1] = frame
img2[:,:,2] = frame
frame = img2
cv.imshow(winName, perspective_transformed_image | frame)
cv.waitKey(0)


##
import matplotlib.pyplot as plt
plt.plot(cdf_normalized, color = 'b')
plt.hist(hist_image.flatten(),256,[0,256], color = 'r')
plt.xlim([0,256])
plt.legend(('cdf','histogram'), loc = 'upper left')
plt.show()

1.中值过滤器:

2。 OTSU 阈值:

3.反转:

4.倒置图像膨胀:

5.掩蔽提取:

6.转换的 ROI 点:

7.透视校正图像:

8.中值模糊:

9. OTSU 阈值:

10.倒置图像:

11. ROI 提取:

12.夹紧:

13.扩张:

14.最终投资回报率:

15.第 11 步图像的直方图:

【讨论】:

【参考方案3】:

没有尝试,但这可能有效。 第1步: 使用 ps 找出验证码字符的颜色。例如,“YabVzu”是 (128,128,128),

第 2 步: 使用枕头的方法getdata()/getcolor(),它会返回一个包含每个像素颜色的序列。

然后,我们将序列中的每个项目投影到原始验证码图像。

因此我们知道图像中每个像素的位置。

第 3 步: 找到颜色与 (128,128,128) 最接近的所有像素。 您可以设置一个阈值来控制准确性。此步骤返回另一个序列。 让我们将其注释为 Seq a

第 4 步: 生成与原始图片高度和宽度相同的图片。 将 [Seq a] 中的每个像素绘制在图片中非常精确的位置。在这里,我们将得到一个清洁的训练项目

第 5 步: 使用 Keras project 破解代码。精度应该在72%以上。

【讨论】:

顺便说一句,你能帮忙吗:***.com/questions/60157319/… 我很抱歉,但对 py2app 的了解不多。会尝试你的建议 项目建议的原因是,理论上处理后生成的图像应该与项目试图处理的验证码图像具有相似的模式

以上是关于使用 OpenCV 的图像处理从图像中删除背景文本和噪声的主要内容,如果未能解决你的问题,请参考以下文章

OpenCV修复方法不会从IC图像中删除文本

如何从 OpenCV 中的图像中删除空格?

如何从图像中裁剪或删除白色背景

从图像中删除黑色背景。

javascript 从网页中删除图像,视频和背景,仅保留文本内容

如何使用python opencv删除文件夹中的特定图像