How to rotate an image to align the text for extraction?
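【Title】: How to rotate an image to align the text for extraction? 【Posted】: 2021-05-19 00:59:19 【Question】: I am using pytesseract to extract text from images, but it does not work on tilted images. Consider the image given below:
Here is the code that extracts the text. It works fine on images that are not tilted.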
import cv2
import numpy as np
import pytesseract
from google.colab.patches import cv2_imshow  # assumption: the code runs in Google Colab

img = cv2.imread(<path_to_image>)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
ret3, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

def findSignificantContours(img, edgeImg):
    contours, hierarchy = cv2.findContours(edgeImg, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)

    # Find level 1 contours
    level1 = []
    for i, tupl in enumerate(hierarchy[0]):
        # Each array is in format (Next, Prev, First child, Parent)
        # Filter the ones without parent
        if tupl[3] == -1:
            tupl = np.insert(tupl, 0, [i])
            level1.append(tupl)

    significant = []
    tooSmall = edgeImg.size * 5 / 100  # If contour isn't covering 5% of total area of image then it probably is too small
    for tupl in level1:
        contour = contours[tupl[0]]
        area = cv2.contourArea(contour)
        if area > tooSmall:
            significant.append([contour, area])
            # Draw the contour on the original image
            cv2.drawContours(img, [contour], 0, (0, 255, 0), 2, cv2.LINE_AA, maxLevel=1)

    significant.sort(key=lambda x: x[1])
    # print([x[1] for x in significant])

    mx = (0, 0, 0, 0)  # biggest bounding box so far
    mx_area = 0
    for cont in contours:
        x, y, w, h = cv2.boundingRect(cont)
        area = w * h
        if area > mx_area:
            mx = x, y, w, h
            mx_area = area
    x, y, w, h = mx

    # Crop the largest bounding box and run OCR on both the crop and the thresholded image
    roi = img[y:y+h, x:x+w]
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    ret3, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2_imshow(thresh)
    text = pytesseract.image_to_string(roi)
    print(text)
    print("\n")
    print(pytesseract.image_to_string(thresh))
    print("\n")

    return [x[0] for x in significant]

edgeImg_8u = np.asarray(thresh, np.uint8)

# Find contours
significant = findSignificantContours(img, edgeImg_8u)

mask = thresh.copy()
mask[mask > 0] = 0
cv2.fillPoly(mask, significant, 255)
# Invert mask
mask = np.logical_not(mask)

# Finally remove the background
img[mask] = 0
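Tesseract cannot extract the text from this image. Is there a way to rotate it so that the text is perfectly aligned and then feed it to pytesseract? Please let me know if my question needs more clarification.
【Comments】:
You can get the rotated rectangle and angle of the card. Use the angle to straighten the rotation and feed that image to Tesseract.
Yes, I found the rectangle and the contour array. How do I form the rectangle from them? It would be great if you could give me some advice. @eldesgraciado
It is not the contour's bounding rectangle that you want. You need to use cv2.minAreaRect() to find the rotated rectangle, from which you can get the center, the width and height, and the rotation. You can use the rotation to correct your image. See the documentation at docs.opencv.org/4.1.1/d3/dc0/… and also pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python
【Answer 1】: Here is a simple approach: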
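Obtain a binary image. Load the image, convert it to grayscale, apply a Gaussian blur, then Otsu's threshold.
Find contours and sort for the largest contour. We find the contours, then filter by contour area with cv2.contourArea() to isolate the rectangular contour.
Perform a perspective transform. Next we perform contour approximation with cv2.arcLength() and cv2.approxPolyDP() to obtain the rectangular contour. Finally we use imutils.perspective.four_point_transform to actually obtain the bird's-eye view of the image.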
Binary image
Result
To actually extract the text, take a look at:
Use pytesseract OCR to recognize text from an image
Cleaning image for OCR
Detect text area in an image using python and opencv
Code
from imutils.perspective import four_point_transform
import cv2

# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread("1.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7, 7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Find contours and sort for largest contour
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 2 values in OpenCV 4.x and 3 values in OpenCV 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
displayCnt = None

for c in cnts:
    # Perform contour approximation
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    if len(approx) == 4:
        displayCnt = approx
        break

# Stop early if no four-sided contour was found
if displayCnt is None:
    raise RuntimeError("No four-sided contour found")

# Obtain bird's-eye view of image
warped = four_point_transform(image, displayCnt.reshape(4, 2))
cv2.imshow("thresh", thresh)
cv2.imshow("warped", warped)
cv2.waitKey()
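For a perspective transform, the four corners from the contour approximation have to be in a known order, and the output size has to be chosen. The sketch below shows one common heuristic for both in plain Python. The names `order_corners` and `target_size` are made up for illustration, and this is not necessarily what imutils does internally. It assumes image coordinates (y grows downward) and a card that is tilted by clearly less than 45 degrees.

```python
import math


def order_corners(pts):
    """Return four (x, y) corners as (top-left, top-right, bottom-right, bottom-left).

    Heuristic: in image coordinates the top-left corner has the smallest x + y,
    the bottom-right the largest x + y, the top-right the smallest y - x and
    the bottom-left the largest y - x.
    """
    pts = [tuple(p) for p in pts]
    if len(pts) != 4:
        raise ValueError("expected exactly four corner points")
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    tr = min(pts, key=lambda p: p[1] - p[0])
    bl = max(pts, key=lambda p: p[1] - p[0])
    return tl, tr, br, bl


def target_size(tl, tr, br, bl):
    """Pick the output (width, height) from the longest opposite edges."""
    width = max(math.hypot(tr[0] - tl[0], tr[1] - tl[1]),
                math.hypot(br[0] - bl[0], br[1] - bl[1]))
    height = max(math.hypot(bl[0] - tl[0], bl[1] - tl[1]),
                 math.hypot(br[0] - tr[0], br[1] - tr[1]))
    return int(round(width)), int(round(height))


# Corners of a slightly tilted card, given in arbitrary order
corners = [(300, 80), (60, 40), (40, 200), (280, 240)]
tl, tr, br, bl = order_corners(corners)
size = target_size(tl, tr, br, bl)
```

The ordered corners and the destination rectangle (0, 0) to (width - 1, height - 1) are what one would pass to cv2.getPerspectiveTransform and cv2.warpPerspective when not using imutils.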
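【Answer 2】: To solve this problem you can also use the minAreaRect API in OpenCV, which gives you the minimum-area rotated rectangle together with its rotation angle. You can then get the rotation matrix and apply warpAffine to the image to straighten it. I have also attached a Colab notebook that you can play with.
Colab notebook: https://colab.research.google.com/drive/1SKxrWJBOHhGjEgbR2ALKxl-dD1sXIf4h?usp=sharing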
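For intuition about the rotation step, here is a small plain-Python sketch of the 2x3 matrix that, per the OpenCV documentation, cv2.getRotationMatrix2D builds for a given center, angle and scale. The helper names `rotation_matrix_2d` and `apply_affine` are made up for illustration. The sketch only shows the arithmetic and is not a replacement for the OpenCV call.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Build the 2x3 affine matrix for a rotation about `center`.

    Positive angles rotate counter-clockwise as seen on screen, with the
    coordinate origin in the top-left corner of the image.
    """
    cx, cy = center
    theta = math.radians(angle_deg)
    alpha = scale * math.cos(theta)
    beta = scale * math.sin(theta)
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(m, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


m = rotation_matrix_2d((50, 40), 90)
center_after = apply_affine(m, (50, 40))  # the center stays where it is
right_after = apply_affine(m, (60, 40))   # a point right of the center moves above it
```

The translation column is what keeps the chosen center fixed, which is why the answer's code passes the image center rather than rotating about the origin.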
import cv2
from google.colab.patches import cv2_imshow
import numpy as np

def rotate_image(image, angle):
    # Rotate about the image center; image.shape[1::-1] is (width, height)
    image_center = tuple(np.array(image.shape[1::-1]) / 2)
    rot_mat = cv2.getRotationMatrix2D(image_center, angle, 1.0)
    result = cv2.warpAffine(image, rot_mat, image.shape[1::-1], flags=cv2.INTER_LINEAR)
    return result

img = cv2.imread("/content/sxJzw.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
mask = np.zeros(img.shape[:2], dtype=np.uint8)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
ret, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2_imshow(thresh)

contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest_contour = max(contours, key=cv2.contourArea)
# Fill the largest contour with 1s to get a 0/1 mask, then black out the background
binary_mask = cv2.drawContours(mask, [largest_contour], 0, 1, -1)
new_img = img * np.dstack((binary_mask, binary_mask, binary_mask))

# minAreaRect returns ((cx, cy), (w, h), angle). The range of the angle depends on
# the OpenCV version, so the sign handling below may need adjusting.
minRect = cv2.minAreaRect(largest_contour)
rotate_angle = minRect[-1] if minRect[-1] < 0 else -minRect[-1]
new_img = rotate_image(new_img, rotate_angle)
cv2_imshow(new_img)
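One thing to watch with the angle from minAreaRect is that a rectangle looks the same after a quarter turn, so the reported angle can be close to 90 degrees even when the card is only slightly tilted. A hedged way to deal with this is to fold the angle into the range (-45, 45] before rotating. The helper `fold_angle` below is a made-up name for illustration. Which sign to pass on to the rotation still depends on the angle convention of the installed OpenCV version, so check the result visually.

```python
def fold_angle(angle_deg):
    """Fold any angle into (-45, 45], the smallest equivalent tilt of a rectangle.

    Rotating a rectangle by a multiple of 90 degrees leaves it axis-aligned,
    so angles that differ by 90 describe the same tilt.
    """
    folded = angle_deg % 90.0
    if folded > 45.0:
        folded -= 90.0
    return folded


small_tilt = fold_angle(-85)  # same tilt as +5 degrees
other_tilt = fold_angle(80)   # same tilt as -10 degrees
```

With this, a card reported at -85 degrees is corrected by a 5 degree turn instead of being rotated almost onto its side.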