Text recognition and restructuring OCR opencv

Posted 2023-04-17


[Title]: text recognition and restructuring OCR opencv
[Published]: 2021-08-18 04:14:18
[Question]:

Link to the original image: https://ibb.co/0VC6vkX

I am currently working on an OCR project. I preprocessed the image and then applied the pretrained EAST model for text detection.

import cv2
import numpy as np
from imutils.object_detection import non_max_suppression
import matplotlib.pyplot as plt
%matplotlib inline

img=cv2.imread('bw_image.jpg')
model=cv2.dnn.readNet('frozen_east_text_detection.pb')

#Prepare the Image
#use multiple of 32 to set the new image shape
height,width,colorch=img.shape
new_height=(height//32)*32
new_width=(width//32)*32
print(new_height,new_width)

h_ratio=height/new_height
w_ratio=width/new_width
print(h_ratio,w_ratio)

#blob from image helps us to prepare the image
blob=cv2.dnn.blobFromImage(img,1,(new_width,new_height),(123.68,116.78,103.94),True, False)
model.setInput(blob)

#this model outputs geometry and score maps
(geometry,scores)=model.forward(model.getUnconnectedOutLayersNames())

#once we have done geometry and score maps we have to do post processing to obtain the final text boxes
rectangles=[]
confidence_score=[]
for i in range(geometry.shape[2]):
    for j in range(0,geometry.shape[3]):
    
        if scores[0][0][i][j]<0.1:
            continue

        bottom_x=int(j*4 + geometry[0][1][i][j])
        bottom_y=int(i*4 + geometry[0][2][i][j])

        top_x=int(j*4 - geometry[0][3][i][j])
        top_y=int(i*4 - geometry[0][0][i][j])

        rectangles.append((top_x,top_y,bottom_x,bottom_y))
        confidence_score.append(float(scores[0][0][i][j]))

#use nms to get the required rectangles
final_boxes=non_max_suppression(np.array(rectangles),probs=confidence_score,overlapThresh=0.5)

#finally to display these text boxes let's iterate over them and convert them to the original shape 
#using the ratio we calculated earlier
img_copy=img.copy()

for (x1,y1,x2,y2) in final_boxes:
    
    x1=int(x1*w_ratio)
    y1=int(y1*h_ratio)
    x2=int(x2*w_ratio)
    y2=int(y2*h_ratio)
    
    #to draw the rectangles on the image use cv2.rectangle function
    cv2.rectangle(img_copy,(x1,y1),(x2,y2),(0,255,0),2)

This gives us the detected text boxes drawn on the image.

Now, for text recognition, I used the pretrained OpenCV CRNN model, as follows:

# Download the CRNN model and Load it
model1 = cv2.dnn.readNet('D:/downloads/crnn.onnx')


# ## Prepare the image
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

blob = cv2.dnn.blobFromImage(img_gray, scalefactor=1/127.5, size=(100,32), mean=127.5)


# Pass the image to network and extract per-timestep scores
model1.setInput(blob)

scores = model1.forward()
print(scores.shape)

alphabet_set = "0123456789abcdefghijklmnopqrstuvwxyz"
blank = '-'

char_set = blank + alphabet_set


# Decode the scores to text
def most_likely(scores, char_set):
    text = ""
    for i in range(scores.shape[0]):
        c = np.argmax(scores[i][0])
        text += char_set[c]
    return text


def map_rule(text):
    char_list = []
    for i in range(len(text)):
        if i == 0:
            if text[i] != '-':
                char_list.append(text[i])
        else:
            if text[i] != '-' and (not (text[i] == text[i - 1])):
                char_list.append(text[i])
    return ''.join(char_list)


def best_path(scores, char_set):
    text = most_likely(scores, char_set)
    final_text = map_rule(text)
    return final_text


out = best_path(scores, char_set)
print(out)

But applying this model to the image gives the following output:

saetan

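I really don't understand this. Can anyone point out what is wrong with the text recognition? Is there a problem with the pretrained CRNN model? In addition, once the text has been recognized, I want to restructure it the way it is laid out in the original image. Once the recognition problem is solved, we will have the bounding box coordinates and the recognized text, so how can we use them to reconstruct the text accurately? Any help would be appreciated.

Edit: I used pytesseract's image_to_string() and image_to_data() functions, but they did not perform well. Is there any other pretrained text recognition model I can use, in case this CRNN model is not suitable, so that I can replicate the success of my EAST text detection model? That way I could accurately reconstruct the text in the image with the help of the coordinates (bounding boxes) obtained from the EAST model.

[Comments on the question]:

Is it mandatory to use the EAST model (crnn.onnx) for text detection? If not, I would suggest pytesseract, in particular the image_to_string() and image_to_data() functions.

I have already tried pytesseract and the functions you mention, but they did not give accurate results. Perhaps that is because pytesseract does not handle low-quality images well. I prefer the EAST model and the pretrained CRNN model because, as the image shows, the EAST model detects the text very accurately, and I want to replicate that success in text recognition.

Are you applying pytesseract to the whole image or to the individual bounding boxes? I usually use pytesseract only after retrieving the bounding boxes, and the performance is generally very good. Your image quality is not bad enough for pytesseract to fail, IMO. Maybe try some more preprocessing. Your bounding boxes are also really tight; the letters need to be fully inside the box for OCR to work. Maybe try adding some padding pixels.

You need to feed the CRNN (or pytesseract) the detected bounding boxes (one at a time, I suspect) rather than the whole image. You need to iterate over all the boxes.

I think a better way to rephrase my question is: how do I iterate over the bounding boxes detected by the EAST model and recognize the text inside them, as @Shai suggested?

[Answer 1]: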

Working on crops is really simple; just change your last loop slightly:

import pytesseract
from PIL import Image

...
 
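for x1, y1, x2, y2 in final_boxes:
    # map the box from the resized detector input back to the original image
    x1 = int(x1 * w_ratio)
    y1 = int(y1 * h_ratio)
    x2 = int(x2 * w_ratio)
    y2 = int(y2 * h_ratio)

    # cv2.rectangle(img_copy,(x1,y1),(x2,y2),(0,255,0),2)
    # crop with a 1-pixel margin; max() keeps the start index from going negative
    img_crop = Image.fromarray(img[max(y1 - 1, 0): y2 + 1, max(x1 - 1, 0): x2 + 1])
    # --psm 8 asks Tesseract to treat the crop as a single word
    text = pytesseract.image_to_string(img_crop, config='--psm 8').strip()
    cv2.putText(img_copy, text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, .7, (0, 0, 255), 2)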
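
The crop in the loop above adds a one-pixel margin around each box. If wider margins are wanted (the comments on the question suggest that padding helps when boxes are tight), the arithmetic can be pulled into a small helper that also keeps the box inside the image. This is only a sketch: `pad_and_clamp` and its `pad` parameter are hypothetical names, not part of OpenCV or pytesseract, and the default of 2 pixels is an arbitrary starting point.

```python
def pad_and_clamp(box, img_width, img_height, pad=2, w_ratio=1.0, h_ratio=1.0):
    """Scale an (x1, y1, x2, y2) box, grow it by `pad` pixels, keep it inside the image."""
    x1, y1, x2, y2 = box
    # scale from detector coordinates to image coordinates, then pad and clamp
    x1 = max(int(x1 * w_ratio) - pad, 0)
    y1 = max(int(y1 * h_ratio) - pad, 0)
    x2 = min(int(x2 * w_ratio) + pad, img_width)
    y2 = min(int(y2 * h_ratio) + pad, img_height)
    return x1, y1, x2, y2


# a box touching the top-left corner of a 200x100 image
print(pad_and_clamp((0, 3, 50, 20), img_width=200, img_height=100, pad=4))
# prints (0, 0, 54, 24)
```

The returned tuple can be used directly for slicing, for example `img[y1:y2, x1:x2]`, since the upper bounds never exceed the image size.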

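[Discussion]:

This did solve my problem without using the pretrained CRNN model. Could you guide me on how to restructure this recognized text in a Word or PDF file so that it follows the same structure as the receipt above?

You can look at another answer of mine, which deals with exactly the problem of preserving indentation. Sort the boxes vertically and from left to right, then apply the code.

@Junaid, the answer you accepted is what I wrote as a comment on my first solution. After that you clearly wrote that you needed a solution with CRNN, which is why I provided my second answer.

[Answer 2]:

Here is a possible solution, which you can improve by experimenting a bit:

Try thresholding the blurred image and varying the Gaussian parameters to see whether the results improve.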

Code:

import cv2
import pytesseract

gray = cv2.imread('/path/to/your_image.jpeg', cv2.IMREAD_GRAYSCALE)

g = cv2.GaussianBlur(gray, (3, 3), .5)

config = "-l eng --oem 1 --psm 6"
text = pytesseract.image_to_string(g, config=config)
print(text)

Resulting text (partial):

400242 | 6161108006012 BIO WHOLE MILK 1LTR \ 1PCS 12.Cu PCS 430.50 1,566.00
400365 | 6161108000119 BIO YOG VANILLA 150ML CUP ! 1PCS 24.05 PCS 91.02 2,184.36
400545 | 6161108000584 BIO LONG LIFE COOKING CREAM SOOML 1 1PCS \ 12.Gu PCS 241.32 2,895.78
74 - i :
400821 | 6161108005060" | BIO YOGHURT STRAWBERRY 450ML | 1Pcs 6.50 PCS 266.37 1,598.23
400822 , 6161108005207 BIO YOGHURT VANILLA 90ML ; 1PCS ! 36.0b FCS 60.96 2,194.38
450466 | 6166000051801 KENTASTE COCONUT MILK 400ML ; 1CTN * 12 PCS | 2.00 TN 1,920.96 3,841.92
, 450469 | 6166000051818 KENTASTE COCONUT CREAM 400ML : 1CTN* 12 PCS | 2.0. CTN 2,213.28 4,426.56
450465 | 6166000051887 KENTASTE COCONUT OIL 700ML 1CTN * 12 PCS | Iso) STN) 7,697.76 7,697.76
400985 | 6161108000812 BIO WHOLE MILK LONG LIFE SOOML EPCS: 12.00 PCS | 67.40 808.79

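[Discussion]:

I used pytesseract and, compared with yours, it returned fairly good results without the Gaussian blur. I think a better way to rephrase my question is: how do I iterate over the bounding boxes detected by the EAST model and recognize the text in those boxes?

Can't you run pytesseract on the bounding-box images inside your for loop? What is wrong with that approach? In my tests, blurring the image slightly gave me better results. In any case, a little preprocessing of your test image will certainly improve the results.

@Junaid, the answer you accepted is what I wrote here as a solution (see my second comment). After that you clearly wrote that you needed a CRNN solution, and that is what I gave you.

[Answer 3]:

On closer inspection, I found that your code has a number of problems. If I understand how you want to run the CRNN recognizer, the following solution may solve your problem.

Your function definitions: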

# Decode the scores to text
def most_likely(scores, char_set):
    text = ""
    for i in range(scores.shape[0]):
        c = np.argmax(scores[i][0])
        text += char_set[c]
    return text


def map_rule(text):
    char_list = []
    for i in range(len(text)):
        if i == 0:
            if text[i] != '-':
                char_list.append(text[i])
        else:
            if text[i] != '-' and (not (text[i] == text[i - 1])):
                char_list.append(text[i])
    return ''.join(char_list)


def best_path(scores, char_set):
    text = most_likely(scores, char_set)
    final_text = map_rule(text)
    return final_text
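
To see what these three functions do together, here is a pure-Python restatement of the same rule (take the most likely character per timestep, merge consecutive repeats, drop blanks) run on a hand-written toy input. `greedy_ctc_decode` is a hypothetical name, the score values are invented for illustration, and the input is a plain list of rows rather than the (timesteps, 1, classes) array the network returns.

```python
def greedy_ctc_decode(scores, char_set, blank='-'):
    """Best-path decoding: argmax per timestep, merge repeats, drop blanks."""
    # most likely character at each timestep
    path = [char_set[max(range(len(row)), key=row.__getitem__)] for row in scores]
    decoded = []
    previous = None
    for ch in path:
        # keep a character only if it is not blank and differs from the previous step
        if ch != blank and ch != previous:
            decoded.append(ch)
        previous = ch
    return ''.join(path), ''.join(decoded)


char_set = '-' + 'ab'  # index 0 is the blank, as in the code above
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # a (kept: a blank separates it from the previous 'a')
    [0.1, 0.2, 0.7],    # b
    [0.8, 0.1, 0.1],    # blank
]
raw, text = greedy_ctc_decode(scores, char_set)
print(raw)   # aa-ab-
print(text)  # aab
```

The decoder produces one short string per input blob, so it is meant to be fed a single word crop at a time rather than a whole page.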

Most of the code below is what you already have; compare the changes:

import cv2
import numpy as np
from imutils.object_detection import non_max_suppression
import matplotlib.pyplot as plt
%matplotlib inline

img=cv2.imread('/path/to/your_image.jpeg')
model=cv2.dnn.readNet('/path/to/frozen_east_text_detection.pb')

#Prepare the Image
#use multiple of 32 to set the new image shape
height,width,colorch=img.shape
new_height=(height//32)*32
new_width=(width//32)*32
print(new_height,new_width)

h_ratio=height/new_height
w_ratio=width/new_width
print(h_ratio,w_ratio)

#blob from image helps us to prepare the image
blob=cv2.dnn.blobFromImage(img,1,(new_width,new_height),(123.68,116.78,103.94),True, False)
model.setInput(blob)

#this model outputs geometry and score maps
(geometry,scores)=model.forward(model.getUnconnectedOutLayersNames())

#once we have done geometry and score maps we have to do post processing to obtain the final text boxes
rectangles=[]
confidence_score=[]
for i in range(geometry.shape[2]):
    for j in range(0,geometry.shape[3]):
    
        if scores[0][0][i][j]<0.1:
            continue

        bottom_x=int(j*4 + geometry[0][1][i][j])
        bottom_y=int(i*4 + geometry[0][2][i][j])

        top_x=int(j*4 - geometry[0][3][i][j])
        top_y=int(i*4 - geometry[0][0][i][j])

        rectangles.append((top_x,top_y,bottom_x,bottom_y))
        confidence_score.append(float(scores[0][0][i][j]))

#use nms to get the required rectangles
final_boxes=non_max_suppression(np.array(rectangles),probs=confidence_score,overlapThresh=0.5)

model1 = cv2.dnn.readNet('/path/to/crnn.onnx')


# ## Prepare the image
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

alphabet_set = "0123456789abcdefghijklmnopqrstuvwxyz"
blank = '-'

char_set = blank + alphabet_set

for (x1,y1,x2,y2) in final_boxes:
    
    x1=int(x1*w_ratio)
    y1=int(y1*h_ratio)
    x2=int(x2*w_ratio)
    y2=int(y2*h_ratio)
    
    # Work with detected text boxes for recognition
    blob = cv2.dnn.blobFromImage(img_gray[y1:y2, x1:x2], scalefactor=1/127.5, size=(100,32), mean=127.5)

    # Pass the image to network and extract per-timestep scores
    model1.setInput(blob)

    scores = model1.forward()
    print(scores.shape)

    out = best_path(scores, char_set)
    print(out)

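You may have to overlay the recognized text on top of the image to check the accuracy. There are better ways to test this, but that is beyond the scope of this question.

[Discussion]:

@Junaid, does the code work the way you wanted? I'm sure accuracy may be an issue, but it should at least iterate over all the detected text boxes and print the recognized characters.

Yes, it does work, but it shows the recognized characters one by one. That is not a wrong approach, but I was looking for something more robust. Yes, you gave the answer in a comment on the other answer, but I did not know how to do it. Anyway, thank you.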
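
None of the answers shows the restructuring step in code. The suggestion made in the discussion of the first answer (sort the boxes vertically, then left to right) could be sketched roughly as follows. This is an assumption-laden illustration, not the method from the linked answer: `boxes_to_lines` and `row_tolerance` are hypothetical names, boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples paired with their recognized text, and words are joined with a single space, so column indentation is not preserved.

```python
def boxes_to_lines(items, row_tolerance=0.5):
    """items: list of ((x1, y1, x2, y2), text). Returns a list of text lines."""
    # sort by vertical centre so that words on the same row end up adjacent
    items = sorted(items, key=lambda it: (it[0][1] + it[0][3]) / 2)
    rows = []
    for box, text in items:
        centre = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        # same row if the centres differ by less than a fraction of the box height
        if rows and abs(centre - rows[-1]['centre']) <= row_tolerance * max(height, rows[-1]['height']):
            rows[-1]['words'].append((box[0], text))
        else:
            rows.append({'centre': centre, 'height': height, 'words': [(box[0], text)]})
    # inside each row, order the words left to right by their x1 coordinate
    return [' '.join(t for _, t in sorted(row['words'])) for row in rows]


# made-up boxes, listed out of order on purpose
items = [
    ((120, 12, 180, 30), 'MILK'),
    ((10, 40, 70, 58), '400365'),
    ((10, 10, 70, 28), '400242'),
    ((80, 11, 115, 29), 'BIO'),
    ((80, 41, 115, 59), 'BIO'),
]
for line in boxes_to_lines(items):
    print(line)
# 400242 BIO MILK
# 400365 BIO
```

The tolerance of half a box height is a guess that suits evenly spaced receipt rows; skewed or tightly packed scans would likely need a different grouping rule.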
