Tesseract, OpenCV, Python: how to get the bounding box for a sentence or a line of text?

[Posted]: 2021-12-05 09:27:10 [Question]:

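I want to run text recognition on an image. I can recognize the text and the corresponding bounding boxes, but only word by word; I would like to do the same for a whole line of text. In the code below, I noticed that when I print the bounding box coordinates, the value of b['top'] is similar for words on the same line. I don't know whether I can use that, but I would like one bounding box, with the associated sentence, for each line of text.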

Here is the code I wrote:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import cv2
import pytesseract
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

img = cv2.imread('./images/page_2.jpg')  # load the image

img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert the color image to grayscale

plt.imshow(img)

boxes = pytesseract.image_to_data(img, output_type=Output.DICT)  # OCR results as a dict

boxes = pd.DataFrame(boxes)  # dict to DataFrame
boxes['text'] = boxes['text'].replace('', np.nan)  # replace empty strings with NaN
boxes = boxes.dropna(subset=['text'])  # drop rows without text

print(boxes)

for index, b in boxes.iterrows():
    (x, y, w, h) = b['left'], b['top'], b['width'], b['height']
    print((x, y, w, h), b['text'])
    cv2.rectangle(img, (x, y), (w + x, h + y), (0, 0, 255), 1)

cv2.imshow('result', img)
cv2.waitKey(0)

Output of print(boxes):

     level  page_num  block_num  par_num  line_num  word_num  left  top  \
4        5         1          1        1         1         1    32   24   
5        5         1          1        1         1         2   100   24   
6        5         1          1        1         1         3   191   28   
7        5         1          1        1         1         4   227   28   
8        5         1          1        1         1         5   257   24   
..     ...       ...        ...      ...       ...       ...   ...  ...   
154      5         1          1       11         1         7   261  457   
155      5         1          1       11         1         8   320  461   
156      5         1          1       11         1         9   351  457   
157      5         1          1       11         1        10   376  457   
158      5         1          1       11         1        11   468  457   

     width  height       conf       text  
4       60      17  93.283920     Maitre  
5       82      19  93.204414   corbeau,  
6       29      13  96.932060        sur  
7       22      12  96.932060         un  
8       50      17  93.306122      arbre  
..     ...     ...        ...        ...  
154     51      21  79.999794      qu'on  
155     23      13  90.411606         ne  
156     18      21  21.623993        I'y  
157     85      21  90.583260  prendrait  
158     44      21  96.933327      plus.

Output of (x, y, w, h) and b['text'] (each bounding box with its text):

(32, 24, 60, 17) Maitre
(100, 24, 82, 19) corbeau,
(191, 28, 29, 13) sur
(227, 28, 22, 12) un
(257, 24, 50, 17) arbre
(315, 24, 70, 21) perché,
(79, 49, 58, 17) Tenait
(144, 53, 23, 13) en
(174, 53, 34, 13) son
(216, 50, 33, 16) bec
(257, 53, 22, 13) un
(287, 49, 84, 22) fromage.
(32, 75, 60, 17) Maitre
(100, 75, 61, 17) renard
(169, 79, 31, 17) par
(206, 75, 64, 17) I'odeur
(277, 75, 68, 17) alléché
(353, 88, 3, 6) ,
(81, 101, 27, 16) Lui
(115, 101, 28, 16) tint
(151, 100, 11, 17) 4
(169, 104, 34, 17) peu
(211, 100, 42, 21) prés
(260, 104, 21, 13) ce
(289, 101, 76, 20) langage
(374, 105, 3, 12) :
(81, 126, 31, 16) «Et
(119, 126, 72, 21) bonjour
(199, 126, 88, 17) Monsieur
(294, 126, 22, 16) du
(324, 125, 87, 18) Corbeau.
(31, 151, 40, 17) Que
(78, 155, 46, 13) vous
(131, 151, 40, 17) 6tes
(177, 151, 32, 21) joli!
(217, 155, 35, 17) que
(260, 155, 44, 13) vous
(312, 155, 29, 13) me
(348, 151, 80, 17) semblez
(436, 151, 52, 17) beau!
(81, 176, 47, 18) Sans
(136, 177, 63, 19) mentir,
(207, 177, 15, 17) si
(229, 178, 48, 16) votre
(284, 181, 72, 17) ramage
(81, 202, 25, 17) Se
(114, 204, 79, 19) rapporte
(200, 202, 11, 17) a
(218, 204, 48, 15) votre
(273, 203, 87, 20) plumage,
(31, 228, 48, 17) Vous
(86, 227, 40, 18) étes
(134, 228, 15, 16) le
(157, 227, 63, 21) phénix
(227, 228, 34, 17) des
(269, 227, 51, 18) hétes
(327, 228, 23, 16) de
(358, 232, 33, 13) ces
(398, 228, 49, 17) bois»
(31, 253, 53, 17) Aces
(92, 255, 45, 15) mots
(145, 253, 15, 17) le
(167, 253, 78, 17) corbeau
(253, 257, 22, 13) ne
(283, 257, 22, 13) se
(312, 255, 40, 15) sent
(360, 257, 33, 17) pas
(400, 253, 23, 17) de
(429, 253, 40, 21) joie;
(81, 279, 19, 16) Et
(107, 283, 43, 16) pour
(157, 280, 74, 16) montrer
(238, 283, 22, 13) sa
(267, 279, 45, 16) belle
(319, 279, 43, 19) voix,
(33, 304, 8, 16) ll
(49, 308, 53, 13) ouvre
(110, 308, 22, 13) un
(140, 304, 47, 21) large
(195, 304, 33, 17) bec
(236, 304, 54, 17) laisse
(297, 305, 67, 16) tomber
(371, 308, 22, 13) sa
(400, 304, 53, 21) proie.
(32, 330, 23, 17) Le
(63, 330, 60, 16) renard
(131, 330, 38, 17) s'en
(177, 330, 48, 17) saisit
(232, 331, 17, 15) et
(256, 330, 28, 16) dit:
(291, 330, 49, 16) "Mon
(348, 330, 35, 16) bon
(391, 330, 92, 19) Monsieur,
(103, 355, 92, 21) Apprenez
(202, 359, 36, 17) que
(245, 356, 35, 16) tout
(287, 355, 67, 17) flatteur
(31, 381, 25, 16) Vit
(63, 385, 34, 12) aux
(104, 381, 71, 20) dépens
(181, 381, 24, 16) de
(212, 381, 43, 16) celui
(262, 381, 28, 20) qui
(298, 380, 79, 17) l'écoute:
(32, 406, 50, 17) Cette
(90, 406, 50, 21) lecon
(148, 407, 40, 16) vaut
(195, 406, 40, 17) bien
(243, 410, 22, 13) un
(273, 406, 79, 21) fromage
(359, 410, 45, 13) sans
(411, 406, 67, 17) doute."
(81, 432, 22, 16) Le
(110, 432, 77, 16) corbeau
(195, 432, 76, 16) honteux
(279, 433, 17, 15) et
(303, 432, 63, 16) confus
(31, 457, 42, 17) Jura
(81, 457, 44, 17) mais
(133, 461, 22, 13) un
(163, 461, 34, 17) peu
(205, 457, 36, 17) tard
(250, 470, 3, 6) ,
(261, 457, 51, 21) qu'on
(320, 461, 23, 13) ne
(351, 457, 18, 21) I'y
(376, 457, 85, 21) prendrait
(468, 457, 44, 21) plus.

Resulting image:

[image: result]

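[Comments]:

It is better to post the text output as text rather than as an image.

I don't understand your comment, sorry; where is the code?

I don't mean the code, I mean the question post. Images of text (in this case the dict of boxes with text) keep people from copying the data to work on a solution. Even better than posting the text, post the output of boxes.to_dict().

Done, thanks!

[Answer 1]: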

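I noticed that when I print the bounding box coordinates, the value of b['top'] is similar for words on the same line. I don't know whether I can use that, but I would like one bounding box, with the associated sentence, for each line of text.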

You can absolutely use that. The following builds lines by aggregating boxes that overlap vertically:

def lineup(boxes):
    linebox = None
    for _, box in boxes.iterrows():
        if linebox is None:                             # first line begins
            linebox = box.copy()
        elif box.top <= linebox.top + linebox.height:   # box in same line
            # compute the merged bottom edge before top is overwritten
            bottom = max(linebox.top + linebox.height, box.top + box.height)
            linebox['top'] = min(linebox.top, box.top)
            linebox['width'] = box.left + box.width - linebox.left
            linebox['height'] = bottom - linebox.top
            linebox['text'] += ' ' + box.text
        else:                                           # box in new line
            yield linebox
            linebox = box.copy()                        # new line begins
    if linebox is not None:                             # emit the last line, if any
        yield linebox

lineboxes = pd.DataFrame.from_records(lineup(boxes))
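
As an alternative to the vertical-overlap check, and not part of the answer above, the line structure that Tesseract already reports could be used directly. The sketch below assumes the DataFrame still carries the page_num, block_num, par_num and line_num columns visible in the question's output; line_boxes is a hypothetical helper name, and the sample rows are the ten words of lines 1 and 11 that are shown in full in the question.

```python
import pandas as pd


def line_boxes(words: pd.DataFrame) -> pd.DataFrame:
    """Merge word boxes into one box per Tesseract line (hypothetical helper)."""
    # Work with right/bottom edges so the union of boxes is a simple min/max.
    words = words.assign(
        right=words["left"] + words["width"],
        bottom=words["top"] + words["height"],
    )
    lines = (
        words.groupby(["page_num", "block_num", "par_num", "line_num"])
        .agg(
            left=("left", "min"),
            top=("top", "min"),
            right=("right", "max"),
            bottom=("bottom", "max"),
            text=("text", " ".join),  # words stay in their original row order
        )
        .reset_index()
    )
    lines["width"] = lines["right"] - lines["left"]
    lines["height"] = lines["bottom"] - lines["top"]
    return lines.drop(columns=["right", "bottom"])


# Sample rows copied from the question's output (lines 1 and 11 only).
sample = pd.DataFrame(
    {
        "page_num": [1] * 10,
        "block_num": [1] * 10,
        "par_num": [1] * 10,
        "line_num": [1, 1, 1, 1, 1, 11, 11, 11, 11, 11],
        "left": [32, 100, 191, 227, 257, 261, 320, 351, 376, 468],
        "top": [24, 24, 28, 28, 24, 457, 461, 457, 457, 457],
        "width": [60, 82, 29, 22, 50, 51, 23, 18, 85, 44],
        "height": [17, 19, 13, 12, 17, 21, 13, 21, 21, 21],
        "text": ["Maitre", "corbeau,", "sur", "un", "arbre",
                 "qu'on", "ne", "I'y", "prendrait", "plus."],
    }
)

lines = line_boxes(sample)
print(lines[["line_num", "left", "top", "width", "height", "text"]])
```

Grouping by the reported line numbers avoids having to choose an overlap threshold, but it relies on Tesseract's own line segmentation being correct for the page; the overlap-based generator above does not depend on it.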
