OpenCV pytesseract for OCR

Posted 2023-04-17


【Title】OpenCv pytesseract for OCR 【Posted】2016-11-04 16:54:40 【Question】:

How can I extract text from an image using OpenCV and pytesseract?

import cv2
import pytesseract
from PIL import Image
import numpy as np
from matplotlib import pyplot as plt

img = Image.open('test.jpg').convert('L')
img.show()
img.save('test','png')
img = cv2.imread('test.png',0)
edges = cv2.Canny(img,100,200)
#contour = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#print pytesseract.image_to_string(Image.open(edges))
print pytesseract.image_to_string(edges)

But this raises an error:

Traceback (most recent call last):
  File "open.py", line 14, in <module>
    print pytesseract.image_to_string(edges)
  File "/home/sroy8091/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 143, in image_to_string
    if len(image.split()) == 4:
AttributeError: 'NoneType' object has no attribute 'split'

【Comments】:

【Answer 1】:

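If you want to do some preprocessing with OpenCV first (such as the edge detection in your code) and then extract the text, convert the result to a PIL image before passing it to pytesseract: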

import cv2
import pytesseract
from PIL import Image

img = cv2.imread('test.png', 0)           # 0 = load as grayscale
edges = cv2.Canny(img, 100, 200)          # edge map, returned as a NumPy array
img_new = Image.fromarray(edges)          # wrap the array in a PIL image
text = pytesseract.image_to_string(img_new, lang='eng')
print(text)
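
A slightly more defensive sketch of the same idea, in case it helps. `cv2.imread` returns `None` instead of raising when it cannot read a file, so checking for that gives a clearer error before the image ever reaches pytesseract. The names `ensure_loaded` and `ocr_edges` are hypothetical, made up for this illustration; the Canny thresholds are the ones from the question.

```python
def ensure_loaded(img, path):
    """Raise a clear error if cv2.imread failed (it returns None instead of raising)."""
    if img is None:
        raise IOError("could not read image: %s" % path)
    return img


def ocr_edges(path, low=100, high=200, lang='eng'):
    """Run Canny edge detection on a grayscale image, then OCR the edge map."""
    # Imported inside the function so ensure_loaded stays usable
    # even where OpenCV, Pillow or pytesseract are not installed.
    import cv2
    import pytesseract
    from PIL import Image

    img = ensure_loaded(cv2.imread(path, 0), path)  # 0 = load as grayscale
    edges = cv2.Canny(img, low, high)               # uint8 NumPy array
    # pytesseract is handed a PIL image here, not the raw OpenCV array.
    return pytesseract.image_to_string(Image.fromarray(edges), lang=lang)
```

Calling `ocr_edges('test.png')` would then either return the recognized text or fail early with a message naming the file that could not be read.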

【Comments】:

【Answer 2】:

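You cannot pass an OpenCV object (a NumPy array) directly to the pytesseract functions here; as the traceback shows, they call image.split(), which is a PIL image method.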

Try:

from PIL import Image
import pytesseract

image_file = 'test.png'
# Open the file with PIL so pytesseract receives a PIL image
print(pytesseract.image_to_string(Image.open(image_file)))

【Comments】:
