Python: OCR problem when using the Python Tesseract API interface

Posted 2023-04-17

【Title】: Python: Getting an issue on OCR while using the Python Tesseract API interface 【Posted】: 2019-11-24 00:00:48 【Question】:

I am using the Pytesseract module for OCR. It seems to be a slow process, so I followed "Pytesseract is too slow. How can I make it process images faster?".

I used the code mentioned at https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/xvTFjYCDRQU/rCEwjZL3BQAJ. However, it failed with the error !strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 201 Segmentation fault (core dumped). I then looked through some posts and, based on them, added locale.setlocale(locale.LC_ALL, "C") to my code.

After adding that to my code, I ran into another error:

Traceback (most recent call last):
  File "master_doc_test3.py", line 107, in <module>
    tess = Tesseract()
  File "master_doc_test3.py", line 67, in __init__
    if self._lib.TessBaseAPIInit3(self._api, datapath, language):
ctypes.ArgumentError: argument 3: <class 'TypeError'>: wrong type

Can anyone explain what causes this error? Alternatively, does anyone know the fastest way to do OCR with Python?

【Answer 1】:

You should try converting every string argument passed to the ctypes library call to bytes. The call that fails is:

self._lib.TessBaseAPIInit3(self._api, datapath, language)

Something like this worked for me:

self._lib.TessBaseAPIInit3(self._api, bytes(datapath, encoding='utf-8'), bytes(language, encoding='utf-8'))

I got the hint here. Keep in mind that the code you are using needs the same kind of adjustment in its other lib calls:

tess.set_variable(bytes("tessedit_pageseg_mode", encoding='utf-8'), bytes(str(frame_piece.psm), encoding='utf-8'))
tess.set_variable(bytes("preserve_interword_spaces", encoding='utf-8'), bytes(str(1), encoding='utf-8'))
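
The conversion above can be collected into a small helper. This is only a minimal sketch: to_bytes is a hypothetical name, not part of the original wrapper, and the value 6 stands in for whatever page segmentation mode you actually use. It assumes the wrapper declares its string parameters as ctypes.c_char_p, which is consistent with the "wrong type" error in the traceback. Under Python 3, c_char_p accepts bytes but rejects str, and the snippet checks that without needing the Tesseract library to be installed.

```python
import ctypes


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes so ctypes can pass it as a C char*.

    Hypothetical helper: bytes pass through unchanged, anything else
    is converted with str() and then encoded.
    """
    if isinstance(value, bytes):
        return value
    return str(value).encode(encoding)


# c_char_p.from_param is the conversion ctypes applies to an argument
# when the matching argtypes entry is c_char_p.
try:
    ctypes.c_char_p.from_param("eng")  # Python 3 str is rejected
    str_accepted = True
except TypeError:
    str_accepted = False

# The same value converted to bytes is accepted without error.
ctypes.c_char_p.from_param(to_bytes("eng"))

language = to_bytes("eng")
psm_value = to_bytes(6)  # assumed example value, not from the original post
```

With such a helper, the calls shown earlier could be written as self._lib.TessBaseAPIInit3(self._api, to_bytes(datapath), to_bytes(language)) and tess.set_variable(to_bytes("preserve_interword_spaces"), to_bytes(1)).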
