pytesseract gives an error PermissionError: [WinError 5] Access is denied

Posted: 2021-01-10 21:17:11

Question:

I am using pytesseract in Python to read text from a PDF, but I get a permission error on Windows 10. I installed tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe from https://github.com/UB-Mannheim/tesseract/wiki, and I also have the poppler-20.09.0 files. I am using Python 3.8.0.

import pdf2image
import PyPDF2
import os
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR'

def pdf_to_img(pdf_file):
    print('pdf_file = ', pdf_file)
    return pdf2image.convert_from_path(pdf_file, dpi=200, fmt='jpg',
                                       poppler_path=r'F:\lokesh\resume_script\poppler-20.09.0\bin')


def ocr_core(file):
    text = pytesseract.image_to_string(file,)
    return text


def print_pages(pdf_file):
    images = pdf_to_img(pdf_file)
    for pg, img in enumerate(images):
        print(ocr_core(img))


print_pages("aa.pdf")

When I run this code, it gives this error:

Traceback (most recent call last):
File "test.py", line 84, in <module>
  print_pages("aa.pdf")
File "test.py", line 81, in print_pages
  print(ocr_core(img))
File "test.py", line 74, in ocr_core
  text = pytesseract.image_to_string(file,)
File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 344, in image_to_string
  return 
File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 347, in <lambda>
  Output.STRING: lambda: run_and_get_output(*args),
File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 258, in run_and_get_output
  run_tesseract(**kwargs)
File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 229, in run_tesseract
  raise e
File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 226, in run_tesseract
  proc = subprocess.Popen(cmd_args, **subprocess_args())
File "F:\python\lib\subprocess.py", line 854, in __init__
  self._execute_child(args, executable, preexec_fn, close_fds,
File "F:\python\lib\subprocess.py", line 1307, in _execute_child
  hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

How can we fix this error on Windows?
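The last frames of the traceback above show pytesseract handing tesseract_cmd to subprocess.Popen as the program to start. When that value is a folder such as C:\Program Files\Tesseract-OCR rather than an executable, the operating system refuses to launch it, and on Windows the refusal surfaces as PermissionError: [WinError 5] Access is denied. The sketch below is only an illustration and does not need Tesseract installed: it launches a temporary directory in roughly the same way to show that a folder path alone triggers a PermissionError (on Linux or macOS the message reads [Errno 13] Permission denied instead). The helper name try_to_run is hypothetical.

```python
import subprocess
import tempfile


def try_to_run(cmd_path: str) -> str:
    """Hypothetical helper: start cmd_path as the program of a subprocess,
    roughly as pytesseract starts tesseract_cmd, and report what happened."""
    try:
        subprocess.run([cmd_path, "--version"], capture_output=True)
    except PermissionError as exc:  # WinError 5 on Windows, Errno 13 on POSIX
        return f"PermissionError: {exc}"
    except FileNotFoundError as exc:  # the path does not exist at all
        return f"FileNotFoundError: {exc}"
    return "process started"


if __name__ == "__main__":
    # A temporary directory stands in for r'C:\Program Files\Tesseract-OCR'.
    with tempfile.TemporaryDirectory() as folder:
        print(try_to_run(folder))
```

A missing path gives FileNotFoundError instead, so a PermissionError here is a strong hint that tesseract_cmd names something that exists but is not a runnable program.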

Comments:

Your tesseract_cmd seems too short. Is the binary missing?

When I set pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe', it gives a different error: pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:/Program Files (x86)/Tesseract-OCR/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

That is a separate error.

How do I fix this error?

Answer 1:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR'

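needs to point at the executable itself, not the install folder (trying to launch a folder is what produces "Access is denied"):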

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
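Both errors in this thread come from pytesseract looking in the wrong place: first at the install folder instead of tesseract.exe, then (per the follow-up comment) at language data under C:/Program Files (x86)/Tesseract-OCR/tessdata, which does not match the C:\Program Files install, so eng.traineddata is not found. A minimal setup check is sketched below. It assumes the UB-Mannheim installer layout, with tesseract.exe and a tessdata folder directly inside the install directory; configure_tesseract is a hypothetical helper, not part of pytesseract. It derives both paths from the install folder, fails early with a readable message if either file is missing, and returns the values to assign to pytesseract.pytesseract.tesseract_cmd and to the TESSDATA_PREFIX environment variable.

```python
import os
import sys


def configure_tesseract(install_dir: str, lang: str = "eng") -> dict:
    """Hypothetical helper: derive and verify the paths pytesseract needs.

    Assumes <install_dir>/tesseract.exe and
    <install_dir>/tessdata/<lang>.traineddata (UB-Mannheim installer layout).
    The non-Windows file name exists only so the sketch runs elsewhere.
    """
    exe_name = "tesseract.exe" if sys.platform == "win32" else "tesseract"
    cmd = os.path.join(install_dir, exe_name)
    if not os.path.isfile(cmd):
        # Catches the original mistake early: the folder is not the program.
        raise FileNotFoundError(f"Tesseract executable not found: {cmd}")

    tessdata = os.path.join(install_dir, "tessdata")
    model = os.path.join(tessdata, f"{lang}.traineddata")
    if not os.path.isfile(model):
        # Same condition as "Error opening data file ... eng.traineddata".
        raise FileNotFoundError(f"Language data not found: {model}")

    return {"tesseract_cmd": cmd, "TESSDATA_PREFIX": tessdata}


# Possible usage on Windows (install path assumed from the question):
#   import pytesseract
#   settings = configure_tesseract(r"C:\Program Files\Tesseract-OCR")
#   pytesseract.pytesseract.tesseract_cmd = settings["tesseract_cmd"]
#   os.environ["TESSDATA_PREFIX"] = settings["TESSDATA_PREFIX"]
```

Setting os.environ from Python affects only the current process and the tesseract processes it starts; if a stale TESSDATA_PREFIX is set system-wide, correcting it in the Windows environment settings is the more permanent alternative.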

