使用 Python 将 PDF 转换为图像

Posted 2023-03-28

技术标签:

【中文标题】使用 Python 将 PDF 转换为图像【英文标题】：Convert PDF to Image using Python 【发布时间】：2020-06-27 06:54:04 【问题描述】：

我正在尝试在我安装的 ubuntu 服务器中将 pdf 文件转换为图像文件：

python2.7 poppler-utils pdf2image==1.12.1

我的代码：

from pdf2image import convert_from_path, convert_from_bytes

images = convert_from_path("/home/user/pdf_file.pdf")

# OR

with open("/home/user/pdf_file.pdf") as pdf:
    images = convert_from_bytes(pdf.read())

输出

当我使用函数“convert_from_path”时

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
    thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator

当我使用“convert_from_bytes”函数时

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 268, in convert_from_bytes
    paths_only=paths_only,
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
    thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator

我已经重新安装了所有实用程序，然后我面临这些问题。

【问题讨论】：

来自 pip pypi.org/project/pdf2image ，似乎不支持 Python 2.7。它清楚地说明了一个 python (3.5+) 模块，它包装了 pdftoppm 和 pdftocairo 以将 PDF 转换为 1.12.1 版的 PIL Image 对象 【参考方案1】：

我在python2中也失败了，但是在python3中成功了。

另一个库也发生了同样的问题： TypeError: 'threadsafe_iter' object is not an iterator

正如他们所说，这是由 next() 函数引起的 python 2 vs 3 问题。如果在文件/home/***/.local/lib/python2.7/site-packages/pdf2image/generators.py中修改__next__() -> next()，则在py2中运行成功。

顺便说一句，我向 pdf2image 团队创建了一个新问题。TypeError: ThreadSafeGenerator object is not an iterator #133

附加 pdf2image 自述文件说这是一个 python (3.5+) 模块。 pdf2image v1.7.1 在 py27 上工作。试试pip install pdf2image==1.7.1

【讨论】：

【参考方案2】：

如果你想把PDF转成图片可以试试Python Ghostscript package:

pip install ghostscript

import ghostscript
import locale

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    args = ["pef2jpeg", # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",
            "-r144",
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]

    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in args]

    ghostscript.Ghostscript(*args)

pdf2jpeg(
    "...Fixate/ActiveState/pdf/a.pdf",
    "...Fixate/ActiveState/pdf/a.jpeg",
)

【讨论】：

感谢您的回答。我已经尝试过您的代码，并且运行良好。我很乐意接受这个作为答案，但这不是我问题的答案，如果有人遇到同样的问题会感到困惑，实际上我也需要解决这个问题。谢谢！ @RokiDGupta 好的！我明白了

以上是关于使用 Python 将 PDF 转换为图像的主要内容，如果未能解决你的问题，请参考以下文章

Google App Engine（即“纯 Python”）：将 PDF 转换为图像

将pdf转换为图像但放大后

将 Gmail 转换为 PDF：HTML 中的嵌入图像

在python中将png图像转换为一个pdf

使用 iTextSharp 将图像转换为 PDF 保留剪切路径

使用 GhostScript 将图像转换为 PDF