How do I change hyperlinks inside a PDF using Python?

Posted 2023-03-25

【Title】How do I change hyperlinks inside a PDF using Python? 【Asked】2017-12-24 18:20:04 【Question】:

I am currently using PyPDF2 to open a PDF and loop through its pages. How do I actually scan for hyperlinks and then change them?

【Answer 1】:

I couldn't get what you want using the PyPDF2 library.

However, I did get something working with another library: pdfrw. It installed fine for me with pip on Python 3.6:

pip install pdfrw

Note: for the code below I used this example pdf, which I found online and which contains several links. Your mileage may vary with other files.

import pdfrw

pdf = pdfrw.PdfReader("pdf.pdf")  # Load the pdf
new_pdf = pdfrw.PdfWriter()  # Create an empty pdf

for page in pdf.pages:  # Go through the pages

    # Links are in Annots, but some pages don't have links so Annots returns None
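    for annot in page.Annots or []:

        # Skip annotations that are not URI links (no /A action or no /URI entry)
        if annot.A is None or annot.A.URI is None:
            continue

        old_url = annot.A.URI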

        # >Here you put logic for replacing the URLs<
        
        # Use the PdfString object to do the encoding for us
        # Note the brackets around the URL here
        new_url = pdfrw.objects.pdfstring.PdfString("(http://www.google.com)")

        # Override the URL with ours
        annot.A.URI = new_url

    new_pdf.addpage(page)    

new_pdf.write("new.pdf")
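
The placeholder for the URL-replacement logic in the loop above could be filled in many ways. Below is a minimal sketch that uses only the standard library and works on plain strings: `swap_host` replaces the host of a URL according to a lookup table, and `to_pdf_literal` adds the surrounding brackets mentioned in the comment (escaping any backslashes or parentheses inside the URL). The names `swap_host`, `to_pdf_literal`, `HOST_MAP` and the example.com hosts are all hypothetical and not part of pdfrw. Getting the plain text out of `old_url` is left out here; pdfrw 0.4 and later appear to provide `PdfString.decode()` for that, but treat this as an assumption and check your installed version.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, host_map):
    """Return url with its host replaced according to host_map, else unchanged."""
    parts = urlsplit(url)
    new_host = host_map.get(parts.netloc)
    if new_host is None:
        return url
    return urlunsplit(parts._replace(netloc=new_host))


def to_pdf_literal(text):
    """Wrap text in parentheses as a PDF literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


# Hypothetical mapping from old hosts to new hosts
HOST_MAP = {"old.example.com": "new.example.com"}

new_url = swap_host("http://old.example.com/docs?page=2", HOST_MAP)
literal = to_pdf_literal(new_url)
```

The resulting `literal` string is in the bracketed form that the loop above passes to `PdfString`.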

【Answer 2】:

I managed to get it working with PyPDF2.

If you just want to remove all annotations from a page, this is all you need:

if '/Annots' in page: del page['/Annots']

Otherwise, here is how you change every link:

import PyPDF2

new_link = "https://www.youtube.com/watch?v=dQw4w9WgXcQ" # great video by the way

pdf_reader = PyPDF2.PdfFileReader("input.pdf")
pdf_writer = PyPDF2.PdfFileWriter()

for i in range(pdf_reader.getNumPages()):
    page = pdf_reader.getPage(i)
    
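    # Pages without annotations have nothing to edit but are still copied below
    annots = page['/Annots'] if '/Annots' in page else []
    for annot in annots:
        annot_obj = annot.getObject()
        if '/A' not in annot_obj: continue  # not a link
        if '/URI' not in annot_obj['/A']: continue  # link, but not a URI action
        # PDF dictionary keys are names, so wrap the key in a NameObject
        # and the value in a TextStringObject:
        key   = PyPDF2.generic.NameObject("/URI")
        value = PyPDF2.generic.TextStringObject(new_link)
        annot_obj['/A'][key] = value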
    
    pdf_writer.addPage(page)

with open('output.pdf', 'wb') as f:
    pdf_writer.write(f)
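
The question also asks how to scan for hyperlinks before changing them. The sketch below is one possible way to list them, following the same `/Annots` → `/A` → `/URI` path as the code above. The function name `iter_link_uris` and the sample data are hypothetical; the sample uses plain dicts as stand-ins for parsed pages so that it runs without any PDF library. The `getObject` check is there because PyPDF2 1.x hands back annotations as indirect references.

```python
def iter_link_uris(pages):
    """Yield (page_index, annot_index, uri) for every URI link annotation."""
    for i, page in enumerate(pages):
        if "/Annots" not in page:
            continue
        for j, annot in enumerate(page["/Annots"]):
            # PyPDF2 stores annotations as indirect references; resolve if needed
            obj = annot.getObject() if hasattr(annot, "getObject") else annot
            if "/A" not in obj or "/URI" not in obj["/A"]:
                continue  # not a URI link
            yield i, j, str(obj["/A"]["/URI"])


# Stand-in data shaped like parsed pages (plain dicts, no PDF library needed)
sample_pages = [
    {"/Annots": [{"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "http://example.com/a"}}]},
    {},  # page without annotations
    {"/Annots": [{"/Subtype": "/Text"}, {"/Subtype": "/Link", "/A": {"/URI": "http://example.com/b"}}]},
]

found = list(iter_link_uris(sample_pages))
```

With the reader from the code above, the pages could be passed in as `[pdf_reader.getPage(i) for i in range(pdf_reader.getNumPages())]`; the page and annotation indices it yields are the same `i` and `j` you would use to address a single link.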

The equivalent one-liner for a given page index i and annotation index j would be:

pdf_reader.getPage(i)['/Annots'][j].getObject()['/A'][PyPDF2.generic.NameObject("/URI")] = PyPDF2.generic.TextStringObject(new_link)
