Combine two (or more) PDFs

Posted 2023-02-24

【Title】: Combine two (or more) PDF's 【Posted】: 2010-10-22 23:03:45 【Question】:

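Background: I need to provide a weekly report package for my sales staff. The package contains several (5-10) Crystal Reports.

Problem: I would like to allow a user to run all of the reports, and also to run just a single report. I was thinking I could do this by creating the reports and then doing the following: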

List<ReportClass> reports = new List<ReportClass>();
reports.Add(new WeeklyReport1());
reports.Add(new WeeklyReport2());
reports.Add(new WeeklyReport3());
<snip>

foreach (ReportClass report in reports)
{
    report.ExportToDisk(ExportFormatType.PortableDocFormat, @"c:\reports\" + report.ResourceName + ".pdf");
}

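This gives me a folder full of reports, but I would like to email everyone a single PDF containing all of the weekly reports. So I need to combine them.

Is there an easy way to do this without installing any more third-party controls? I already have DevExpress and CrystalReports, and I would prefer not to add too many more.

Would it be best to combine them inside the foreach loop or in a separate loop? (Or is there some other way?)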

【Comments】:

It looks like I need some third-party library after all; thanks for the help, everyone.

【Answer 1】:

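I had to solve a similar problem, and I ended up creating a small pdfmerge utility that uses the PDFSharp project, which is essentially MIT licensed.

The code is dead simple. I needed a cmdline utility, so I have more code dedicated to parsing the arguments than to the PDF merging itself: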

using (PdfDocument one = PdfReader.Open("file1.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument two = PdfReader.Open("file2.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outPdf = new PdfDocument())
{
    CopyPages(one, outPdf);
    CopyPages(two, outPdf);

    outPdf.Save("file1and2.pdf");
}

// Appends every page of "from" to the end of "to"
void CopyPages(PdfDocument from, PdfDocument to)
{
    for (int i = 0; i < from.PageCount; i++)
    {
        to.AddPage(from.Pages[i]);
    }
}

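【Discussion】:

Ah, it looks like Martin beat me to it. I say that because I was busy digging up my code sample :)

Hi andrew, could you please take a look at this.... ***.com/questions/6953471/…

Has anyone else gotten the "An object reference is required for the non-static field, method, or property..." error around CopyPages(one, outDocument); CopyPages(two, outDocument);

If you install PDFSharp from NuGet, make sure you use the prerelease version, otherwise you may run into this error: ***.com/questions/36788746/…

I don't know whether it is still maintained, or whether the site is only temporarily unreachable or was shut down a while ago. Here is the NuGet link anyway: nuget.org/packages/PdfSharp

【Answer 2】: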

Here is a single function that merges X number of PDFs using PDFSharp

using PdfSharp;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static void MergePDFs(string targetPath, params string[] pdfs)
{
    using (var targetDoc = new PdfDocument())
    {
        foreach (var pdf in pdfs)
        {
            using (var pdfDoc = PdfReader.Open(pdf, PdfDocumentOpenMode.Import))
            {
                for (var i = 0; i < pdfDoc.PageCount; i++)
                    targetDoc.AddPage(pdfDoc.Pages[i]);
            }
        }
        targetDoc.Save(targetPath);
    }
}

【Discussion】:

You need using PdfSharp; using PdfSharp.Pdf; using PdfSharp.Pdf.IO;

What values do you pass in string[] pdfs? Are they file paths?

@KMR Yes, they are file paths.

【Answer 3】:

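This is what I came up with and wanted to share with you, using PdfSharp.

Here you can merge multiple PDFs into one, in the order of the input list, without needing an output directory.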

    public static byte[] MergePdf(List<byte[]> pdfs)
    {
        List<PdfSharp.Pdf.PdfDocument> lstDocuments = new List<PdfSharp.Pdf.PdfDocument>();
        foreach (var pdf in pdfs)
        {
            lstDocuments.Add(PdfReader.Open(new MemoryStream(pdf), PdfDocumentOpenMode.Import));
        }

        using (PdfSharp.Pdf.PdfDocument outPdf = new PdfSharp.Pdf.PdfDocument())
        {
            for (int i = 1; i <= lstDocuments.Count; i++)
            {
                foreach (PdfSharp.Pdf.PdfPage page in lstDocuments[i - 1].Pages)
                {
                    outPdf.AddPage(page);
                }
            }

            MemoryStream stream = new MemoryStream();
            outPdf.Save(stream, false);
            byte[] bytes = stream.ToArray();

            return bytes;
        }
    }

【Discussion】:

【Answer 4】:

I used iTextSharp with C# to combine PDF files. This is the code I used.

string[] lstFiles = new string[3];
lstFiles[0] = @"C:/pdf/1.pdf";
lstFiles[1] = @"C:/pdf/2.pdf";
lstFiles[2] = @"C:/pdf/3.pdf";

PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage;
string outputPdfPath = @"C:/pdf/new.pdf";

sourceDocument = new Document();
pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

//Open the output file
sourceDocument.Open();

try
{
    //Loop through the files list (every entry, including the last one)
    for (int f = 0; f < lstFiles.Length; f++)
    {
        int pages = get_pageCcount(lstFiles[f]);

        reader = new PdfReader(lstFiles[f]);
        //Add pages of current file
        for (int i = 1; i <= pages; i++)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, i);
            pdfCopyProvider.AddPage(importedPage);
        }

        reader.Close();
    }
    //At the end save the output file
    sourceDocument.Close();
}
catch (Exception)
{
    //Rethrow without resetting the stack trace
    throw;
}

//Counts pages by scanning the raw file text for "/Type /Page" entries
private int get_pageCcount(string file)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(file)))
    {
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());

        return matches.Count;
    }
}

【Discussion】:

【Answer 5】:

Here is an example using iTextSharp

public static void MergePdf(Stream outputPdfStream, IEnumerable<string> pdfFilePaths)
{
    using (var document = new Document())
    using (var pdfCopy = new PdfCopy(document, outputPdfStream))
    {
        pdfCopy.CloseStream = false;
        try
        {
            document.Open();
            foreach (var pdfFilePath in pdfFilePaths)
            {
                using (var pdfReader = new PdfReader(pdfFilePath))
                {
                    pdfCopy.AddDocument(pdfReader);
                    pdfReader.Close();
                }
            }
        }
        finally
        {
            document?.Close();
        }
    }
}

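The PdfReader constructor has many overloads. It is possible to replace the parameter type IEnumerable<string> with IEnumerable<Stream>, and it should work as well. Please note that the method does not close the output stream; it leaves that task to whoever created the Stream.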

【Discussion】:

【Answer 6】:

PDFsharp appears to allow merging multiple PDF documents into one.

The same is possible with ITextSharp.

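【Discussion】:

【Answer 7】:

There are some good answers here already, but I thought I might mention that pdftk could be useful for this task. Instead of producing one PDF directly, you could produce each PDF you need and then combine them as a post-processing step with pdftk. This could even be done from within your program using a system() or ShellExecute() call.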
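
A rough C# sketch of that idea, in case it helps. It assumes pdftk is installed and reachable on the PATH, and it relies on pdftk's `cat output` operation for concatenation. The class and method names (PdftkMerge, BuildArguments, Run) are made up for illustration, and the quoting is naive: it assumes the paths contain no double quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper, not part of pdftk or any library.
public static class PdftkMerge
{
    // Builds: "in1.pdf" "in2.pdf" ... cat output "out.pdf"
    // Assumption: no path contains a double-quote character.
    public static string BuildArguments(IEnumerable<string> inputPaths, string outputPath)
    {
        var quotedInputs = inputPaths.Select(p => "\"" + p + "\"");
        return string.Join(" ", quotedInputs) + " cat output \"" + outputPath + "\"";
    }

    // Starts pdftk, waits for it to finish, and returns its exit code.
    public static int Run(IEnumerable<string> inputPaths, string outputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "pdftk", // assumption: pdftk is on the PATH
            Arguments = BuildArguments(inputPaths, outputPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

With the exported report files from the question, the call would look something like PdftkMerge.Run(new[] { @"c:\reports\a.pdf", @"c:\reports\b.pdf" }, @"c:\reports\weekly.pdf"), where the file names are placeholders. The inputs are concatenated in the order they are passed.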

【Discussion】:

【Answer 8】:

Combining two byte[] using iTextSharp, up to version 5.x:

internal static MemoryStream mergePdfs(byte[] pdf1, byte[] pdf2)
{
    MemoryStream outStream = new MemoryStream();
    using (Document document = new Document())
    using (PdfCopy copy = new PdfCopy(document, outStream))
    {
        document.Open();
        copy.AddDocument(new PdfReader(pdf1));
        copy.AddDocument(new PdfReader(pdf2));
    }
    return outStream;
}

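Instead of byte[], it is also possible to pass Streams.

【Discussion】:

I added the information about which external library your code requires. Please always add all of the information needed to use your answer.

【Answer 9】:

I combined the two above, because I needed to merge 3 PDF byte arrays and return a byte array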

internal static byte[] mergePdfs(byte[] pdf1, byte[] pdf2, byte[] pdf3)
{
    MemoryStream outStream = new MemoryStream();
    using (Document document = new Document())
    using (PdfCopy copy = new PdfCopy(document, outStream))
    {
        document.Open();
        copy.AddDocument(new PdfReader(pdf1));
        copy.AddDocument(new PdfReader(pdf2));
        copy.AddDocument(new PdfReader(pdf3));
    }
    return outStream.ToArray();
}

【Discussion】:

【Answer 10】:

You could try pdf-shuffler: gtk-apps.org

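【Discussion】:

【Answer 11】:

I know a lot of people have recommended PDF Sharp, but it doesn't look like that project has been updated since June 2008. Further, the source isn't available.

Personally, I've been using iTextSharp, which has been pretty easy to work with.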

【Discussion】:

PDFsharp 1.32 incl. sources (2012-03-07)

【Answer 12】:

The following method takes a List of byte arrays, each one a PDF, and returns a single byte array.

using ...;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfHelper
{
    public static byte[] PdfConcat(List<byte[]> lstPdfBytes)
    {
        byte[] res;

        using (var outPdf = new PdfDocument())
        {
            foreach (var pdf in lstPdfBytes)
            {
                using (var pdfStream = new MemoryStream(pdf))
                using (var pdfDoc = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
                    for (var i = 0; i < pdfDoc.PageCount; i++)
                        outPdf.AddPage(pdfDoc.Pages[i]);
            }

            using (var memoryStreamOut = new MemoryStream())
            {
                outPdf.Save(memoryStreamOut, false);

                res = Stream2Bytes(memoryStreamOut);
            }
        }

        return res;
    }

    public static void DownloadAsPdfFile(string fileName, byte[] content)
    {
        var ms = new MemoryStream(content);

        HttpContext.Current.Response.Clear();
        HttpContext.Current.Response.ContentType = "application/pdf";
        HttpContext.Current.Response.AddHeader("content-disposition", $"attachment;filename={fileName}.pdf");
        HttpContext.Current.Response.Buffer = true;
        ms.WriteTo(HttpContext.Current.Response.OutputStream);
        HttpContext.Current.Response.End();
    }

    private static byte[] Stream2Bytes(Stream input)
    {
        var buffer = new byte[input.Length];
        using (var ms = new MemoryStream())
        {
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                ms.Write(buffer, 0, read);

            return ms.ToArray();
        }
    }
}

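So, the result of the PdfHelper.PdfConcat method is passed to the PdfHelper.DownloadAsPdfFile method.

PS: You need to install a NuGet package named PdfSharp. To do so, type the following in the Package Manager Console window:

Install-Package PdfSharp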

【Discussion】:

【Answer 13】:

The following method merges two PDFs (f1 and f2) using iTextSharp. The second PDF is inserted after a specific page index of f1.

string f1 = "D:\\a.pdf";
string f2 = "D:\\Iso.pdf";
string outfile = "D:\\c.pdf";
appendPagesFromPdf(f1, f2, outfile, 3);

public static void appendPagesFromPdf(String f1, string f2, String destinationFile, int startingindex)
{
    PdfReader p1 = new PdfReader(f1);
    PdfReader p2 = new PdfReader(f2);
    int l1 = p1.NumberOfPages, l2 = p2.NumberOfPages;

    //Create our destination file
    using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        Document doc = new Document();

        PdfWriter w = PdfWriter.GetInstance(doc, fs);
        doc.Open();
        for (int page = 1; page <= startingindex; page++)
        {
            doc.NewPage();
            w.DirectContent.AddTemplate(w.GetImportedPage(p1, page), 0, 0);
            //Used to pull individual pages from our source
        }
        //  copied pages from first pdf till startingIndex
        for (int i = 1; i <= l2; i++)
        {
            doc.NewPage();
            w.DirectContent.AddTemplate(w.GetImportedPage(p2, i), 0, 0);
        }
        // merges second pdf after startingIndex
        for (int i = startingindex + 1; i <= l1; i++)
        {
            doc.NewPage();
            w.DirectContent.AddTemplate(w.GetImportedPage(p1, i), 0, 0);
        }
        // continuing from where we left in pdf1

        doc.Close();
        p1.Close();
        p2.Close();
    }
}

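【Discussion】:

This is a very lossy approach (all annotations are lost), and it has problems when merging pages that use page rotation. See hmadrigal's answer for a better approach.

Thanks, I'll take a look.

【Answer 14】:

To solve a similar problem, I used iTextSharp like this: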

//Create the document which will contain the combined PDF's
Document document = new Document();

//Create a writer for the document
PdfCopy writer = new PdfCopy(document, new FileStream(OutPutFilePath, FileMode.Create));
if (writer == null)
{
     return;
}

//Open the document
document.Open();

//Get the files you want to combine
string[] filePaths = Directory.GetFiles(DirectoryPathWhereYouHaveYourFiles);
foreach (string filePath in filePaths)
{
     //Read the PDF file
     using (PdfReader reader = new PdfReader(filePath))
     {
         //Add the file to the combined one
         writer.AddDocument(reader);
     }
}

//Finally close the document and writer
writer.Close();
document.Close();

【Discussion】:

【Answer 15】:

Here is a link to an example using PDFSharp and ConcatenateDocuments

【Discussion】:

【Answer 16】:

Here is the solution: http://www.wacdesigns.com/2008/10/03/merge-pdf-files-using-c It uses the free, open-source iTextSharp library: http://sourceforge.net/projects/itextsharp

【Discussion】:

【Answer 17】:

I've done this with PDFBox. I suppose it works similarly to iTextSharp.

【Discussion】:
