How to display whole PDF (not only one page) with PDF.JS?

Posted: 2013-05-05 00:46:40

【Question】:

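I've created this demo:

http://polishwords.com.pl/dev/pdfjs/test.html

It displays one page. I would like to display all pages: either one below another, or with some buttons to change the page, or even better, with all the standard PDF.JS controls loaded like in Firefox. How can I achieve this?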

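【Question discussion】:

github.com/mozilla/pdf.js

Get inspiration here: mozilla.github.io/pdf.js/web/viewer.html

@DekDekku kuncajs I spent the whole day today reading those sites before asking this question. They didn't help.

@tomaszs Why haven't you marked this as answered?

Your question will be answered by this solution! ***.com/questions/25162554/…

【Answer 1】: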

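The loaded PDF document has a member variable numPages, so you just iterate through the pages. But it's important to remember that getting a page in pdf.js is asynchronous, so the order isn't guaranteed. You therefore need to chain the requests. You could do something along these lines: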

var currPage = 1; //Pages are 1-based not 0-based
var numPages = 0;
var thePDF = null;

//This is where you start
PDFJS.getDocument(url).then(function(pdf) {

        //Set PDFJS global object (so we can easily access in our page functions)
        thePDF = pdf;

        //How many pages it has
        numPages = pdf.numPages;

        //Start with first page
        pdf.getPage( 1 ).then( handlePages );
});



function handlePages(page)
{
    //This gives us the page's dimensions at full scale
    var viewport = page.getViewport( 1 );

    //We'll create a canvas for each page to draw it on
    var canvas = document.createElement( "canvas" );
    canvas.style.display = "block";
    var context = canvas.getContext('2d');
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    //Draw it on the canvas
    page.render({canvasContext: context, viewport: viewport});

    //Add it to the web page
    document.body.appendChild( canvas );

    //Move to next page
    currPage++;
    if ( thePDF !== null && currPage <= numPages )
    {
        thePDF.getPage( currPage ).then( handlePages );
    }
}

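【Discussion】:

This doesn't work for me. My canvas is inside a div, and when I run the code above it displays the PDF pages at the end of the page (not in the div).

@Sara You need to learn about the DOM. The code above is just an example. It appends the created pages to the document. You need to put them inside your div and style the canvases as your project requires. But all of that is beyond the scope of this question.

Thanks for the quick reply :) I added the div and added the canvas in the right place, but it overwrites them..

@Mr.Hyde I haven't looked at this project in years. Most likely the API has methods that help with this, but you can still use the canvas and listen for mouse events to implement text selection.

Perfect solution

【Answer 2】: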

Here's my take. It renders all pages in the correct order and still works asynchronously.

<style>
  #pdf-viewer {
    width: 100%;
    height: 100%;
    background: rgba(0, 0, 0, 0.1);
    overflow: auto;
  }

  .pdf-page-canvas {
    display: block;
    margin: 5px auto;
    border: 1px solid rgba(0, 0, 0, 0.2);
  }
</style>

<script>
    var url = 'https://github.com/mozilla/pdf.js/blob/master/test/pdfs/tracemonkey.pdf';
    var thePdf = null;
    var scale = 1;

    PDFJS.getDocument(url).promise.then(function(pdf) {
        thePdf = pdf;
        var viewer = document.getElementById('pdf-viewer');

        for (var page = 1; page <= pdf.numPages; page++) {
          // Canvases are created and appended in page order,
          // before any page has finished loading
          var canvas = document.createElement("canvas");
          canvas.className = 'pdf-page-canvas';
          viewer.appendChild(canvas);
          renderPage(page, canvas);
        }
    });

    function renderPage(pageNumber, canvas) {
        thePdf.getPage(pageNumber).then(function(page) {
          var viewport = page.getViewport({ scale: scale });
          canvas.height = viewport.height;
          canvas.width = viewport.width;
          page.render({canvasContext: canvas.getContext('2d'), viewport: viewport});
        });
    }
</script>

<div id='pdf-viewer'></div>
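The ordering here comes from creating the placeholders synchronously, before any page has finished loading. Below is a minimal sketch of that idea on its own, without PDF.js or a DOM. It assumes a hypothetical fakeLoad function that stands in for getPage plus render and deliberately finishes later pages sooner; the slot objects play the role of the canvases.

```javascript
// Hypothetical stand-in for getPage + render: later pages finish sooner.
function fakeLoad(pageNumber, totalPages) {
  const delayMs = (totalPages - pageNumber) * 5;
  return new Promise(function (resolve) {
    setTimeout(function () { resolve('page ' + pageNumber); }, delayMs);
  });
}

// Create one slot per page up front (like appending empty canvases),
// then fill each slot whenever its own page is ready.
function renderAllInOrder(totalPages) {
  const slots = [];            // plays the role of the canvases in the DOM
  const completionOrder = [];  // records when each page actually finished
  const jobs = [];
  for (let n = 1; n <= totalPages; n++) {
    const slot = { content: null };
    slots.push(slot);          // position is fixed here, synchronously
    jobs.push(fakeLoad(n, totalPages).then(function (content) {
      slot.content = content;  // each job only touches its own slot
      completionOrder.push(n);
    }));
  }
  return Promise.all(jobs).then(function () {
    return { slots: slots, completionOrder: completionOrder };
  });
}

renderAllInOrder(3).then(function (result) {
  console.log('finished in this order:', result.completionOrder);
  console.log('slots:', result.slots.map(function (s) { return s.content; }));
});
```

Whatever order the pages finish in, the slots read page 1, page 2, page 3, which is why the canvases above end up in page order.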

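【Discussion】:

Awesome - thanks!

Simple and clean. Make it scale = 2 for web.

How does this solution ensure the correct page order? It looks to me like it could still go wrong in a race condition, since you're looping through the pages to render but not waiting for the previous page to finish processing? Correct me if I'm reading this wrong.

@redfox05 The canvas elements are created and appended in order. The render function then works on the canvas it receives as an argument.

@RetoHöhener, thanks, yes, I had been wondering about that myself, so I came back to see whether you had replied. Once I thought about passing by reference, it clicked. The canvases are created in order, then a reference to each one is passed to the render function, so when it finishes loading that page it draws into the original canvas element it came from, and the pages end up in order :) I think in my implementation I'd add some kind of ID count to the elements to make this more obvious to the next developer.

【Answer 3】: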

The pdfjs-dist library contains parts for building a PDF viewer. You can use PDFPageView to render all pages. Based on https://github.com/mozilla/pdf.js/blob/master/examples/components/pageviewer.html:

var url = "https://cdn.mozilla.net/pdfjs/tracemonkey.pdf";
var container = document.getElementById('container');
// Load document
PDFJS.getDocument(url).then(function (doc) {
  var promise = Promise.resolve();
  for (var i = 0; i < doc.numPages; i++) {
    // One-by-one load pages
    promise = promise.then(function (id) {
      return doc.getPage(id + 1).then(function (pdfPage) {
        // Add div with page view.
        var SCALE = 1.0;
        var pdfPageView = new PDFJS.PDFPageView({
          container: container,
          id: id,
          scale: SCALE,
          defaultViewport: pdfPage.getViewport(SCALE),
          // We can enable text/annotations layers, if needed
          textLayerFactory: new PDFJS.DefaultTextLayerFactory(),
          annotationLayerFactory: new PDFJS.DefaultAnnotationLayerFactory()
        });
        // Associates the actual page with the view, and drawing it
        pdfPageView.setPdfPage(pdfPage);
        return pdfPageView.draw();
      });
    }.bind(null, i));
  }
  return promise;
});

#container > *:not(:first-child) {
  border-top: solid 1px black;
}

<link href="https://npmcdn.com/pdfjs-dist/web/pdf_viewer.css" rel="stylesheet"/>
<script src="https://npmcdn.com/pdfjs-dist/web/compatibility.js"></script>
<script src="https://npmcdn.com/pdfjs-dist/build/pdf.js"></script>
<script src="https://npmcdn.com/pdfjs-dist/web/pdf_viewer.js"></script>

<div id="container" class="pdfViewer singlePageView"></div>

【Discussion】:

Thanks for providing the working code snippet I needed.

"message": "Uncaught ReferenceError: PDFJS is not defined",

【Answer 4】:

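The accepted answer no longer works (as of 2021), because the API changed var viewport = page.getViewport( 1 ); to var viewport = page.getViewport({ scale: scale });. You can try the following complete working html: just copy the content below into an html file and open it: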

<html>
<head>
<script src="https://mozilla.github.io/pdf.js/build/pdf.js"></script>
</head>
<body>

<script>

var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/web/compressed.tracemonkey-pldi-09.pdf';

// Loaded via <script> tag, create shortcut to access PDF.js exports.
var pdfjsLib = window['pdfjs-dist/build/pdf'];

// The workerSrc property shall be specified.
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://mozilla.github.io/pdf.js/build/pdf.worker.js';

var currPage = 1; //Pages are 1-based not 0-based
var numPages = 0;
var thePDF = null;

//This is where you start
pdfjsLib.getDocument(url).promise.then(function(pdf) {

        //Set PDFJS global object (so we can easily access in our page functions)
        thePDF = pdf;

        //How many pages it has
        numPages = pdf.numPages;

        //Start with first page
        pdf.getPage( 1 ).then( handlePages );
});


function handlePages(page)
{
    //This gives us the page's dimensions at 1.5x scale
    var viewport = page.getViewport({ scale: 1.5 });

    //We'll create a canvas for each page to draw it on
    var canvas = document.createElement( "canvas" );
    canvas.style.display = "block";
    var context = canvas.getContext('2d');

    canvas.height = viewport.height;
    canvas.width = viewport.width;

    //Draw it on the canvas
    page.render({canvasContext: context, viewport: viewport});

    //Add it to the web page
    document.body.appendChild( canvas );

    var line = document.createElement("hr");
    document.body.appendChild( line );

    //Move to next page
    currPage++;
    if ( thePDF !== null && currPage <= numPages )
    {
        thePDF.getPage( currPage ).then( handlePages );
    }
}

</script>

</body>
</html>
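If the same page has to run against both older and newer PDF.js builds, the getViewport difference described above could be hidden behind a small wrapper. The sketch below is only an illustration under assumptions, not part of PDF.js: getViewportCompat is a hypothetical helper that tries the object form first and falls back to the numeric form when the result has no finite size. Whether a particular old build returns unusable dimensions or throws when given an object is an assumption here, so the helper covers both. The two mock pages only imitate the two call styles so the snippet runs without the library.

```javascript
// Hypothetical helper (not a PDF.js API): try the newer object-style call,
// fall back to the older numeric call if the viewport looks unusable.
function getViewportCompat(page, scale) {
  let viewport = null;
  try {
    viewport = page.getViewport({ scale: scale });
  } catch (err) {
    viewport = null; // assumption: an old build might reject the object argument
  }
  if (viewport &&
      Number.isFinite(viewport.width) &&
      Number.isFinite(viewport.height)) {
    return viewport;
  }
  return page.getViewport(scale);
}

// Mock pages for illustration only; real pages come from pdf.getPage().
const newStylePage = {
  getViewport: function (params) {
    return { width: 612 * params.scale, height: 792 * params.scale };
  }
};
const oldStylePage = {
  getViewport: function (scale) {
    return { width: 612 * scale, height: 792 * scale };
  }
};

console.log(getViewportCompat(newStylePage, 1.5)); // object-style call is used
console.log(getViewportCompat(oldStylePage, 1.5)); // numeric fallback is used
```

In handlePages this would replace the direct page.getViewport(...) call; if only one PDF.js version is ever loaded, calling the matching form directly is simpler.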

【Discussion】:

Is getViewport the only thing that changed?

@DonRhummy Yes.

This is the working answer as of 2021

【Answer 5】:

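The following is a partial answer for anyone trying to get PDF.js to display a whole PDF in 2019, since the API has changed significantly. This was, of course, the OP's primary concern. Inspiration: sample code.

Please note the following:

- Extra libraries are being used: Lodash (for the range() function) and polyfills (for promises).
- Bootstrap is being used.
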
    <div class="row">
        <div class="col-md-10 col-md-offset-1">
            <div id="wrapper">

            </div>
        </div>
    </div>

    <style>
        body {
            background-color: #808080;
            /* margin: 0; padding: 0; */
        }
    </style>
    <link href="//cdnjs.cloudflare.com/ajax/libs/pdf.js/2.1.266/pdf_viewer.css" rel="stylesheet"/>

    <script src="//cdnjs.cloudflare.com/ajax/libs/pdf.js/2.1.266/pdf.js"></script>
    <script src="//cdnjs.cloudflare.com/ajax/libs/pdf.js/2.1.266/pdf_viewer.js"></script>
    <script src="//cdn.polyfill.io/v2/polyfill.min.js"></script>
    <script src="//cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.15/lodash.js"></script>
    <script>
        // Assumes jQuery is already loaded on the page
        $(document).ready(function () {
            // startup
        });

        'use strict';

        if (!pdfjsLib.getDocument || !pdfjsViewer.PDFViewer) {
            alert("Please build the pdfjs-dist library using\n" +
                "  `gulp dist-install`");
        }

        var url = '//www.pdf995.com/samples/pdf.pdf';

        pdfjsLib.GlobalWorkerOptions.workerSrc =
            '//cdnjs.cloudflare.com/ajax/libs/pdf.js/2.1.266/pdf.worker.js';

        var loadingTask = pdfjsLib.getDocument(url);
        loadingTask.promise.then(function(pdf) {
            // please be aware this uses .range() function from lodash;
            // its end bound is exclusive, hence numPages + 1
            var pagePromises = _.range(1, pdf.numPages + 1).map(function(number) {
                return pdf.getPage(number);
            });
            return Promise.all(pagePromises);
        }).then(function(pages) {
                var scale = 1.5;
                pages.forEach(function(page) {
                    var viewport = page.getViewport({ scale: scale, }); // Prepare canvas using PDF page dimensions

                    var canvas = document.createElement('canvas');
                    canvas.height = viewport.height;
                    canvas.width = viewport.width; // Render PDF page into canvas context

                    var canvasContext = canvas.getContext('2d');
                    var renderContext = {
                        canvasContext: canvasContext,
                        viewport: viewport
                    };
                    page.render(renderContext).promise.then(function() {
                        if (false)
                            return console.log('Page rendered');
                    });
                    document.getElementById('wrapper').appendChild(canvas);
                });
            },
            function(error) {
                return console.log('Error', error);
            });
    </script>
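Lodash's range(start, end) leaves out end, so listing every page of an n-page document takes range(1, n + 1). If Lodash is only being pulled in for this, the same list can be built in plain JavaScript. A minimal sketch; pageNumbers is a hypothetical helper name, and the commented usage assumes a loaded pdf document as in the code above.

```javascript
// Build [1, 2, ..., numPages]; PDF.js page numbers are 1-based.
function pageNumbers(numPages) {
  return Array.from({ length: numPages }, function (_unused, index) {
    return index + 1;
  });
}

console.log(pageNumbers(4)); // [ 1, 2, 3, 4 ]

// Usage with a loaded document (not executed here):
// var pagePromises = pageNumbers(pdf.numPages).map(function (n) {
//   return pdf.getPage(n);
// });
```

Array.from with a length object is available in modern browsers without a polyfill, so this also drops one dependency.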

【Discussion】:

【Answer 6】:

If you want to render all pages of the PDF document in separate canvases, strictly one after another, this is one solution:

index.html

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>PDF Sample</title>
    <script type="text/javascript" src="jquery.js"></script>
    <script type="text/javascript" src="pdf.js"></script>
    <script type="text/javascript" src="main.js">
    </script>
    <link rel="stylesheet" type="text/css" href="main.css">
</head>
<body id="body">  
</body>
</html>

main.css

canvas {
    display: block;
}

main.js

$(function() {
    var filePath = "document.pdf";

    function Num(num) {
        var num = num;

        return function () {
            return num;
        }
    };

    function renderPDF(url, canvasContainer, options) {
        var options = options || {
                scale: 1.5
            },
            func,
            pdfDoc,
            def = $.Deferred(),
            promise = $.Deferred().resolve().promise(),
            width,
            height,
            makeRunner = function(func, args) {
                return function() {
                    return func.call(null, args);
                };
            };

        function renderPage(num) {
            var def = $.Deferred(),
                currPageNum = new Num(num);
            pdfDoc.getPage(currPageNum()).then(function(page) {
                var viewport = page.getViewport(options.scale);
                var canvas = document.createElement('canvas');
                var ctx = canvas.getContext('2d');
                var renderContext = {
                    canvasContext: ctx,
                    viewport: viewport
                };

                // All canvases reuse the dimensions of the first page
                if(currPageNum() === 1) {
                    height = viewport.height;
                    width = viewport.width;
                }

                canvas.height = height;
                canvas.width = width;

                canvasContainer.appendChild(canvas);

                page.render(renderContext).then(function() {
                    def.resolve();
                });
            })

            return def.promise();
        }

        function renderPages(data) {
            pdfDoc = data;

            // Chain the pages so each one starts after the previous one has rendered
            var pagesCount = pdfDoc.numPages;
            for (var i = 1; i <= pagesCount; i++) {
                func = renderPage;
                promise = promise.then(makeRunner(func, i));
            }
        }

        PDFJS.disableWorker = true;
        PDFJS.getDocument(url).then(renderPages);
    };

    var body = document.getElementById("body");
    renderPDF(filePath, body);
});
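The chain that renderPages builds with $.Deferred can also be written with native promises. A minimal sketch, assuming a hypothetical renderOne(pageNumber) callback that returns a promise which resolves once that page has been drawn; here it is mocked with a timer so the snippet runs without PDF.js or jQuery.

```javascript
// Run renderOne(1), renderOne(2), ... strictly one after another.
function renderSequentially(pageCount, renderOne) {
  let chain = Promise.resolve();
  for (let n = 1; n <= pageCount; n++) {
    // `let` gives each iteration its own n, so no bind/closure helper is needed
    chain = chain.then(function () { return renderOne(n); });
  }
  return chain;
}

// Mock renderer: later pages take less time, yet they still finish in page
// order because each one only starts after the previous one has resolved.
const finished = [];
function mockRenderOne(pageNumber) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      finished.push(pageNumber);
      resolve();
    }, (4 - pageNumber) * 5);
  });
}

renderSequentially(3, mockRenderOne).then(function () {
  console.log(finished); // [ 1, 2, 3 ]
});
```

The trade-off compared with starting all pages at once is total time: pages never overlap, so a long document takes longer to appear in full, but memory use and ordering are easier to reason about.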

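【Discussion】:

Where does this canvasContainer come from? Could you explain? I have a dynamic div, and the canvas will be appended inside it after the page has loaded, once the respective link is clicked.

I'm using the TouchPDF library

【Answer 7】: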

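First of all, please note that doing this is really not a good idea, as explained in https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#allthepages

How to do it:

Use the viewer provided by Mozilla: https://mozilla.github.io/pdf.js/web/viewer.html

Modify the _getVisiblePages() method of the BaseViewer class in viewer.js to: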

/* load all pages */
_getVisiblePages() {
      let visible = [];
      let currentPage = this._pages[this._currentPageNumber - 1];
      for (let i = 0; i < this.pagesCount; i++) {
        let aPage = this._pages[i];
        visible.push({ id: aPage.id, view: aPage, });
      }
      return { first: currentPage, last: currentPage, views: visible, };
    }

【Discussion】:

Thanks. You just saved me a lot of work.

【Answer 8】:

If you want to render all pages of the PDF document in separate canvases:

<html>
   <head>
      <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
      <script src="pdf.js"></script>
      <script src="jquery.js"></script>
   </head>
   <body>
      <h1>PDF.js 'Hello, world!' example</h1>
      <div id="canvas_div"></div>
      <script>
         // If absolute URL from the remote server is provided, configure the CORS
         // header on that server.
         var url = 'pdff.pdf';
         // Loaded via <script> tag, create shortcut to access PDF.js exports.
         var pdfjsLib = window['pdfjs-dist/build/pdf'];
         // The workerSrc property shall be specified.
         pdfjsLib.GlobalWorkerOptions.workerSrc = 'worker.js';
         var loadingTask = pdfjsLib.getDocument(url);
         loadingTask.promise.then(function(pdf) {
            var totalPages = pdf.numPages;
            // Create one canvas per page, in page order, and render page i into canvas i
            for (let i = 1; i <= totalPages; i += 1) {
               var id = 'the-canvas' + i;
               $('#canvas_div').append("<div style='background-color:gray;text-align: center;padding:20px;' ><canvas class='the-canvas' id='" + id + "'></canvas></div>");
               var canvas = document.getElementById(id);
               renderPage(canvas, pdf, i);
            }
         });
         function renderPage(canvas, pdf, pageNumber) {
            pdf.getPage(pageNumber).then(function(page) {
               var scale = 1.5;
               var viewport = page.getViewport({ scale: scale });
               // Prepare canvas using PDF page dimensions
               var context = canvas.getContext('2d');
               canvas.width = viewport.width;
               canvas.height = viewport.height;
               // Render PDF page into canvas context
               var renderContext = {
                  canvasContext: context,
                  viewport: viewport
               };
               page.render(renderContext);
            });
         }
      </script>
   </body>
</html>

【Discussion】:

Please give some explanation with your answer. Code alone is not useful.

【Answer 9】:

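The accepted answer works perfectly for a single PDF. In my case there were multiple PDFs, and I wanted to render all of their pages in the same order as the array.

I adapted the code so that the global variables are encapsulated in an array of objects, as follows: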

    var docs = []; // Add this object array
    var urls = []; // You would need an array of the URLs to start.

    // Loop through each url. You will also need the index for later.
    urls.forEach((url, ix) => {

        //Get the document from the url.
        PDFJS.getDocument(url).then(function(pdf) {

            // Make new doc object and set the properties of the new document
            var doc = {};

            //Keep the PDF document on the per-document object (so we can easily access it in our page functions)
            doc.thePDF = pdf;

            //How many pages it has
            doc.numPages = pdf.numPages;

            //Pages are 1-based; page 1 is requested below
            doc.currPage = 1;

            //Store the new document at its url index, since documents can finish loading in any order
            docs[ix] = doc;

            //Start with first page and pass the index through to the handlePages method
            pdf.getPage( 1 ).then(page => handlePages(page, ix) );
        });
    });



    function handlePages(page, ix)
    {
        //This gives us the page's dimensions at full scale
        var viewport = page.getViewport({ scale: 1 });

        //We'll create a canvas for each page to draw it on
        var canvas = document.createElement( "canvas" );
        canvas.style.display = "block";
        var context = canvas.getContext('2d');
        canvas.height = viewport.viewBox[3];
        canvas.width = viewport.viewBox[2];

        //Draw it on the canvas
        page.render({canvasContext: context, viewport: viewport});

        //Add it to an element based on the index so each document is added to its own element
        document.getElementById('doc-' + ix).appendChild( canvas );

        //Move to next page using the correct doc object from the docs object array
        docs[ix].currPage++;
        if ( docs[ix].thePDF !== null && docs[ix].currPage <= docs[ix].numPages )
        {
            console.log("Rendering page " + docs[ix].currPage + " of document #" + ix);
            docs[ix].thePDF.getPage( docs[ix].currPage ).then(newPage => handlePages(newPage, ix) );
        }
    }

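Because the whole operation is asynchronous, without a unique object per document the global variables thePDF, currPage and numPages would be overwritten whenever a subsequent PDF starts rendering. That causes random pages to be skipped, whole documents to be skipped, or pages from one document to be appended to the wrong document.

One last point: if this is done offline or without ES6 modules, the PDFJS.getDocument(url).then() call should be changed to pdfjsLib.getDocument(url).promise.then()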
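One way to keep that difference out of the rest of the code is to normalise whatever getDocument returns into a plain promise. A minimal sketch, assuming only what is described above (the older style exposes then() directly, the newer style returns a task with a promise property); toPromise is a hypothetical helper, and the two mock objects stand in for the library so the snippet runs on its own.

```javascript
// Hypothetical helper: accept either a thenable or a task object that
// exposes its promise on a `promise` property, and return a real Promise.
function toPromise(taskOrPromise) {
  if (taskOrPromise && taskOrPromise.promise &&
      typeof taskOrPromise.promise.then === 'function') {
    return Promise.resolve(taskOrPromise.promise);
  }
  return Promise.resolve(taskOrPromise);
}

// Mocks for illustration only.
const oldStyleResult = Promise.resolve({ numPages: 3 });              // then() directly
const newStyleResult = { promise: Promise.resolve({ numPages: 5 }) }; // loading task

toPromise(oldStyleResult).then(function (pdf) {
  console.log('old style pages:', pdf.numPages);
});
toPromise(newStyleResult).then(function (pdf) {
  console.log('new style pages:', pdf.numPages);
});

// With the real library this would read, for example:
// toPromise(pdfjsLib.getDocument(url)).then(function (pdf) { /* ... */ });
```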

【Discussion】:

【Answer 10】:

Have it iterate over as many pages as you want.

const url = '/storage/documents/reports/AR-2020-CCBI IND.pdf';

pdfjsLib.GlobalWorkerOptions.workerSrc = '/vendor/pdfjs-dist-2.12.313/package/build/pdf.worker.js';

const loadingTask = pdfjsLib.getDocument({
    url: url,
    verbosity: 0
});

(async () => {
    const pdf = await loadingTask.promise;
    let numPages = pdf.numPages;

    // Render at most the first 10 pages
    if (numPages > 10) {
        numPages = 10;
    }

    for (let i = 1; i <= numPages; i++) {
        let page = await pdf.getPage(i);
        let scale = 1.5;
        let viewport = page.getViewport({ scale });
        // Extra pixels per CSS pixel on high-density displays
        let outputScale = window.devicePixelRatio || 1;

        let canvas = document.createElement('canvas');
        let context = canvas.getContext("2d");

        canvas.width = Math.floor(viewport.width * outputScale);
        canvas.height = Math.floor(viewport.height * outputScale);
        canvas.style.width = Math.floor(viewport.width) + "px";
        canvas.style.height = Math.floor(viewport.height) + "px";

        document.getElementById('canvas-column').appendChild(canvas);

        let transform = outputScale !== 1
            ? [outputScale, 0, 0, outputScale, 0, 0]
            : null;

        let renderContext = {
            canvasContext: context,
            transform,
            viewport
        };

        page.render(renderContext);
    }
})();
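The devicePixelRatio handling in the loop above can be pulled out into a pure function, which makes the arithmetic easier to see and to check. A minimal sketch; canvasMetrics is a hypothetical helper name, and the viewport is treated as a plain object with width and height.

```javascript
// Compute backing-store size, CSS size and render transform for a canvas
// so that one CSS pixel maps to `outputScale` device pixels.
function canvasMetrics(viewport, outputScale) {
  const ratio = outputScale || 1;
  return {
    width: Math.floor(viewport.width * ratio),      // for canvas.width
    height: Math.floor(viewport.height * ratio),    // for canvas.height
    cssWidth: Math.floor(viewport.width) + 'px',    // for canvas.style.width
    cssHeight: Math.floor(viewport.height) + 'px',  // for canvas.style.height
    transform: ratio !== 1 ? [ratio, 0, 0, ratio, 0, 0] : null
  };
}

console.log(canvasMetrics({ width: 918, height: 1188 }, 2));
console.log(canvasMetrics({ width: 918, height: 1188 }, 1));
```

On a display with a ratio of 2, a 918 x 1188 viewport gets a 1836 x 2376 backing store that is shown at 918 x 1188 CSS pixels, which is what keeps the text sharp.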

【Discussion】:
