Using PhantomJS to embed all images of a webpage produces warnings but works


Asked: 2014-12-23 20:40:02

Question:

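I am trying to convert a web page into a single file by embedding all of its images (and other external resources once I get past this point). Here is how I run PhantomJS: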

./phantomjs --web-security=false ./embed_images.js http://localhost/index.html > output.txt

Here is embed_images.js:

var page = require('webpage').create(),
    system = require('system'),
    address;

if (system.args.length === 1) {
    console.log('Usage: embed_images.js <some URL>');
    phantom.exit(1);
}
else {
    page.onConsoleMessage = function(msg) {
        console.log(msg);
    };
    address = system.args[1];
    page.open(address, function(status) {
        page.evaluate(function() {
            function embedImg(org) {
                var img = new Image();
                img.src = org.src;
                img.onload = function() {
                    var canvas = document.createElement("canvas");
                    canvas.width = this.width;
                    canvas.height = this.height;

                    var ctx = canvas.getContext("2d");
                    ctx.drawImage(this, 0, 0);

                    var dataURL = canvas.toDataURL("image/png");

                    org.src = dataURL;
                    console.log(dataURL);
                };
            }
            var imgs = document.getElementsByTagName("img");
            for (var index=0; index < imgs.length; index++) {
                embedImg(imgs[index]);
            }
        });
        phantom.exit()
    });
}

When I run the above command, it produces a file like this:

Unsafe javascript attempt to access frame with URL  from frame with URL file://./embed_images.js. Domains, protocols and ports must match.
Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

The error message above appears multiple times. To test what was going wrong, I ran the following code in Chromium's console:

function embedImg(org) {
    var img = new Image();
    img.src = org.src;
    img.onload = function() {
        var canvas = document.createElement("canvas");
        canvas.width = this.width;
        canvas.height = this.height;

        var ctx = canvas.getContext("2d");
        ctx.drawImage(this, 0, 0);

        var dataURL = canvas.toDataURL("image/png");

        org.src = dataURL;
        console.log(dataURL);
    };
}

var imgs = document.getElementsByTagName("img");
for (var index=0; index < imgs.length; index++) {
    embedImg(imgs[index]);
}

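And it works fine (my web page does not reference any cross-domain images)! It embeds all the images into the HTML page. Does anyone know what the problem might be?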

Here is the content of my index.html file:

<!DOCTYPE html >
<html>
<head>
<meta charset="utf-8" />
</head>

<body>
<img src="1.png" >
</body>
</html>

And the actual output (output.txt):

Unsafe JavaScript attempt to access frame with URL  from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL  from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Unsafe JavaScript attempt to access frame with URL about:blank from frame with URL file://./embed_images.js. Domains, protocols and ports must match.

Strangely, although there is only one image on my page, there are many error messages!

I am using phantomjs-1.9.8-linux-x86_64.

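Comments:

This may be related: ***.com/q/26424765

The error does come from the toDataURL call, as pointed out in the post you mentioned. But I can't be sure the two are the same, because that post is all about SVG and mine has only a single PNG image.

What happens if you wrap everything inside the page.open callback in setTimeout(function(){/*HERE*/}, 2000);?

You are right. It all comes down to the asynchronous behavior of the Image onload callback. If you post it, I will gladly mark your suggestion as the answer. Thanks.

I will look into it. Funnily enough, I had never seen this before, and then ran into it just today while trying to find a solution for another question.

Answer 1: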

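These notices are printed when phantom.exit is called. They don't cause any trouble, but they are unwelcome when you need clean PhantomJS output. In your case, you can suppress the notices by making phantom.exit "asynchronous", like this: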

setTimeout(function(){
    phantom.exit();
}, 0);

I think this happens because a large string is being passed from the page context at the moment phantom tries to exit.

I created a github issue for this.
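If the notices come from exiting while image data is still arriving from the page, one possible variation is to exit only after every image has reported back. The following is a minimal sketch of that idea, not the method proposed in this answer. `createLoadTracker`, `imageDone`, and `imageCount` are hypothetical names introduced for illustration, and the wiring through `page.onCallback` / `window.callPhantom` shown in the comments is an assumption that has not been tested against PhantomJS 1.9.8.

```javascript
// Sketch only: createLoadTracker is a hypothetical helper, not part of PhantomJS.
// total     - number of images expected to report back
// onAllDone - called exactly once, after the last image has reported
function createLoadTracker(total, onAllDone) {
    var remaining = total;
    var finished = false;

    function finish() {
        if (!finished) {
            finished = true;
            onAllDone();
        }
    }

    // Nothing to wait for: finish right away.
    if (remaining <= 0) {
        finish();
    }

    // Call the returned function once per image that has been processed.
    return function imageDone() {
        remaining -= 1;
        if (remaining <= 0) {
            finish();
        }
    };
}

// Possible wiring in a PhantomJS script (assumed, not executed here).
// imageCount is a hypothetical variable holding the number of <img> elements.
//
// var imageDone = createLoadTracker(imageCount, function () {
//     setTimeout(function () { phantom.exit(); }, 0);
// });
// page.onCallback = function (dataURL) {  // receives window.callPhantom(dataURL)
//     console.log(dataURL);
//     imageDone();
// };
```

The extra `finished` flag guards against exiting twice if an image reports more than once.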

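Comments:

Where do you put this? At the end of your script?

@Optimus Wherever you would normally call phantom.exit(), you wrap that call in setTimeout(). It is at the end of the script, but only if you think in terms of execution order rather than the actual lines of code.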
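To illustrate the point about execution order, here is a small sketch of the deferred exit as a reusable helper. `deferExit` and its `schedule` parameter are hypothetical names introduced here, not PhantomJS API; the scheduler is injectable only so the behavior can be checked outside PhantomJS.

```javascript
// Sketch only: deferExit is a hypothetical helper, not part of PhantomJS.
// exitFn   - function that ends the script, e.g. function () { phantom.exit(); }
// schedule - optional scheduler with the same (callback, delay) shape as setTimeout
function deferExit(exitFn, schedule) {
    function runExit() {
        exitFn();
    }
    // A delay of 0 does not exit immediately: it queues the exit so that it
    // runs after the currently executing callback has returned.
    if (schedule) {
        schedule(runExit, 0);
    } else {
        setTimeout(runExit, 0);
    }
}

// Intended placement inside a PhantomJS script (not executed here):
//
// page.open(address, function (status) {
//     page.evaluate(function () { /* work inside the page */ });
//     deferExit(function () { phantom.exit(); });  // last statement to execute
// });
```

The helper replaces the direct phantom.exit() call at the point where the script would otherwise finish, which may be in the middle of the file.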
