Can I load an entire HTML document into a document fragment in Internet Explorer?

Posted 2023-02-22

【Title】: Can I load an entire HTML document into a document fragment in Internet Explorer? 【Posted】: 2011-11-20 10:37:09 【Question】:

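This is something I've been having some difficulty with. I have a local client-side script that needs to let a user fetch a remote web page and search the resulting page for forms. To do this (without regular expressions), I need to parse the document into a fully traversable DOM object.

Some limitations I'd like to stress:

- I don't want to use libraries (like jQuery). There's too much bloat for what I need to do here.
- Under no circumstances should scripts from the remote page be executed (for security reasons).
- DOM APIs, such as getElementsByTagName, need to be available.
- It only needs to work in Internet Explorer, but in version 7 at the very least.
- Let's pretend I don't have access to a server. I do, but I can't use it for this.

What I've tried

Assuming I have a complete HTML document string (including the DOCTYPE declaration) in the variable html, here's what I've tried so far: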

var frag = document.createDocumentFragment(),
div  = frag.appendChild(document.createElement("div"));

div.outerHTML = html;
//-> results in an empty fragment

div.insertAdjacentHTML("afterEnd", html);
//-> HTML is not added to the fragment

div.innerHTML = html;
//-> Error (expected, but I tried it anyway)

var doc = new ActiveXObject("htmlfile");
doc.write(html);
doc.close();
//-> javascript executes

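I've also tried extracting the <head> and <body> nodes from the HTML and adding them to an <HTML> element inside the fragment, still with no luck.

Does anyone have any ideas?

【Comments】: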

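"I don't want to use libraries (like jQuery). There's too much bloat for what I need to do here": there's always the Closure Compiler: ***.com/questions/1691861/…

@Juan Mendes: an iframe would execute scripts, and IE7 has no sandbox method other than the security attribute, which doesn't guarantee that scripts won't run.

I'm just going to say the word HTA (about which I know nothing), paste the following link and retreat. Quite possibly it's completely useless. msdn.microsoft.com/en-us/library/ms536496%28v=vs.85%29.aspx

Which version of IE? I've run into rendering problems where Trident would not render content loaded into the innerHTML value in 6/7. That behaviour occurs when you do something with an inappropriate DOM method.

Very related: parsing an HTML string using DOMParser with MIME type text/html: JavaScript DOMParser access innerHTML and other properties.

【Answer 1】: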

Assuming the HTML is also valid XML, you could use loadXML().

【Comments】:

Unfortunately, I can't assume that. The loaded HTML could (in theory) come from any site on the web.

【Answer 2】:

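Not sure why you're messing with documentFragments; you can just set the HTML text as the innerHTML of a new div element. You can then use that div element for getElementsByTagName and so on, without adding the div to the DOM: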

var htmlText= '<html><head><title>Test</title></head><body><div id="test_ele1">this is test_ele1 content</div><div id="test_ele2">this is test_ele content2</div></body></html>';

var d = document.createElement('div');
d.innerHTML = htmlText;

console.log(d.getElementsByTagName('div'));

If you're really set on the documentFragment idea, you can use this code, but you still have to wrap it in a div to get the DOM functions you're after:

function makeDocumentFragment(htmlText) {
    var range = document.createRange();
    var frag = range.createContextualFragment(htmlText);
    var d = document.createElement('div');
    d.appendChild(frag);
    return d;
}

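【Comments】:

This strips the <head> element before appending to the newly created div. I know I didn't specify that I needed the stuff in the head too, but I do (<link> elements in particular). I'm messing with document fragments because that seemed the approach most likely to work, if it's possible at all. createContextualFragment doesn't help me; IE doesn't support it.

I've researched this quite a bit: without access to something like developer.mozilla.org/En/DOM/DOMImplementation.createDocument, and without using an iFrame, there's really no other way to do this strictly on the client side.

Not sure whether Range/createContextualFragment is supported in IE 7, but after looking at the results I realized it's no different from inserting the HTML into a new div element. Since document fragments don't have the DOM functions you want, and a div can't validly contain HTML/BODY, I'm not sure what options you have.

【Answer 3】: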

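DocumentFragment doesn't support getElementsByTagName -- only Document does.

You may need to use a library like jsdom, which provides an implementation of the DOM that you can search through using getElementsByTagName and other DOM APIs. You can configure it not to execute scripts. Yes, it's "heavy", and I don't know whether it works in IE 7.

【Comments】:

Interesting... IE supports getElementsByTagName on document fragments (which is what I based my question on). Odd, but I guess I shouldn't be surprised that IE doesn't follow the spec.

Here's a discussion which implies that createDocumentFragment on IE actually creates a Document rather than a DocumentFragment, which would explain why it supports getElementsByTagName.

【Answer 4】: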

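Fiddle: http://jsfiddle.net/JFSKe/6/

DocumentFragment doesn't implement DOM methods. Using document.createElement in conjunction with innerHTML removes the <head> and <body> tags (even when the created element is a root element, <html>). Therefore, the solution should be sought elsewhere. I have created a cross-browser string-to-DOM function, which makes use of an invisible inline frame.

All external resources and scripts will be disabled. See "Explanation of the code" for more information.

Code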

/*
 @param String html    The string with HTML which has to be converted to a DOM object
 @param func callback  (optional) Callback(HTMLDocument doc, function destroy)
 @returns              undefined if callback exists, else: Object
                        HTMLDocument doc  DOM fetched from Parameter:html
                        function destroy  Removes HTMLDocument doc.         */
function string2dom(html, callback){
    /* Sanitise the string */
    html = sanitiseHTML(html); /*Defined at the bottom of the answer*/

    /* Create an IFrame */
    var iframe = document.createElement("iframe");
    iframe.style.display = "none";
    document.body.appendChild(iframe);

    var doc = iframe.contentDocument || iframe.contentWindow.document;
    doc.open();
    doc.write(html);
    doc.close();

    function destroy(){
        iframe.parentNode.removeChild(iframe);
    }
    if(callback) callback(doc, destroy);
    else return {"doc": doc, "destroy": destroy};
}

/* @name sanitiseHTML
   @param String html  A string representing HTML code
   @return String      A new string, fully stripped of external resources.
                       All "external" attributes (href, src) are prefixed by data- */

function sanitiseHTML(html){
    /* Adds a <!-\"'--> before every matched tag, so that unterminated quotes
        aren't preventing the browser from splitting a tag. Test case:
       '<input style="foo;b:url(0);><input onclick="<input type=button onclick="too() href=;>">' */
    var prefix = "<!--\"'-->";
    /*Attributes should not be prefixed by these characters. This list is not
     complete, but will be sufficient for this function.
      (see http://www.w3.org/TR/REC-xml/#NT-NameChar) */
    var att = "[^-a-z0-9:._]";
    var tag = "<[a-z]";
    var any = "(?:[^<>\"']*(?:\"[^\"]*\"|'[^']*'))*?[^<>]*";
    var etag = "(?:>|(?=<))";

    /*
      @name ae
      @description          Converts a given string in a sequence of the
                             original input and the HTML entity
      @param String string  String to convert
      */
    var entityEnd = "(?:;|(?!\\d))";
    var ents = {" ":"(?:\\s|&nbsp;?|&#0*32"+entityEnd+"|&#x0*20"+entityEnd+")",
                "(":"(?:\\(|&#0*40"+entityEnd+"|&#x0*28"+entityEnd+")",
                ")":"(?:\\)|&#0*41"+entityEnd+"|&#x0*29"+entityEnd+")",
                ".":"(?:\\.|&#0*46"+entityEnd+"|&#x0*2e"+entityEnd+")"};
                /*Placeholder to avoid tricky filter-circumventing methods*/
    var charMap = {};
    var s = ents[" "]+"*"; /* Short-hand space */
    /* Important: Must be pre- and postfixed by < and >. RE matches a whole tag! */
    function ae(string){
        var all_chars_lowercase = string.toLowerCase();
        if(ents[string]) return ents[string];
        var all_chars_uppercase = string.toUpperCase();
        var RE_res = "";
        for(var i=0; i<string.length; i++){
            var char_lowercase = all_chars_lowercase.charAt(i);
            if(charMap[char_lowercase]){
                RE_res += charMap[char_lowercase];
                continue;
            }
            var char_uppercase = all_chars_uppercase.charAt(i);
            var RE_sub = [char_lowercase];
            RE_sub.push("&#0*" + char_lowercase.charCodeAt(0) + entityEnd);
            RE_sub.push("&#x0*" + char_lowercase.charCodeAt(0).toString(16) + entityEnd);
            if(char_lowercase != char_uppercase){
                RE_sub.push("&#0*" + char_uppercase.charCodeAt(0) + entityEnd);
                RE_sub.push("&#x0*" + char_uppercase.charCodeAt(0).toString(16) + entityEnd);
            }
            RE_sub = "(?:" + RE_sub.join("|") + ")";
            RE_res += (charMap[char_lowercase] = RE_sub);
        }
        return(ents[string] = RE_res);
    }
    /*
      @name by
      @description  second argument for the replace function.
      */
    function by(match, group1, group2){
        /* Adds a data-prefix before every external pointer */
        return group1 + "data-" + group2;
    }
    /*
      @name cr
      @description            Selects a HTML element and performs a
                                  search-and-replace on attributes
      @param String selector  HTML substring to match
      @param String attribute RegExp-escaped; HTML element attribute to match
      @param String marker    Optional RegExp-escaped; marks the prefix
      @param String delimiter Optional RegExp escaped; non-quote delimiters
      @param String end       Optional RegExp-escaped; forces the match to
                                  end before an occurrence of <end> when
                                  quotes are missing
     */
    function cr(selector, attribute, marker, delimiter, end){
        if(typeof selector == "string") selector = new RegExp(selector, "gi");
        marker = typeof marker == "string" ? marker : "\\s*=";
        delimiter = typeof delimiter == "string" ? delimiter : "";
        end = typeof end == "string" ? end : "";
        var is_end = end && "?";
        var re1 = new RegExp("("+att+")("+attribute+marker+"(?:\\s*\"[^\""+delimiter+"]*\"|\\s*'[^'"+delimiter+"]*'|[^\\s"+delimiter+"]+"+is_end+")"+end+")", "gi");
        html = html.replace(selector, function(match){
            return prefix + match.replace(re1, by);
        });
    }
    /*
      @name cri
      @description            Selects an attribute of a HTML element, and
                               performs a search-and-replace on certain values
      @param String selector  HTML element to match
      @param String attribute RegExp-escaped; HTML element attribute to match
      @param String front     RegExp-escaped; attribute value, prefix to match
      @param String flags     Optional RegExp flags, default "gi"
      @param String delimiter Optional RegExp-escaped; non-quote delimiters
      @param String end       Optional RegExp-escaped; forces the match to
                                  end before an occurrence of <end> when
                                  quotes are missing
     */
    function cri(selector, attribute, front, flags, delimiter, end){
        if(typeof selector == "string") selector = new RegExp(selector, "gi");
        flags = typeof flags == "string" ? flags : "gi";
        var re1 = new RegExp("("+att+attribute+"\\s*=)((?:\\s*\"[^\"]*\"|\\s*'[^']*'|[^\\s>]+))", "gi");

        end = typeof end == "string" ? end + ")" : ")";
        var at1 = new RegExp('(")('+front+'[^"]+")', flags);
        var at2 = new RegExp("(')("+front+"[^']+')", flags);
        var at3 = new RegExp("()("+front+'(?:"[^"]+"|\'[^\']+\'|(?:(?!'+delimiter+').)+)'+end, flags);

        var handleAttr = function(match, g1, g2){
            if(g2.charAt(0) == '"') return g1+g2.replace(at1, by);
            if(g2.charAt(0) == "'") return g1+g2.replace(at2, by);
            return g1+g2.replace(at3, by);
        };
        html = html.replace(selector, function(match){
             return prefix + match.replace(re1, handleAttr);
        });
    }

    /* <meta http-equiv=refresh content="  ; url= " > */
    html = html.replace(new RegExp("<meta"+any+att+"http-equiv\\s*=\\s*(?:\""+ae("refresh")+"\""+any+etag+"|'"+ae("refresh")+"'"+any+etag+"|"+ae("refresh")+"(?:"+ae(" ")+any+etag+"|"+etag+"))", "gi"), "<!-- meta http-equiv=refresh stripped-->");

    /* Stripping all scripts */
    html = html.replace(new RegExp("<script"+any+">\\s*//\\s*<\\[CDATA\\[[\\S\\s]*?]]>\\s*</script[^>]*>", "gi"), "<!--CDATA script-->");
    html = html.replace(/<script[\S\s]+?<\/script\s*>/gi, "<!--Non-CDATA script-->");
    cr(tag+any+att+"on[-a-z0-9:_.]+="+any+etag, "on[-a-z0-9:_.]+"); /* Event listeners */

    cr(tag+any+att+"href\\s*="+any+etag, "href"); /* Linked elements */
    cr(tag+any+att+"src\\s*="+any+etag, "src"); /* Embedded elements */

    cr("<object"+any+att+"data\\s*="+any+etag, "data"); /* <object data= > */
    cr("<applet"+any+att+"codebase\\s*="+any+etag, "codebase"); /* <applet codebase= > */

    /* <param name=movie value= >*/
    cr("<param"+any+att+"name\\s*=\\s*(?:\""+ae("movie")+"\""+any+etag+"|'"+ae("movie")+"'"+any+etag+"|"+ae("movie")+"(?:"+ae(" ")+any+etag+"|"+etag+"))", "value");

    /* <style> and < style=  > url()*/
    cr(/<style[^>]*>(?:[^"']*(?:"[^"]*"|'[^']*'))*?[^'"]*(?:<\/style|$)/gi, "url", "\\s*\\(\\s*", "", "\\s*\\)");
    cri(tag+any+att+"style\\s*="+any+etag, "style", ae("url")+s+ae("(")+s, 0, s+ae(")"), ae(")"));

    /* IE7- CSS expression() */
    cr(/<style[^>]*>(?:[^"']*(?:"[^"]*"|'[^']*'))*?[^'"]*(?:<\/style|$)/gi, "expression", "\\s*\\(\\s*", "", "\\s*\\)");
    cri(tag+any+att+"style\\s*="+any+etag, "style", ae("expression")+s+ae("(")+s, 0, s+ae(")"), ae(")"));
    return html.replace(new RegExp("(?:"+prefix+")+", "g"), prefix);
}
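
The ae helper in the code above builds pattern fragments in which every character may be written literally or as a numeric character reference. The following is a minimal stand-alone sketch of that idea, not the answer's implementation: the names entityTolerantSource and escapeForRegExp are made up for illustration, and only the entityEnd terminator is borrowed from the code above.

```javascript
// Escape a single character so it can be used literally in a RegExp source.
function escapeForRegExp(ch) {
    return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns a RegExp source that matches `word` where each
// character is either literal or a decimal/hex numeric character reference.
function entityTolerantSource(word) {
    var entityEnd = "(?:;|(?!\\d))"; // entity ends with ";" or before a non-digit
    var source = "";
    for (var i = 0; i < word.length; i++) {
        var lower = word.charAt(i).toLowerCase();
        var upper = word.charAt(i).toUpperCase();
        var alternatives = [escapeForRegExp(lower)];
        var variants = lower === upper ? [lower] : [lower, upper];
        for (var j = 0; j < variants.length; j++) {
            var code = variants[j].charCodeAt(0);
            alternatives.push("&#0*" + code + entityEnd);               // decimal form
            alternatives.push("&#x0*" + code.toString(16) + entityEnd); // hex form
        }
        source += "(?:" + alternatives.join("|") + ")";
    }
    return source;
}

// Anchored and case-insensitive, so "REFRESH" and "&#114;efresh" both match.
var refresh = new RegExp("^" + entityTolerantSource("refresh") + "$", "i");
```

A pattern built this way is meant to be embedded in a larger tag-matching expression, as the answer does for <meta http-equiv=refresh>; on its own it says nothing about whether the surrounding markup is safe.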

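Explanation of the code

The sanitiseHTML function is based on my replace_all_rel_by_abs function (see this answer). The sanitiseHTML function has been completely rewritten, though, in order to achieve maximum efficiency and reliability.

Additionally, a new set of RegExps has been added to remove all scripts and event handlers (including CSS expression(), IE7-). To make sure that all tags are parsed as expected, the adjusted tags are prefixed by <!--"'-->. This prefix is necessary to correctly parse nested "event handlers" in conjunction with unterminated quotes: <a id="><input onclick="<div onmousemove=evil()>">.

These RegExps are dynamically created using the internal functions cr/cri (Create Replace [Inline]). These functions accept a list of arguments, and create and execute an advanced RE replacement. To make sure that HTML entities don't break a RegExp (refresh in <meta http-equiv=refresh> can be written in various ways), the dynamically created RegExps are partially constructed by function ae (Any Entity). The actual replacements are done by function by (replace by). In this implementation, by adds data- before all matched attributes.

1. All <script>//<[CDATA[ .. //]]></script> occurrences are stripped. This step is necessary because CDATA sections allow </script> strings inside the code. Once this replacement has run, it is safe to go on to the next one.
2. The remaining <script>...</script> tags are removed.
3. The <meta http-equiv=refresh .. > tag is removed.
4. All event listeners and external pointers/attributes (href, src, url()) are prefixed by data-, as described previously.
5. An IFrame object is created. IFrames are less likely to leak memory (contrary to the htmlfile ActiveXObject). The IFrame is made invisible and appended to the document, so that the DOM can be accessed. document.write() is used to write the HTML to the IFrame. document.open() and document.close() are used to empty the previous contents of the document, so that the generated document is an exact copy of the given html string.
6. If a callback function has been specified, it is called with two arguments. The first argument is a reference to the generated document object. The second argument is a function that destroys the generated DOM tree when called. If no callback is specified, the function returns an object with two properties (doc and destroy), which behave the same as those two arguments.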
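
The script-stripping and data- prefixing steps can be sketched in a much reduced form. This is only an illustration under simplifying assumptions, not a secure sanitiser and not the answer's code: neutralise is a made-up name, and the tag pattern used here does not cope with a > inside a quoted attribute value, nor with entity-encoded attribute names, both of which the full version tries to handle.

```javascript
// Simplified illustration only (hypothetical name, NOT safe for untrusted input).
function neutralise(html) {
    // Step 1: drop <script>...</script> blocks entirely.
    html = html.replace(/<script[\S\s]+?<\/script\s*>/gi, "<!--script removed-->");
    // Step 2: inside every opening tag, rename href, src and on* attributes
    // to data-href, data-src, data-on*, so the browser ignores them.
    return html.replace(/<[a-z][^>]*>/gi, function (tag) {
        return tag.replace(/(\s)((?:href|src|on[a-z]+)\s*=)/gi, "$1data-$2");
    });
}

var cleaned = neutralise(
    '<a href="x.html" onclick="evil()">x</a><script>alert(1)</script>'
);
// cleaned is now:
// <a data-href="x.html" data-onclick="evil()">x</a><!--script removed-->
```

The renamed attributes stay in the markup, which is why the original values remain available to the caller after parsing.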

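Additional notes

- Setting the designMode property to "On" will stop a frame from executing scripts (not supported in Chrome). If you have to preserve the <script> tags for a specific reason, you can use iframe.designMode = "On" instead of the script-stripping feature.
- I wasn't able to find a reliable source for the htmlfile ActiveXObject. According to this source, htmlfile is slower than IFrames and more susceptible to memory leaks.
- All affected attributes (href, src, ...) are prefixed by data-. Getting and changing these attributes is shown here for data-href: elem.getAttribute("data-href") and elem.setAttribute("data-href", "..."), or elem.dataset.href and elem.dataset.href = "...".
- External resources have been disabled. As a result, the page may look completely different:
  <link rel="stylesheet" href="main.css" /> is disabled: no external styles.
  <script>document.body.bgColor="red";</script> is disabled: no scripted styles.
  <img src="128x128.png" /> is disabled: no images, so the size of the element may be completely different.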
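
Since the sanitised document keeps the original values under data- names, a small accessor can hide the prefix from the rest of the code. A minimal sketch, assuming only that elem exposes the standard getAttribute method; originalAttribute is a hypothetical name, not part of the answer:

```javascript
// Hypothetical helper: returns the value an attribute had before sanitising.
// Looks for the data- prefixed copy first, then falls back to the plain name.
function originalAttribute(elem, name) {
    var disabled = elem.getAttribute("data-" + name);
    return disabled != null ? disabled : elem.getAttribute(name);
}
```

For a <link> element from the parsed document, originalAttribute(link, "href") would give the stylesheet URL that the sanitiser moved to data-href.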

Examples

Example - sanitiseHTML(html): paste this bookmarklet into the location bar. It will offer an option to inject a textarea showing the sanitised HTML string.

javascript:void(function(){var s=document.createElement("script");s.src="http://rob.lekensteyn.nl/html-sanitizer.js";document.body.appendChild(s)})();

Code examples - string2dom(html):

string2dom("<html><head><title>Test</title></head></html>", function(doc, destroy){
    alert(doc.title); /* Alert: "Test" */
    destroy();
});

var test = string2dom("<div id='secret'></div>");
alert(test.doc.getElementById("secret").tagName); /* Alert: "DIV" */
test.destroy();

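Notable references

- SO: JS RE to change all relative to absolute URLs - The function sanitiseHTML(html) is based on my previously created replace_all_rel_by_abs(html) function.
- Elements - Embedded content - A complete list of standard embedded elements.
- Elements - Previous HTML elements - An additional list of (deprecated) elements (such as <applet>).
- The htmlfile ActiveX object - "Slower than the iframe sandbox. Leaks memory if not managed."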

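【Comments】:

+1 from me too. I had come to the same conclusion about fragments (after extensive research and testing). The interesting part is setting designMode to on to prevent script execution. Anyway, thanks a lot... this is more the kind of answer I was after. The only real shame is the number of potential holes, so I'll need to think about it some more.

See point 3 (+ the corresponding replace functions) and the first two references at the end. If you're absolutely sure that a certain tag (<applet>?) won't occur, there's no need to implement it. If you don't have to preserve embedded elements for a specific purpose, it's easy to remove them with a RE. For example: .replace(/<object[\S\s]+?<\/object\s*>/gi, ""). Some embedded objects may have an omitted closing tag. In that case, use: .replace(/<embed[^>]+>[\S\s]*?<\/embed\s*>/gi, "").replace(/<embed[^>]*>/gi, "").

+1, great answer. Would there be any value in fixing <stylesheet> and/or <style>? They could contain expressions or -moz-behaviors.

@Rob - jsfiddle.net/JFSKe/2 is a trivial attack on your sanitiser. I know of at least one other simple way to defeat it, and I'm not even an XSS expert.

@Rob - Your code now doesn't seem to sanitise * attributes properly at all. This input "<html><head><title>Test</title></head><body onload='alert(\"XSS\")'></html>" shows an "XSS" alert. I strongly suggest you build yourself a very comprehensive test suite.

【Answer 5】: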

I'm not sure whether IE supports document.implementation.createHTMLDocument, but if it does, use this algorithm (adapted from my DOMParser HTML extension). Note that the DOCTYPE will not be preserved:

var
      doc = document.implementation.createHTMLDocument("")
    , doc_elt = doc.documentElement
    , first_elt
;
doc_elt.innerHTML = your_html_here;
first_elt = doc_elt.firstElementChild;
if ( // are we dealing with an entire document or a fragment?
       doc_elt.childElementCount === 1
    && first_elt.tagName.toLowerCase() === "html"
) {
    doc.replaceChild(first_elt, doc_elt);
}

// doc is an HTML document
// you can now reference stuff like doc.title, etc.
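
Because the answer is unsure whether the target browser provides createHTMLDocument, a caller would probably want to feature-test before taking this path and fall back to another approach otherwise. A minimal sketch; supportsCreateHTMLDocument is a hypothetical helper name, and the document object is passed in as a parameter so the check can be exercised outside a browser:

```javascript
// Hypothetical helper: true when the given document object exposes
// document.implementation.createHTMLDocument, false otherwise.
function supportsCreateHTMLDocument(doc) {
    return !!(doc && doc.implementation && doc.implementation.createHTMLDocument);
}
```

In a page this would be called as supportsCreateHTMLDocument(document) before running the algorithm above.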

【Comments】:

IE 9 supports it, but unfortunately IE 8 and lower don't.

【Answer 6】:

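Just came across this page, a bit late :) but the following should help anyone who runs into a similar problem in future... although IE7/8 should really be ignored by now, and there are much better methods supported by more modern browsers.

The following works in nearly everything I've tested. The only two downsides are:

- I've added bespoke getElementById and getElementsByName functions to the root div element, so these won't be available as expected further down the tree (unless the code is modified to cater for this).

- The doctype will be ignored. However, I don't think this makes much difference, as my experience is that the doctype doesn't affect how the DOM is structured, only how it is rendered (which won't happen with this method).

Basically, the system relies on the fact that <tag> and <namespace:tag> are treated differently by user agents. As has been found, certain special tags cannot exist within a div element, so they are removed. Namespaced elements can be placed anywhere (unless there is a DTD stating otherwise). Although these namespaced tags won't actually behave like the real tags, we are only using them for their structural position in the document, so this doesn't really cause a problem.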
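
The renaming step described above can be tried on its own, without a browser. The sketch below reuses the same regular expression idea as the full code; the stand-alone function name namespaceSpecialTags is an assumption made for illustration.

```javascript
// Tags that cannot live inside a <div> and therefore get a namespace prefix.
var specialTags = ["html", "head", "body", "title"];
var prefix = "faux:";
var tagPattern = new RegExp("<(/?)(" + specialTags.join("|") + ")([^>]*)?>", "gi");

// Hypothetical stand-alone version of the renaming step:
// <head> becomes <faux:head>, </head> becomes </faux:head>, and so on.
function namespaceSpecialTags(src) {
    return src.replace(tagPattern, "<$1" + prefix + "$2$3>");
}

var renamed = namespaceSpecialTags(
    '<html><head><title>T</title></head><body id="b"><p>x</p></body></html>'
);
// renamed is now:
// <faux:html><faux:head><faux:title>T</faux:title></faux:head>
// <faux:body id="b"><p>x</p></faux:body></faux:html>   (on a single line)
```

Note that the pattern only checks how a tag name starts, so a tag such as <header> would also be renamed; that may or may not matter for a given page.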

The markup and code are as follows:

<!DOCTYPE html>
<html>
<head>
<script>

  /// function for parsing HTML source to a dom structure
  /// Tested in Mac OSX, Win 7, Win XP with FF, IE 7/8/9, 
  /// Chrome, Safari & Opera.
  function parseHTML(src){

    /// create a random div, this will be our root
    var div = document.createElement('div'),
        /// specify our namespace prefix
        ns = 'faux:',
        /// state which tags we will treat as "special"
        stn = ['html','head','body','title'],
        /// the reg exp for replacing the special tags
        re = new RegExp('<(/?)('+stn.join('|')+')([^>]*)?>','gi'),
        /// remember the getElementsByTagName function before we override it
        gtn = div.getElementsByTagName;

    /// a quick function to namespace certain tag names
    var nspace = function(tn){
      if ( stn.indexOf ) {
        return stn.indexOf(tn) != -1 ? ns + tn : tn;
      }
      else {
        /// no Array.indexOf: search a delimited string, wrapping tn in
        /// delimiters so that only whole tag names match
        return ('|'+stn.join('|')+'|').indexOf('|'+tn+'|') != -1 ? ns + tn : tn;
      }
    };

    /// search and replace our source so that special tags are namespaced
    /// &nbsp; required for IE7/8 to render tags before first text found
    /// <faux:check /> tag added so we can test how namespaces work
    src = '&nbsp;<'+ns+'check />' + src.replace(re,'<$1'+ns+'$2$3>');
    /// inject to the div
    div.innerHTML = src;
    /// quick test to see how we support namespaces in TagName searches
    if ( !div.getElementsByTagName(ns+'check').length ) {
      ns = '';
    }

    /// create our replacement getByName and getById functions
    var createGetElementByAttr = function(attr, collect){
      var func = function(a,w){
        var i,c,e,f,l,o; w = w||[];
        if ( this.nodeType == 1 ) {
          if ( this.getAttribute(attr) == a ) {
            if ( collect ) {
              w.push(this);
            }
            else {
              return this;
            }
          }
        }
        else {
          return false;
        }
        if ( (c = this.childNodes) && (l = c.length) ) {
          for( i=0; i<l; i++ ){
            if( (e = c[i]) && (e.nodeType == 1) ) {
              if ( (f = func.call( e, a, w )) && !collect ) {
                return f;
              }
            }
          }
        }
        return (w.length?w:false);
      };
      return func;
    };

    /// apply these replacement functions to the div container, obviously
    /// you could add these to prototypes for browsers that support element
    /// constructors. For other browsers you could step each element and
    /// apply the functions throughout the node tree... however this would
    /// be quite messy, far better just to always call from the root node -
    /// or use div.getElementsByTagName.call( localElement, 'tag' );
    div.getElementsByTagName = function(t){return gtn.call(this,nspace(t));};
    div.getElementsByName    = createGetElementByAttr('name', true);
    div.getElementById       = createGetElementByAttr('id', false);

    /// return the final element
    return div;
  }

  window.onload = function(){

    /// parse the HTML source into a node tree
    var dom = parseHTML( document.getElementById('source').innerHTML );

    /// test some look ups :)
    var a = dom.getElementsByTagName('head'),
        b = dom.getElementsByTagName('title'),
        c = dom.getElementsByTagName('script'),
        d = dom.getElementById('body');

    /// alert the result
    alert(a[0].innerHTML);
    alert(b[0].innerHTML);
    alert(c[0].innerHTML);
    alert(d.innerHTML);

  };
</script>
</head>
<body>
  <xmp id="source">
    <!DOCTYPE html>
    <html>
    <head>
      <!-- Comment //-->
      <meta charset="utf-8">
      <meta name="robots" content="index, follow">
      <title>An example</title>
      <link href="test.css" />
      <script>alert('of parsing..');</script>
    </head>
    <body id="body">
      <b>in a similar way to createDocumentFragment</b>
    </body>
    </html>
  </xmp>
</body>
</html>

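【Comments】:

【Answer 7】:

To use the full HTML DOM capabilities without triggering any requests, and without having to deal with incompatibilities:

var doc = document.cloneNode();
if (!doc.documentElement) {
    doc.appendChild(doc.createElement('html'));
    doc.documentElement.appendChild(doc.createElement('head'));
    doc.documentElement.appendChild(doc.createElement('body'));
}

All set! doc is an HTML document, but it is not attached to the live page.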

【Comments】:
