Copy&Paste content into TinyMCE input results in bloated HTML

Posted 2023-03-17


[Title]: Copy&Paste content into TinyMCE input results in bloated HTML [Posted]: 2011-07-23 14:14:30 [Question]:

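I use TinyMCE for a variety of projects. The problem I am facing at the moment is that many users copy and paste content from sources such as Word or OpenOffice into the TinyMCE input. This often results in bloated code (for example, things like <span lang="EN-GB"> get carried over from OpenOffice). TinyMCE's cleanup does not seem to remove these tags. Is there a way to strip all formatting before the text is pasted into the TinyMCE input area? Or is there another way to prevent this bloated code, for example by filtering on the server side with PHP?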


[Answer 1]:

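I know this is an old question, but for the benefit of others who may be looking for this answer (as I was!): TinyMCE now includes the ability to control what gets pasted into the text box.

In the init call, add the "paste" plugin and then set whichever options you need, for example: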

tinyMCE.init({
    ...
    plugins: "paste",
    paste_auto_cleanup_on_paste: true,
    paste_remove_styles: true,
    paste_remove_styles_if_webkit: true,
    paste_strip_class_attributes: "all",
    paste_remove_spans: true,
    ...
});

You can see all of the options in the TinyMCE wiki.
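To make the effect of such options more concrete, here is a rough sketch of the kind of cleanup they ask for: dropping `<span>` wrappers and removing `class`, `style` and `lang` attributes. The function name `cleanPastedMarkup` is hypothetical and this is not how the paste plugin is implemented; it is a regex-based approximation that would also touch attribute-like text outside of tags.

```javascript
// Hypothetical helper, NOT part of TinyMCE: approximates what the
// span/style/class cleanup options are meant to achieve.
function cleanPastedMarkup(html) {
    return String(html)
        // Drop <span ...> and </span> wrappers but keep their text content.
        .replace(/<\/?span\b[^>]*>/gi, '')
        // Remove class, style and lang attributes (quoted or unquoted values).
        .replace(/\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, '');
}

var pasted = '<p class="MsoNormal" style="margin:0"><span lang="EN-GB">Hello</span></p>';
console.log(cleanPastedMarkup(pasted)); // prints "<p>Hello</p>"
```

In a real editor the plugin options are preferable, because they operate on parsed content rather than on raw strings.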


[Answer 2]:

I use the following function in the paste preprocess step to remove all tags:

// Removes every HTML tag from str except those listed in allowed_tags
// (for example '<p><b>'). The init code calls it as ir.im.strip_tags,
// so attach it to that object in your own namespace.
strip_tags = function (str, allowed_tags) {
    var key = '', allowed = false;
    var matches = [];
    var allowed_array = [];
    var allowed_tag = '';
    var i = 0;
    var k = '';
    var html = '';
    var replacer = function (search, replace, str) {
        return str.split(search).join(replace);
    };

    // Build allowed tags array
    if (allowed_tags) {
        allowed_array = allowed_tags.match(/([a-zA-Z0-9]+)/gi);
    }

    str += '';

    // Match tags
    matches = str.match(/(<\/?[\S][^>]*>)/gi);

    // Go through all HTML tags
    for (key in matches) {
        if (isNaN(key)) {
            // IE7 Hack
            continue;
        }

        // Save HTML tag
        html = matches[key].toString();

        // Is tag not in allowed list? Remove from str!
        allowed = false;

        // Go through all allowed tags
        for (k in allowed_array) {
            // Init
            allowed_tag = allowed_array[k];
            i = -1;

            if (i != 0) { i = html.toLowerCase().indexOf('<' + allowed_tag + '>'); }
            if (i != 0) { i = html.toLowerCase().indexOf('<' + allowed_tag + ' '); }
            if (i != 0) { i = html.toLowerCase().indexOf('</' + allowed_tag); }

            // Determine
            if (i == 0) {
                allowed = true;
                break;
            }
        }

        if (!allowed) {
            str = replacer(html, "", str); // Custom replace. No regexing
        }
    }

    return str;
};
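For comparison, the same whitelist idea can be sketched more compactly with a single `replace` callback. This is an alternative illustration, not the function from this answer: the name `stripTagsSketch` is hypothetical, and the regex only recognises tags whose name starts with a letter, so comments and doctype declarations are left untouched.

```javascript
// Hypothetical compact variant of a whitelist-based tag stripper.
// allowedTags is a string such as '<p><b>'; anything else is removed.
function stripTagsSketch(html, allowedTags) {
    // Extract lowercase tag names from the whitelist string ([] if none).
    var allowed = (allowedTags || '').toLowerCase().match(/[a-z0-9]+/g) || [];
    return String(html).replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, function (tag, name) {
        // Keep the tag only if its name is on the whitelist.
        return allowed.indexOf(name.toLowerCase()) !== -1 ? tag : '';
    });
}

var input = '<p><span lang="EN-GB">Hi</span> <b>there</b></p>';
console.log(stripTagsSketch(input, '<p><b>')); // prints "<p>Hi <b>there</b></p>"
console.log(stripTagsSketch(input, ''));       // prints "Hi there"
```

Passing an empty whitelist, as the answer does, reduces the pasted content to plain text.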

In the TinyMCE init I put:

paste_preprocess : function(pl, o) {

    // Remove clipboard header on Mac
    var pos_sel = o.content.search("EndSelection:");
    var pos_fra = o.content.search("EndFragment:");
    var mac_header_found = false;

    if (o.content.search("Version:") == 0 && pos_sel < 135 && pos_sel > 120) {
        o.content = o.content.substring(pos_sel + 23);
        mac_header_found = true;
    }
    else if (o.content.search("Version:") == 0 && pos_fra < 80 && pos_fra > 75) {
        o.content = o.content.substring(pos_fra + 23);
        mac_header_found = true;
    }

    // Copy from Word or OpenOffice (Mac) - remove header
    if (o.wordContent || mac_header_found) {
        // First style tag + content to be removed
        var pos_start_style = o.content.search('<style');
        var pos_end_style = o.content.search('</style>');
        if (pos_start_style > 0 && pos_end_style > pos_start_style) {
            o.content = o.content.substring(0, pos_start_style).concat(o.content.substring(pos_end_style + 8));
        }
        // Complete Word document gets pasted
        else {
            var pos_start_p = o.content.search('<p');
            if (pos_start_p > 0) o.content = o.content.substring(pos_start_p);
        }
    }

    // Strip all remaining tags (empty whitelist)
    o.content = ir.im.strip_tags( o.content, '' );

    // Zero-width no-break space if empty
    if (o.content == '') {
        o.content = '&#65279;';
    }
},
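The style-block removal inside that callback can be tried in isolation. The sketch below pulls the step into a standalone function; the name `removeFirstStyleBlock` is hypothetical, and unlike the code above it also handles a `<style>` element at position 0.

```javascript
// Hypothetical standalone version of the "remove first <style> block" step.
function removeFirstStyleBlock(content) {
    var start = content.indexOf('<style');
    var end = content.indexOf('</style>');
    if (start >= 0 && end > start) {
        // Keep everything before <style ...> and everything after </style>.
        return content.substring(0, start) + content.substring(end + '</style>'.length);
    }
    // No complete style block found: return the input unchanged.
    return content;
}

var pasted = '<meta charset="utf-8"><style>p { margin: 0 }</style><p>Hi</p>';
console.log(removeFirstStyleBlock(pasted)); // prints '<meta charset="utf-8"><p>Hi</p>'
```

Only the first block is removed, which matches the behaviour of the original snippet; content with several style elements would need a loop.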

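[Comments]:

Looks good, I will have to try it. So basically, if I understand correctly, we define a function that filters out all non-whitelisted HTML elements, and that function is called through the tinyMCE init. Or am I getting something wrong?

It is not called through init. Rather, init is used to define the function that gets called on paste (the preprocess step).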
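The difference between registering the callback and invoking it can be illustrated with a small simulation. Everything here is hypothetical: `simulatePaste` is only a stand-in for whatever the paste plugin does internally, and the settings object merely has the same shape as the one passed to `tinyMCE.init`.

```javascript
// Hypothetical settings object, shaped like the one passed to tinyMCE.init().
// Defining it does not run paste_preprocess; it only registers the callback.
var settings = {
    paste_preprocess: function (pl, o) {
        // Stand-in filter: drop every tag from the pasted content.
        o.content = o.content.replace(/<[^>]*>/g, '');
    }
};

// Hypothetical stand-in for the paste plugin: on each paste it wraps the
// clipboard HTML in an object and hands it to the registered callback.
function simulatePaste(settings, html) {
    var o = { content: html };
    if (typeof settings.paste_preprocess === 'function') {
        settings.paste_preprocess(null, o);
    }
    return o.content;
}

console.log(simulatePaste(settings, '<span lang="EN-GB">Hi</span>')); // prints "Hi"
```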
