What are reliable approaches for finding words/text within an HTML markup's text content and replacing the matches with highlighting markup?

Posted 2023-03-05


Originally posted: 2021-07-11 20:39:41

Question:

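I have some text, and a function that receives a word or phrase. The function has to return the same text, but with that keyword or phrase wrapped in an element that carries a class.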

Example:

If I have this:

text = <a href="/redirect?uri=https%3A%2F%2Fwww.website.com&context=post" target="_blank" rel="noopener noreferrer">https://www.website.com</a>

I want this:

text = <a href="/redirect?uri=https%3A%2F%2Fwww.website.com&context=post" target="_blank" rel="noopener noreferrer">https://www.<span class="bold">website</span>.com</a>

But what I get is:

text = <a href="/redirect?uri=https%3A%2F%2Fwww.<span class="bold"> website </span>.com&amp;context=post" target="_blank" rel="noopener noreferrer">https://www.<span class="bold"> website </span>.com</a>

What I am doing is:

        ...
        const escapedPhrases = ["\\bwebsite\\b"]
        const regex = new RegExp(`(${escapedPhrases.join('|')})`, 'gi');
        text = text.replace(
          regex,
          '<span class="bold"> $1 </span>'
        );

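How can I improve my regex?

I also tried to "clean up" the text after the $1 replacement, removing the span again if it ended up inside the href, but that did not work either.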
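A minimal reproduction of why the span also lands inside the href, runnable in Node. It reuses the pattern from the snippet above; `matchIndices` and `hrefEnd` are illustrative additions. `\b` only asserts a word boundary, and in the encoded URL `%2F%2Fwww.website.com` the word `website` is bounded by dots exactly as it is in the link text, so the pattern matches twice: once in the attribute and once in the text.

```javascript
// The markup from the question, as a plain string.
const text = '<a href="/redirect?uri=https%3A%2F%2Fwww.website.com&context=post" ' +
  'target="_blank" rel="noopener noreferrer">https://www.website.com</a>';

const escapedPhrases = ['\\bwebsite\\b'];
const regex = new RegExp(`(${escapedPhrases.join('|')})`, 'gi');

// Start index of every match (matchAll requires the g flag).
const matchIndices = [...text.matchAll(regex)].map((m) => m.index);

// Index of the quote that closes the href value.
const hrefStart = text.indexOf('href="') + 'href="'.length;
const hrefEnd = text.indexOf('"', hrefStart);

console.log(matchIndices.length); // 2
// First match inside the href, second one in the link text.
console.log(matchIndices[0] < hrefEnd, matchIndices[1] > hrefEnd); // true true
```

A regex applied to the whole markup string cannot tell attribute values from text content, which is why the answers below separate the two before replacing.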

Update / clarification:

I have this text:

text = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com</a>

Thanks!`

Example 1: I want to highlight the word twitter.

To do that, I want to add a span with the class bold around twitter, like this:

text = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.<span class="bold">twitter</span>.com</a>

Thanks!`

Example 2: I want to highlight the phrase twitter.com.

To do that, I want to add a span with the class bold around twitter.com, like this:

text = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.<span class="bold">twitter.com</span></a>

Thanks!`

Example 3: I want to highlight https://www.twitter.com.

To do that, I want to add a span with the class bold around https://www.twitter.com, like this:

text = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer"><span class="bold">https://www.twitter.com</span></a>

Thanks!`

Example 4:

I have this text and want to highlight twitter:

text = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com</a>

Thanks for follow my twitter!`

Then I expect to get back:

text = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.<span class="bold">twitter</span>.com</a>

Thanks for follow my <span class="bold">twitter</span>!`
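A minimal string-level sketch of the requirement, runnable without a DOM: it splits the markup into tags and text runs with a capturing regex and applies the highlight only to the text runs, so attribute values such as the href are never touched. For Example 4 it returns exactly the expected markup. It assumes simple, well-formed markup (no `>` inside attribute values, no comments, no `<script>` or `<style>` content) and a keyword that never has to match across an entity such as `&amp;`; `highlightKeyword` and `escapeRegExp` are hypothetical helper names.

```javascript
// Escape characters that have a special meaning in a RegExp pattern.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every case-insensitive occurrence of `keyword`
// in <span class="bold">, but only in text outside of tags.
function highlightKeyword(html, keyword) {
  const search = new RegExp(escapeRegExp(keyword), 'gi');
  // Because of the capturing group, split() keeps the tags in the result:
  // even indices are text runs, odd indices are tags.
  return html
    .split(/(<[^>]*>)/)
    .map((part, idx) =>
      idx % 2 === 1
        ? part // a tag: leave its attributes (href etc.) untouched
        : part.replace(search, '<span class="bold">$&</span>')
    )
    .join('');
}

const input = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com</a>

Thanks for follow my twitter!`;

console.log(highlightKeyword(input, 'twitter'));
console.log(highlightKeyword(input, 'twitter.com'));
```

`$&` re-inserts the matched text itself, so the original casing survives a case-insensitive search. A real HTML parser (as in the answers below) is the safer choice once the markup is not under your control.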

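Comments:

@WiktorStribiżew Exactly! My bad, I updated my post!

Is your input a string or an element?

My input is a string that can contain many words, with a link in the middle like the one in the example. I have to search this string for a keyword. In most cases it works fine, but if the keyword happens to be a word in the link, for example website, website.com or https://www.website.com/ in this example, the link breaks.

So you only need to replace the website keyword inside the a tag? If you find website outside of it, you don't touch it? Please provide a more complete sample for better understanding.

No, the only place I don't want to touch it is inside the link's href. Everywhere else I want to add the span around the word, for example website, website.com or https://www.website.com/, depending on the keyword I receive.

Answer 1: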

Regex is not the solution to every problem. In this case, modify only the textContent and leave the attributes alone; perhaps the code below meets your needs:

let text = `Follow me on 
<a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com</a>

Thanks for follow my twitter!`;

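const replaceKeyword = (keyword, text) => {
  // Parse the markup into an inert fragment; only text is rewritten below,
  // so attributes such as href are never touched.
  let template = document.createElement('template');
  template.innerHTML = text;
  let children = template.content.childNodes;

  let str = '';
  let substitute = `<span style='color:red;font-weight:bold;'>${keyword}</span>`;
  for (let child of children) {
    if (child.nodeType === 3) {
      // #text: replace every occurrence (replaceAll, not replace).
      str += child.textContent.replaceAll(keyword, substitute);
    } else if (child.nodeType === 1) {
      // element: rewrite its text content, keep its attributes.
      // Note: this flattens any markup nested inside the element.
      let nodeStr = child.textContent.replaceAll(keyword, substitute);
      child.innerHTML = nodeStr;
      str += child.outerHTML;
    }
  }
  return str;
};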


let result = replaceKeyword('twitter', text);
console.log(result);
document.body.innerHTML = result;
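One detail to keep in mind with this approach: when `String.prototype.replace` receives a string (not a regex) as its pattern, it replaces only the first occurrence. `String.prototype.replaceAll` (ES2021) or a regex with the `g` flag replaces all of them, which matters as soon as the keyword occurs twice within one text node. A small check, runnable in Node 15 or later:

```javascript
const sample = 'twitter here, twitter there';
const wrap = '<b>twitter</b>';

// String pattern: only the first occurrence is replaced.
const firstOnly = sample.replace('twitter', wrap);
// replaceAll (or a /g regex) replaces every occurrence.
const everyOne = sample.replaceAll('twitter', wrap);

console.log(firstOnly); // <b>twitter</b> here, twitter there
console.log(everyOne);  // <b>twitter</b> here, <b>twitter</b> there
```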


Answer 2:

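With the latest feature added to the requirements, the OP completely changed the rules of the game. Now the task is a full-text search within the text content of HTML markup.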

Something similar to ...

- How to highlight the search-result of a text-query within an html document ignoring the html tags?
- Markdown-like functionality for tooltips

... or ...

- How to query text nodes from the DOM, find markdown patterns, replace matches with HTML markup and replace the original text nodes with the new content?
- What is a good enough approach for writing real-time text search and highlight functionality which does not break the order of text- and element-nodes

... the last two provide different but generic approaches based on DOM nodes and text nodes.

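As for the OP's problem: for a requirement like finding a text query within the text content of HTML code, a simple solution no longer suffices. Nested markup now has to be assumed.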

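Adding special markup around each search result first requires collecting every text node from a DOM fragment, which in turn has to be parsed from the passed HTML code.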
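To make the collecting step concrete without a browser, here is a minimal sketch over a hypothetical plain-object tree (`{ type: 'element', tagName, children }` and `{ type: 'text', value }`) standing in for real DOM nodes. It walks depth-first in document order and keeps non-empty text nodes that are not inside `<script>`, the same filter the answer's `isNonEmptyTextNode` applies. In a browser, `document.createTreeWalker(root, NodeFilter.SHOW_TEXT)` performs this kind of traversal natively.

```javascript
// Hypothetical stand-in for a parsed DOM fragment.
const fragment = {
  type: 'element', tagName: 'div', children: [
    { type: 'text', value: 'Follow me on ' },
    { type: 'element', tagName: 'a', children: [
      { type: 'text', value: 'https://www.twitter.com/' },
    ] },
    { type: 'text', value: '   ' },
    { type: 'element', tagName: 'script', children: [
      { type: 'text', value: 'var twitter = 1;' },
    ] },
    { type: 'text', value: ' Thanks!' },
  ],
};

// Depth-first walk collecting non-empty text nodes, skipping <script> text.
function collectTextNodes(node, list = []) {
  for (const child of node.children) {
    if (child.type === 'text') {
      if (child.value.trim() !== '' && node.tagName !== 'script') {
        list.push(child);
      }
    } else {
      collectTextNodes(child, list);
    }
  }
  return list;
}

// Logs the values 'Follow me on ', 'https://www.twitter.com/' and ' Thanks!'.
console.log(collectTextNodes(fragment).map((n) => n.value));
```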

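On that basis, a regex-based String.replace can no longer simply be applied. Every text node that partially matches the search query now has to be replaced by a reassembled sequence of the non-matching text and the matching parts, which, because of the additional markup, have now become element nodes.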
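The splitting step can be isolated as a pure function. A minimal sketch, with the hypothetical name `segmentText`: when the search regex has exactly one capturing group, as the one this answer builds does, `String.prototype.split` returns the non-matching text at even indices and the captured matches at odd indices. Classifying by index parity before dropping empty strings tells the pieces apart without re-testing them against the regex, and the result maps directly onto "text node" versus "highlight element".

```javascript
// Split `text` into ordered pieces and flag which ones matched.
// Assumes `regXSearch` contains exactly one capturing group.
function segmentText(text, regXSearch) {
  return text
    .split(regXSearch)
    // Odd indices hold the captured matches, even indices the text between.
    .map((part, idx) => ({ part, isMatch: idx % 2 === 1 }))
    // Empty strings appear when a match sits at the start or end, or when
    // two matches are adjacent; they would become empty nodes, so drop them.
    .filter(({ part }) => part !== '');
}

console.log(segmentText('https://www.twitter.com/', /(twitter)/i));
console.log(segmentText('TwitterTwitter', /(twitter)/i));
```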

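Therefore, because of the OP's latest requirement change alone, one has to provide a generic full-text search and highlight approach, which in addition has to handle whitespace sequences and regex-specific characters in the provided search query ...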

// node detection helpers.
function isElementNode(node) {
  return (node && (node.nodeType === 1));
}
function isNonEmptyTextNode(node) {
  return (
        node
    && (node.nodeType === 3)
    && (node.nodeValue.trim() !== '')
    && (node.parentNode.tagName.toLowerCase() !== 'script')
  );
}


// dom node render helper.
function insertNodeAfter(node, referenceNode) {
  const { parentNode, nextSibling } = referenceNode;
  if (nextSibling !== null) {
    node = parentNode.insertBefore(node, nextSibling);
  } else {
    node = parentNode.appendChild(node);
  }
  return node;
}


// text node reducer functionality.
function collectNonEmptyTextNode(list, node) {
  if (isNonEmptyTextNode(node)) {
    list.push(node);
  }
  return list;
}
function collectTextNodeList(list, elmNode) {
  return Array.from(
    elmNode.childNodes
  ).reduce(
    collectNonEmptyTextNode,
    list
  );
}
function getTextNodeList(rootNode) {
  rootNode = (isElementNode(rootNode) && rootNode) || document.body;

  const elementNodeList = Array.from(
    rootNode.getElementsByTagName('*')
  );
  elementNodeList.unshift(rootNode);

  return elementNodeList.reduce(collectTextNodeList, []);
}



// search result emphasizing functionality.

function createSearchMatch(text) {
  const elmMatch = document.createElement('strong');

  // elmMatch.classList.add("bold");
  elmMatch.textContent = text;

  return elmMatch;
}
function aggregateSearchResult(collector, text, idx) {
  const { previousNode, regXSearch } = collector;

  // A regex with the 'g' flag makes test() stateful via lastIndex;
  // reset it so every token is tested from its start.
  regXSearch.lastIndex = 0;

  const currentNode = regXSearch.test(text)
    ? createSearchMatch(text)
    : document.createTextNode(text);

  if (idx === 0) {
    previousNode.parentNode.replaceChild(currentNode, previousNode);
  } else {
    insertNodeAfter(currentNode, previousNode);
  }
  collector.previousNode = currentNode;

  return collector;
}
function emphasizeTextContentMatch(textNode, regXSearch) {
  // console.log(regXSearch);
  textNode.textContent
    .split(regXSearch)
    .filter(text => text !== '')
    .reduce(aggregateSearchResult, {
      previousNode: textNode,
      regXSearch,
    });
}



function emphasizeEveryTextContentMatch(htmlCode, searchValue, isIgnoreCase) {
  searchValue = searchValue.trim();
  if (searchValue !== '') {

    const replacementNode = document.createElement('div');
    replacementNode.innerHTML = htmlCode;

    const regXSearchString = searchValue
      // escaping of regex specific characters.
      .replace((/[.*+?^${}()|[\]\\]/g), '\\$&')
      // additional escaping of whitespace (sequences).
      .replace((/\s+/g), '\\s+');

    const regXFlags = `g${ !!isIgnoreCase ? 'i' : '' }`;
    const regXSearch = RegExp(`(${ regXSearchString })`, regXFlags);

    getTextNodeList(replacementNode).forEach(textNode =>
      emphasizeTextContentMatch(textNode, regXSearch)
    );
    htmlCode = replacementNode.innerHTML;
  }
  return htmlCode;
}



const htmlLinkList = [
  emphasizeEveryTextContentMatch(
    'Follow me on <a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com/</a> Thanks!',
    'twitter'
  ),
  emphasizeEveryTextContentMatch(
    'Follow me on <a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com/</a> Thanks!',
    'twitter.com'
  ),
  emphasizeEveryTextContentMatch(
    'Follow me on <a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com/</a> Thanks!',
    'https://www.twitter.com/'
  ),
  emphasizeEveryTextContentMatch(
    'Follow me on <a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com/</a> Thanks for follow my Twitter!',
    'TWITTER',
    true
  ),
  emphasizeEveryTextContentMatch(
    `Follow me on <a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com/</a>
    Thanks
    for follow 
    my   Twitter!`,
    'follow my twitter',
    true
  ),
];
document.body.innerHTML = htmlLinkList.join('<br/>');

const container = document.createElement('code');

container.textContent = emphasizeEveryTextContentMatch(
  'Follow me on <a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com/</a> Thanks for follow my Twitter!',
  'TWITTER',
  true
);
document.body.appendChild(container.cloneNode(true));

container.textContent = emphasizeEveryTextContentMatch(
  `Follow me on <a href="/redirect?uri=https%3A%2F%2Fwww.twitter.com&context=post" target="_blank" rel="noopener noreferrer">https://www.twitter.com/</a>
  Thanks
  for follow 
  my   Twitter!`,
  'follow my twitter',
  true
);
document.body.appendChild(container.cloneNode(true));

code {
  display: block;
  margin: 10px 0;
  padding: 0;
}
a strong {
  font-weight: bold;
}
.as-console-wrapper { min-height: 100%!important; top: 0; }
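The query handling inside `emphasizeEveryTextContentMatch` can also be checked on its own. A minimal sketch with the hypothetical name `buildSearchRegExp`: it escapes regex-specific characters, turns every whitespace run in the query into `\s+`, and wraps the result in one capturing group. It deliberately omits the `g` flag, since `split()` finds every match anyway, while `test()` on a global regex starts at `lastIndex` and updates it, so testing the same string twice can yield `true` and then `false`.

```javascript
// Build a search regex from a user query: escape regex-specific
// characters, then let any whitespace run match any whitespace run.
function buildSearchRegExp(searchValue, isIgnoreCase = false) {
  const source = searchValue
    .trim()
    .replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\s+/g, '\\s+');
  // No 'g' flag: keeps test() free of lastIndex state.
  return new RegExp(`(${source})`, isIgnoreCase ? 'i' : '');
}

const multiLine = `Thanks
    for follow
    my   Twitter!`;

const phrase = buildSearchRegExp('follow my twitter', true);
console.log(phrase.source);          // (follow\s+my\s+twitter)
console.log(phrase.test(multiLine)); // true

// The dot is escaped, so it no longer matches an arbitrary character.
console.log(buildSearchRegExp('twitter.com').test('twitterXcom')); // false

// Pitfall with the g flag: test() is stateful through lastIndex.
const globalRegX = /(twitter)/g;
console.log(globalRegX.test('twitter'), globalRegX.test('twitter')); // true false
```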

