How do I strip a tag and all of its inner HTML using the tag's ID?

Posted 2023-01-24


I have the following HTML:

<html>
 <body>
 bla bla bla bla
  <div id="myDiv"> 
         more text
      <div id="anotherDiv">
           And even more text
      </div>
  </div>

  bla bla bla
 </body>
</html>

I want to remove everything from <div id="anotherDiv"> through its closing </div>. How can I do that?

Answer

Using the native DOM extension:

$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
$nodes = $xPath->query('//*[@id="anotherDiv"]');
if($nodes->item(0)) 
    $nodes->item(0)->parentNode->removeChild($nodes->item(0));

echo $dom->saveHTML();
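
The same DOM approach can be wrapped in a reusable function. This is only a minimal sketch: the name removeElementById is hypothetical (it is not part of PHP or of the answer above), and it assumes a non-empty HTML string and an ID that contains no double quotes.

```php
<?php
// Hypothetical helper (not from the original answer): removes the element
// with the given id, together with everything nested inside it.
function removeElementById(string $html, string $id): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them
    // when the markup is not perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumption: $id has no double quotes, because it is embedded
    // directly in the XPath expression.
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false) {
        return $html; // invalid expression: leave the input untouched
    }

    $node = $nodes->item(0);
    if ($node !== null && $node->parentNode !== null) {
        $node->parentNode->removeChild($node);
    }

    return (string) $dom->saveHTML();
}

$html = '<html><body>bla<div id="myDiv">more text'
      . '<div id="anotherDiv">And even more text</div></div>bla</body></html>';
$result = removeElementById($html, 'anotherDiv');
echo $result;
```

Because the node is removed from the parsed tree, nested elements inside it go with it, which is the part that regex-based approaches tend to get wrong.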

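Another answer

You can use preg_replace(), although this pattern removes only the opening tag and leaves the element's contents and closing tag in place:

$string = preg_replace('/<div id="someid"[^>]*>/i', "", $string);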

Another answer

Using the native XML manipulation library

Assuming your HTML content is stored in the variable $html:

$html='<html>
 <body>
 bla bla bla bla
  <div id="myDiv"> 
         more text
      <div id="anotherDiv">
           And even more text
      </div>
  </div>

  bla bla bla
 </body>
</html>';

To delete the tag by its ID, use the following code:

    $dom = new DOMDocument;
    $dom->validateOnParse = false;
    $dom->loadHTML( $html );

    // get the tag
    $div = $dom->getElementById('anotherDiv');

    // delete the tag
    if( $div && $div->nodeType==XML_ELEMENT_NODE ) {
        $div->parentNode->removeChild( $div );
    }

    echo $dom->saveHTML();

Note that some versions of libxml require a doctype for getElementById to work.

In that case, you can prepend <!doctype> to $html:

$html = '<!doctype>' . $html;

Alternatively, as suggested in Gordon's answer, you can use DOMXPath to find the element with XPath:

$dom = new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );

$xp = new DOMXPath( $dom );
$col = $xp->query( '//div[ @id="anotherDiv" ]' );

if( !empty( $col ) ) {
    foreach( $col as $node ) {
        $node->parentNode->removeChild( $node );
    }
}

echo $dom->saveHTML();

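Another answer

The strip_tags() function may be what you want. Note that it removes the tags themselves but keeps the text between them.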

http://us.php.net/manual/en/function.strip-tags.php
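
As a quick check of what strip_tags() does with markup like the question's, the sketch below runs it on a shortened copy of the nested divs. The variable names are illustrative only.

```php
<?php
// Shortened version of the question's markup.
$html = '<div id="myDiv">more text<div id="anotherDiv">And even more text</div></div>';

// strip_tags() drops the tags but keeps the text nodes between them.
$stripped = strip_tags($html);

echo $stripped; // prints: more textAnd even more text
```

The text inside anotherDiv survives, so strip_tags() alone does not delete an element together with its contents.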

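Another answer

I wrote these to strip specific tags and attributes. Because they are regex-based, they are not guaranteed to work 100% in every case, but that was a fair trade-off for me:

// Strips only the given tags (and their contents) in the given HTML string.
function strip_tags_blacklist($html, $tags) {
    foreach ($tags as $tag) {
        // Lazy match from the opening tag to the first matching closing tag.
        $regex = '#<\s*' . $tag . '[^>]*>.*?<\s*/\s*'. $tag . '>#msi';
        $html = preg_replace($regex, '', $html);
    }
    return $html;
}

// Strips the given attributes found in the given HTML string.
function strip_attributes($html, $atts) {
    foreach ($atts as $att) {
        // Matches the attribute name, optionally with a quoted value, inside a tag.
        $regex = '#\b' . $att . '\b(\s*=\s*[\'"][^\'"]*[\'"])?(?=[^<]*>)#msi';
        $html = preg_replace($regex, '', $html);
    }
    return $html;
}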
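
One case where a lazy tag-to-tag regex like this falls short is nested elements of the same type, as in the question. The sketch below writes the same pattern shape out for the tag "div" and applies it to a shortened, hypothetical input.

```php
<?php
// Same pattern shape as the tag-stripping regex above, written out for "div".
$regex = '#<\s*div[^>]*>.*?<\s*/\s*div>#msi';

// Nested divs: the outer element contains an inner one plus trailing text.
$html = '<div id="myDiv">more text<div id="anotherDiv">And even more text</div> tail</div>';

// The lazy .*? stops at the FIRST closing </div>, which belongs to the inner div.
$result = preg_replace($regex, '', $html);

echo $result; // prints: " tail</div>" (without the quotes)
```

The match runs from the outer opening tag to the inner closing tag, leaving a stray closing tag behind. For nested markup, a DOM parser is the safer choice.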

Another answer

How about this?

// Strips the given tag (a single tag name) and its contents from the given HTML string.
function strip_tags_blacklist($html, $tags) {
    $html = preg_replace('/<'. $tags .'\b[^>]*>(.*?)<\/'. $tags .'>/is', "", $html);
    return $html;
}

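Another answer

Following RafaSashi's answer using preg_replace(), here is a version that works for a single tag or an array of tags. It removes the opening and closing tags only and keeps their contents:

/**
 * @param $str string
 * @param $tags string | array
 * @return string
 */

function strip_specific_tags ($str, $tags) {
  if (!is_array($tags)) { $tags = array($tags); }

  foreach ($tags as $tag) {
    // Remove the closing tags first.
    $_str = preg_replace('/<\/' . $tag . '>/i', '', $str);
    // Only if a closing tag was found, remove the opening tags as well.
    if ($_str != $str) {
      $str = preg_replace('/<' . $tag . '[^>]*>/i', '', $_str);
    }
  }
  return $str;
}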
