Using Python and Regex, how do you remove <sup> tags from HTML? [duplicate]

Posted 2023-02-23

Asked: 2014-08-23 09:53:14

Question:

Using Python regex, how do I remove all <sup> tags in HTML? The tags sometimes carry styling, as shown below:

<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>

I want to remove everything between the sup tags within a larger HTML string.

Comments:

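What is your desired final result?

Obligatory reading for the OP on manipulating HTML with regex: ***.com/a/1732454/3001761

I solved my problem by converting the HTML to a string and using the following: re.sub(r'^{+','',string of html)}

Answer 1: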

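I would use an HTML parser instead of regex. For example, BeautifulSoup and its unwrap() method can handle your <sup> tags. From the documentation:

Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever is inside that tag. It is good for stripping out markup.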

from bs4 import BeautifulSoup

data = """
<div>
    <sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
</div>
"""

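# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
for sup in soup.find_all('sup'):
    sup.unwrap()  # replace each <sup> tag with its contents

print(soup.prettify())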

Prints:

<div>
(1)
</div>
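
Note that unwrap() keeps the text inside the tag, which is why "(1)" survives in the output above. If the goal is to drop each <sup> together with its contents, as the question seems to ask, Tag.decompose() may be a better fit. The following is a minimal sketch, not part of the original answer; the helper name remove_sup and the sample string are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup


def remove_sup(html):
    """Return html with every <sup> element and its contents removed.

    remove_sup is a hypothetical helper name used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    for sup in soup.find_all("sup"):
        # decompose() deletes the tag and everything inside it,
        # whereas unwrap() would leave the inner text in place.
        sup.decompose()
    return str(soup)


# Sample input made up for illustration
html = '<p>Revenue<sup style="font-size:7pt">(1)</sup> grew.</p>'
print(remove_sup(html))  # <p>Revenue grew.</p>
```

Returning str(soup) rather than soup.prettify() leaves the original whitespace untouched, which is usually what you want when the cleaned HTML is processed further.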

Comments:

Thanks, this works much better. I appreciate it.
