Filtering and Sanitizing HTML Tags in Java
Posted by 骑着龙的羊
OWASP Java HTML Sanitizer is a simple, fast Java library used mainly to prevent XSS (cross-site scripting).
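Its advantages:
1. Easy to use: no cumbersome XML configuration, only a small amount of code.
2. Maintained by Mike Samuel (a Google engineer).
3. Passes more than 95% of AntiSamy's unit tests.
4. High performance and low memory consumption.
5. Four times faster than AntiSamy's DOM parser.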
1. Add the dependency to the POM
<!-- HTML tag filtering -->
<dependency>
    <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
    <artifactId>owasp-java-html-sanitizer</artifactId>
    <version>r136</version>
</dependency>
2. Utility class
import org.owasp.html.ElementPolicy;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

import java.util.List;

/**
 * @author : RandySun
 * @date : 2018-10-08 10:32
 * Comment :
 */
public class HtmlUtils {
    // Tags that are allowed through as-is
    private static final String[] allowedTags = {"h1", "h2", "h3", "h4", "h5", "h6",
            "span", "strong", "img", "video", "source", "blockquote", "p", "div",
            "ul", "ol", "li", "table", "thead", "caption", "tbody", "tr", "th", "td",
            "br", "a"};
    // Tags that are converted to div
    private static final String[] needTransformTags = {"article", "aside", "command", "datalist",
            "details", "figcaption", "figure", "footer", "header", "hgroup", "section", "summary"};
    // Tags that carry links (src/href)
    private static final String[] linkTags = {"img", "video", "source", "a"};

    // The policy is built once and reused for every call
    private static final PolicyFactory POLICY = new HtmlPolicyBuilder()
            // All allowed tags
            .allowElements(allowedTags)
            // Convert content tags to div
            .allowElements(new ElementPolicy() {
                @Override
                public String apply(String elementName, List<String> attributes) {
                    return "div";
                }
            }, needTransformTags)
            // src, href and target are allowed only on the link tags
            .allowAttributes("src", "href", "target").onElements(linkTags)
            // Only links that use the https protocol are kept
            .allowUrlProtocols("https")
            .toFactory();

    public static String sanitizeHtml(String htmlContent) {
        return POLICY.sanitize(htmlContent);
    }

    public static void main(String[] args) {
        // Quotes inside a Java string literal must be escaped
        String inputHtml = "<img src=\"https://a.jpb\"/>";
        System.out.println(sanitizeHtml(inputHtml));
    }
}
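Here, .allowElements(allowedTags) adds all of the allowed HTML tags.
The following part handles the tags that need converting: every element listed in needTransformTags is rewritten as a div.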
// Convert content tags to div
.allowElements( new ElementPolicy() {
@Override
public String apply(String elementName, List<String> attributes){
return "div";
}
},needTransformTags)
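Taken together, .allowElements(allowedTags) and the renaming ElementPolicy give each element one of three outcomes: kept as-is, renamed to div, or dropped. The following is a minimal sketch of that decision using plain java.util sets. It is a hypothetical illustration written for this article (ElementDecisionSketch and decide are made-up names), not the OWASP library's internal code, and it models only tag names, not attributes or text content.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: models the keep / rename / drop decision that the
// HtmlUtils policy configures. Not the OWASP library's implementation.
class ElementDecisionSketch {
    // Copies of allowedTags and needTransformTags from HtmlUtils
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "h1", "h2", "h3", "h4", "h5", "h6", "span", "strong", "img", "video",
            "source", "blockquote", "p", "div", "ul", "ol", "li", "table", "thead",
            "caption", "tbody", "tr", "th", "td", "br", "a"));
    static final Set<String> TRANSFORM_TO_DIV = new HashSet<>(Arrays.asList(
            "article", "aside", "command", "datalist", "details", "figcaption",
            "figure", "footer", "header", "hgroup", "section", "summary"));

    /** Returns the element name to emit, or null if the tag should be removed. */
    static String decide(String elementName) {
        String name = elementName.toLowerCase(Locale.ROOT);
        if (ALLOWED.contains(name)) {
            return name;            // whitelisted: kept as-is
        }
        if (TRANSFORM_TO_DIV.contains(name)) {
            return "div";           // same effect as the ElementPolicy returning "div"
        }
        return null;                // anything else: the tag is dropped
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"p", "SECTION", "script"}) {
            System.out.println(tag + " -> " + decide(tag));
        }
    }
}
```

In the real policy, ElementPolicy.apply also receives the element's attribute list, so a custom policy could make its decision depend on attributes; this sketch ignores them.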
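.allowAttributes("src","href","target").onElements(linkTags) allows these attributes only on the listed elements (the tags in linkTags); on any other element they are stripped.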
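The rule above is essentially a per-element attribute whitelist: an attribute survives only if both its name and its element are listed. The sketch below expresses that with two sets and a map of attributes. AttributeWhitelistSketch and filter are hypothetical names chosen for this illustration; the code is not how the library stores or applies its policy, and it assumes attribute names are already lowercase.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating what
// .allowAttributes("src", "href", "target").onElements(linkTags) expresses.
// Assumes attribute names are already lowercase. Not the library's code.
class AttributeWhitelistSketch {
    static final Set<String> LINK_TAGS = new HashSet<>(Arrays.asList("img", "video", "source", "a"));
    static final Set<String> LINK_ATTRS = new HashSet<>(Arrays.asList("src", "href", "target"));

    /** Returns a copy of attrs containing only the attributes allowed on element. */
    static Map<String, String> filter(String element, Map<String, String> attrs) {
        Map<String, String> kept = new LinkedHashMap<>();
        if (!LINK_TAGS.contains(element)) {
            return kept;            // HtmlUtils whitelists no attributes for other elements
        }
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (LINK_ATTRS.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "https://example.com");
        attrs.put("onclick", "alert(1)");
        System.out.println(filter("a", attrs));   // onclick is removed
        System.out.println(filter("p", attrs));   // p has no whitelisted attributes
    }
}
```

The real builder is more general: each allowAttributes(...) call can be scoped with onElements(...) as here, or with globally() to allow an attribute on every element.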
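.allowUrlProtocols("https") means that only the https protocol is accepted in href or src links; an attribute whose URL uses any other protocol (for example http: or javascript:) is dropped.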
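A minimal sketch of the idea behind .allowUrlProtocols("https"): read the scheme in front of the first ":" and accept the URL only if that scheme is whitelisted. UrlProtocolSketch and isAllowed are hypothetical names, and this is a simplified stand-in rather than the library's implementation; in particular, accepting URLs that have no scheme at all (relative links such as /img/a.png) is a design choice of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: a simplified scheme whitelist in the spirit of
// .allowUrlProtocols("https"). Not the OWASP library's implementation.
class UrlProtocolSketch {
    static final Set<String> ALLOWED_PROTOCOLS = new HashSet<>(Arrays.asList("https"));

    /** Returns true if the URL has no scheme or its scheme is whitelisted. */
    static boolean isAllowed(String url) {
        String s = url.trim();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                return true;        // path, query or fragment comes first: relative URL, no scheme
            }
            if (c == ':') {
                // Scheme comparison is case-insensitive ("HTTPS" == "https")
                String scheme = s.substring(0, i).toLowerCase(Locale.ROOT);
                return ALLOWED_PROTOCOLS.contains(scheme);
            }
        }
        return true;                // no ':' at all, e.g. "image.png"
    }

    public static void main(String[] args) {
        String[] urls = {"https://a.jpb", "http://a.jpb", "javascript:alert(1)", "/img/a.png"};
        for (String u : urls) {
            System.out.println(u + " -> " + isAllowed(u));
        }
    }
}
```

A production sanitizer also has to handle obfuscated schemes and normalize the URL before writing it out, which is one reason to rely on the library rather than a hand-written check like this.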