Unable to save the modified HTML file using JSoup and Java

【Posted】: 2021-05-06 01:34:32 【Question】:

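I need to modify an existing HTML file and save it. I can make the modification, but I cannot save the change back to the HTML file.

The change I want to make: the HTML below contains two identical tables, <table class='runtime-table table-striped table'>, and I want to remove one of them and save the result.

Below is a snippet of the HTML: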

<!DOCTYPE html>
<html>
<head>
    <meta charset='UTF-8' /> 
    <meta name='description' content='' />
    <meta name='robots' content='noodp, noydir' />
    <meta name='viewport' content='width=device-width, initial-scale=1' />
    <meta id="timeStampFormat" name="timeStampFormat" content='MMM d, yyyy hh:mm:ss a'/>
    
        <link href='https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,600' rel='stylesheet' type='text/css' />
        <link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet" />
        <link href='https://cdn.jsdelivr.net/gh/extent-framework/extent-github-cdn@ff53917fbbdb5ef820abbbe4d199a6942dc771ff/v3html/css/extent.css' type='text/css' rel='stylesheet' />
    
    <title>HALS Brands</title>

    <style type='text/css'>
        /* json-tree */
        .jstBracket,.jstComma,.jstValue{white-space:pre-wrap}.jstValue{font-size:10px;font-weight:400;font-family:"Lucida Console",Monaco,monospace}.jstProperty{color:#666;word-wrap:break-word}.jstBool{color:#2525CC}.jstNum{color:#D036D0}.jstNull{color:gray}.jstStr{color:#2DB669}.jstFold:after{content:' -';cursor:pointer}.jstExpand{white-space:normal}.jstExpand:after{content:' +';cursor:pointer}.jstFolded{white-space:normal!important}.jstHiddenBlock{display:none}

        img.r-img { width:60%; }
    </style>
    
    <script type="text/javascript">
        /*! json-tree - v0.2.2 - 2017-09-25, MIT LICENSE */
        var JSONTree=function(){var n={"&":"&amp;","<":"&lt;",">":"&gt;",'"':"&quot;","'":"&#x27;","/":"&#x2F;"},t=0,r=0;this.create=function(n,t){return r+=1,N(u(n,0,!1),{class:"jstValue"})};var e=function(t){return t.replace(/[&<>'"]/g,function(t){return n[t]})},s=function(){return r+"_"+t++},u=function(n,t,r){if(null===n)return f(r?t:0);switch(typeof n){case"boolean":return l(n,r?t:0);case"number":return i(n,r?t:0);case"string":return o(n,r?t:0);default:return n instanceof Array?a(n,t,r):c(n,t,r)}},c=function(n,t,r){var e=s(),u=Object.keys(n).map(function(r){return j(r,n[r],t+1,!0)}).join(m()),c=[g("{",r?t:0,e),N(u,{id:e}),p("}",t)].join("\n");return N(c,{})},a=function(n,t,r){var e=s(),c=n.map(function(n){return u(n,t+1,!0)}).join(m());return[g("[",r?t:0,e),N(c,{id:e}),p("]",t)].join("\n")},o=function(n,t){var r=e(JSON.stringify(n));return N(v(r,t),{class:"jstStr"})},i=function(n,t){return N(v(n,t),{class:"jstNum"})},l=function(n,t){return N(v(n,t),{class:"jstBool"})},f=function(n){return N(v("null",n),{class:"jstNull"})},j=function(n,t,r){var s=v(e(JSON.stringify(n))+": ",r),c=N(u(t,r,!1),{});return N(s+c,{class:"jstProperty"})},m=function(){return N(",\n",{class:"jstComma"})},N=function(n,t){return d("span",t,n)},d=function(n,t,r){return"<"+n+Object.keys(t).map(function(n){return" "+n+'="'+t[n]+'"'}).join("")+">"+r+"</"+n+">"},g=function(n,t,r){return N(v(n,t),{class:"jstBracket"})+N("",{class:"jstFold",onclick:"JSONTree.toggle('"+r+"')"})};this.toggle=function(n){var t=document.getElementById(n),r=t.parentNode,e=t.previousElementSibling;""===t.className?(t.className="jstHiddenBlock",r.className="jstFolded",e.className="jstExpand"):(t.className="",r.className="",e.className="jstFold")};var p=function(n,t){return N(v(n,t),{})},v=function(n,t){return Array(2*t+1).join(" ")+n};return this}();
    </script>
</head>
    <body class='extent standard default hide-overflow bdd-report'>
        <div id='theme-selector' alt='Click to toggle theme. To enable by default, use theme configuration.' title='Click to toggle theme. To enable by default, use theme configuration.'>
            <span><i class='material-icons'>desktop_windows</i></span>
        </div>
<nav>
    <div class="nav-wrapper">
        <a href="#!" class="brand-logo black"><img src="https://cdn.rawgit.com/extent-framework/extent-github-cdn/d74480e/commons/img/logo.png"></a>
        <!-- slideout menu -->
        <ul id='slide-out' class='side-nav fixed hide-on-med-and-down'>
            <li class='waves-effect active'><a href='#!' view='test-view' onclick="configureView(0);chartsView('test');"><i class='material-icons'>dashboard</i></a></li>
                        <li class='waves-effect'><a href='#!' view='category-view' onclick="configureView(1)"><i class='material-icons'>label_outline</i></a></li>
            <li class='waves-effect'><a href='#!' view='exception-view' onclick="configureView(2)"><i class='material-icons'>bug_report</i></a></li>
            <li class='waves-effect'><a href='#!' onclick="configureView(-1);chartsView('dashboard');" view='dashboard-view'><i class='material-icons'>track_changes</i></a></li>
        </ul>
        <!-- report name -->
        <span class='report-name'>HALS Brands - Automation Report</span>
        <!-- report headline -->
        <span class='report-headline'></span>
        <!-- nav-right -->
        <ul id='nav-mobile' class='right hide-on-med-and-down nav-right'>
            <a href='#!'>
            <span class='label blue darken-3 suite-start-time'>Feb 2, 2021 01:32:39 AM</span>
            </a>
        </ul>
    </div>
</nav>      <!-- container -->
        <div class='container'>
<div id='test-view' class='view'>
    <section id='controls'>
        <div class='controls grey lighten-4'>
            <!-- test toggle -->
            <div class='chip transparent'>
                <a class='dropdown-button tests-toggle' data-activates='tests-toggle' data-constrainwidth='true' data-beloworigin='true' data-hover='true' href='#'>
                <i class='material-icons'>warning</i> Status
                </a>
                <ul id='tests-toggle' class='dropdown-content'>
                                        <li status='pass'><a href='#!'>Pass <i class='material-icons green-text'>check_circle</i></a></li>
                    <li status='fail'><a href='#!'>Fail <i class='material-icons red-text'>cancel</i></a></li>
                    <li status='skip'><a href='#!'>Skip <i class='material-icons cyan-text'>redo</i></a></li>
                    <li class='divider'></li>
                    <li status='clear' clear='true'><a href='#!'>Clear Filters <i class='material-icons'>clear</i></a></li>
                </ul>
            </div>
            <!-- test toggle -->
            <!-- category toggle -->
            <div class='chip transparent'>
                <a class='dropdown-button category-toggle' data-activates='category-toggle' data-constrainwidth='false' data-beloworigin='true' data-hover='true' href='#'>
                <i class='material-icons'>local_offer</i> Category
                </a>
                <ul id='category-toggle' class='dropdown-content'>
                    <li><a href='#'>@sanity_api1bal</a></li>
                    <li class='divider'></li>
                    <li class='clear'><a href='#!' clear='true'>Clear Filters</a></li>
                </ul>
            </div>
            <!-- category toggle -->
            <!-- clear filters -->
            <div class='chip transparent hide'>
                <a class='' id='clear-filters' alt='Clear Filters' title='Clear Filters'>
                <i class='material-icons'>close</i> Clear
                </a>
            </div>
            <!-- clear filters -->
            <!-- enable dashboard -->
            <div id='toggle-test-view-charts' class='chip transparent'>
                <a class='pink-text' id='enable-dashboard' alt='Enable Dashboard' title='Enable Dashboard'>
                <i class='material-icons'>track_changes</i> Dashboard
                </a>
            </div>
            <!-- enable dashboard -->
            <!-- search -->
            <div class='chip transparent' alt='Search Tests' title='Search Tests'>
                <a href="#" class='search-div'>
                <i class='material-icons'>search</i> Search
                </a>
                <div class='input-field left hide'>
                    <input id='search-tests' type='text' class='validate browser-default' placeholder='Search Tests...'>
                </div>
            </div>
            <!-- search -->
        </div>
    </section>
<div id='test-view-charts' class='subview-full'>
    <div id='charts-row' class='row nm-v nm-h'>
        <div class='col s12 m4 l4 np-h'>
            <div class='card-panel nm-v'>
                <div class='left panel-name'>Features</div>
                <div class='chart-box' style="max-height:94px;">
                    <canvas id='parent-analysis' width='90' height='70'></canvas>
                </div>
                <div class='block text-small'>
                    <span class='tooltipped' data-position='top' data-tooltip='0%'><span class='strong'>0</span> feature(s) passed</span>
                </div>
                <div class='block text-small'>
                    <span class='strong tooltipped' data-position='top' data-tooltip='100%'>1</span> feature(s) failed, <span class='strong tooltipped' data-position='top' data-tooltip='0%'>0</span> skipped
                </div>
            </div>
        </div>
        <div class='col s12 m4 l4 np-h'>
            <div class='card-panel nm-v'>
                <div class='left panel-name'>Scenarios</div>
                <div class='chart-box' style="max-height:94px;">
                    <canvas id='child-analysis' width='90' height='70'></canvas>
                </div>
                <div class='block text-small'>
                    <span class='tooltipped' data-position='top' data-tooltip='0%'><span class='strong'>0</span> scenario(s) passed</span>
                </div>
                <div class='block text-small'>
                    <span class='strong tooltipped' data-position='top' data-tooltip='100%'>2</span> scenario(s) failed, 
                    <span class='strong tooltipped' data-position='top' data-tooltip='0%'>0</span> skipped, 
                    <span class='strong tooltipped' data-position='top' data-tooltip='0%'>0</span> others
                </div>
            </div>
        </div>
        <div class='col s12 m4 l4 np-h'>
            <div class='card-panel nm-v'>
                <div class='left panel-name'>Steps</div>
                <div class='chart-box' style="max-height:94px;">
                    <canvas id='grandchild-analysis' width='90' height='70'></canvas>
                </div>
                <div class='block text-small'>
                    <span class='tooltipped' data-position='top' data-tooltip='57.143%'><span class='strong'>8</span> step(s) passed</span>
                </div>
                <div class='block text-small'>
                    <span class='strong tooltipped' data-position='top' data-tooltip='14.286%'>2</span> scenario(s) failed, 
                    <span class='strong tooltipped' data-position='top' data-tooltip='28.571%'>4</span> skipped, 
                    <span class='strong tooltipped' data-position='top' data-tooltip='28.571%'>0</span> others
                </div>
            </div>
        </div>
    </div>
    <div id="timeline-chart" class="row nm-v nm-h">
        <div class="col s12 m12 l12 np-h">
            <div class="card-panel">
                <div class='left panel-name'>Timeline (seconds)</div>
                <div class="chart-box" style="width:98%;max-height:145px;">
                    <canvas id="timeline" ></canvas>
                </div>
            </div>
        </div>
    </div>
</div>  <div class='subview-left left'>
        <div class='view-summary'>
            <ul id='test-collection' class='test-collection'>
                <li class='test displayed active has-leaf fail' status='fail' bdd='true' test-id='1'>
                    <div class='test-heading'>
                        <span class='test-name'>E2E_Services_Sanity_TC_01_ProductAPI_PRODUCT_All Products_Active_Tier_Price</span>
                        <span class='test-time'>Feb 2, 2021 01:32:41 AM</span>
                        <span class='test-status right fail'>fail</span>
                    </div>
                    <div class='test-content hide'>
<div class="sr-filters bdd-filters">
    <a class="btn-floating waves-effect waves-light pass green" title="pass"><i class='material-icons'>check_circle</i></a>
    <a class="btn-floating waves-effect waves-light fail red" title="fail"><i class='material-icons'>cancel</i></a>
    <a class="btn-floating waves-effect waves-light skip blue" title="skip"><i class='material-icons'>redo</i></a>
    <a class="btn-floating waves-effect waves-light clear grey" title="clear"><i class='material-icons'>clear</i></a>
</div>
<div class='scenario outline node' test-id='2' status='fail'>
    <span class='duration right label'>0h 0m 12s+489ms</span>
    <div class="bdd-test">
        <div class="scenario-name"><span class='status fail' title='fail'><i class='material-icons'>cancel</i></span> Scenario Outline: Verify Product API is retrieving activeTierAndPrices for particular location</div>
  <table class='runtime-table table-striped table'><tr><td>brand</td><td>locationId</td><td>sellingChannel</td><td>productId</td><td>responseCode</td><td>responseFileName</td><td>productId_DB</td></tr><tr><td>HALS</td><td>8023</td><td>WEBOA</td><td>HALS-WEBOA-640213214</td><td>200</td><td>E2E_Services_Sanity_TC01_Response.json</td><td>HALS-STORE-640213214</td></tr><tr><td>HALS</td><td>1700</td><td>WEBOA</td><td>HALS-WEBOA-640213214</td><td>200</td><td>E2E_Services_Sanity_TC01_Response.json</td><td>HALS-STORE-640213214</td></tr></table>
<table class='runtime-table table-striped table'><tr><td>brand</td><td>locationId</td><td>sellingChannel</td><td>productId</td><td>responseCode</td><td>responseFileName</td><td>productId_DB</td></tr><tr><td>HALS</td><td>8023</td><td>WEBOA</td><td>HALS-WEBOA-640213214</td><td>200</td><td>E2E_Services_Sanity_TC01_Response.json</td><td>HALS-STORE-640213214</td></tr><tr><td>HALS</td><td>1700</td><td>WEBOA</td><td>HALS-WEBOA-640213214</td><td>200</td><td>E2E_Services_Sanity_TC01_Response.json</td><td>HALS-STORE-640213214</td></tr></table>
    </div>
    <ul class='steps'>
        <li test-id='3' class='node scenario fail' status='fail'>
            <div class="step-name" title=""><span class='status fail' title='fail'><i class='material-icons'>cancel</i></span>Verify Product API is retrieving activeTierAndPrices for particular location</div>
            <ul class='gc steps'>

My Java code walks the HTML. I managed to remove the second table, but could not save the change back to the file.

    File htmlFile = new File("C:\\Automation_Report\\Extent_HTML_Reports\\ExtentHtml.html");

    //PrintWriter writer = new PrintWriter(htmlFile,"UTF-8");

    Document document = Jsoup.parse(htmlFile, "UTF-8");

    Elements element1 = document.getElementsByClass("runtime-table table-striped table");

    System.err.println(" Size : " + element1.size());
    System.out.println("");

    System.out.println("Before Deleting");
    System.out.println("===========================================");
    System.out.println(element1);

    if (element1.size() > 1)
    {
        for (int i = 0; i <= element1.size(); i++)
        {
            if (i == 1)
            {
                //element1.get(i).empty();
                element1.remove(1);
            }
        }
    }

    System.out.println("After Deleting");
    System.out.println("===========================================");
    System.out.println(element1);

    Path output = Path.of("C:\\Automation_Report\\Extent_HTML_Reports\\filename1.html");
    Files.writeString(output, document.html());
    

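【Comments】:

【Answer 1】:

element1.remove(1); removes the element from the Elements list only (in the jsoup version used here). The document itself is untouched, so document.html() still contains the table.

element1.get(1).remove(); removes the element from the document.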
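The difference can be checked on a small in-memory document. The sketch below is a hedged illustration, not the answerer's code: the two-table HTML string and the class name RemoveDemo are made up for the example. A plain ArrayList copy stands in for the list-only removal, so the result does not depend on how a particular jsoup release implements Elements.remove(int).

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RemoveDemo {
    // Hypothetical fragment with two identical tables, standing in for the report.
    static final String HTML =
        "<div><table class='runtime-table'><tr><td>a</td></tr></table>"
      + "<table class='runtime-table'><tr><td>b</td></tr></table></div>";

    /** Drops an entry from a list of matches only; returns how many tables the document still has. */
    public static int removeFromListOnly() {
        Document doc = Jsoup.parse(HTML);
        List<Element> matches = new ArrayList<>(doc.select("table.runtime-table"));
        matches.remove(1); // the list shrinks, the document tree is not touched
        return doc.select("table.runtime-table").size();
    }

    /** Detaches the second match from its parent; returns how many tables the document still has. */
    public static int removeFromDocument() {
        Document doc = Jsoup.parse(HTML);
        Elements matches = doc.select("table.runtime-table");
        matches.get(1).remove(); // Element.remove() detaches the node from the DOM
        return doc.select("table.runtime-table").size();
    }

    public static void main(String[] args) {
        System.out.println("list-only removal leaves: " + removeFromListOnly());
        System.out.println("document removal leaves:  " + removeFromDocument());
    }
}
```

Only the second method changes what document.html() would write to disk.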

Adjust the file paths to match your setup.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class JSOUPExample {
        public static void main(String[] args) {
            File htmlFile = new File("D:\\ExtentHtml.html");
            try {
                Document document = Jsoup.parse(htmlFile, "UTF-8");

                Elements element1 = document.getElementsByClass("runtime-table table-striped table");

                System.err.println(" Size : " + element1.size());
                System.out.println("");

                System.out.println("Before Deleting");
                System.out.println("===========================================");
                System.out.println(element1);

                /*if(element1.size()>1)
                {
                    for(int i=0;i<=element1.size();i++)
                    {
                        if(i==1)
                        {
                            //element1.get(i).empty();
                            element1.remove(1);
                        }
                    }
                }*/
                // Detach the second table from the document itself.
                element1.get(1).remove();

                // element1 still holds both matches; the change is in document.
                System.out.println("After Deleting");
                System.out.println("===========================================");
                System.out.println(element1);

                // Write the modified document to a new file.
                Path output = Path.of("D:\\filename1.html");
                Files.writeString(output, document.html());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

For more information, see here.
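If the report holds several scenarios and each one carries the duplicated table, index 1 only covers the first scenario. A possible variation keeps the first runtime-table in each block and removes the rest. It assumes every scenario sits in a div with class bdd-test, as in the snippet in the question. DedupTables and its method names are hypothetical. The CSS selector table.runtime-table is used in place of getElementsByClass, whose documentation describes a single class name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DedupTables {
    /** Keeps the first runtime-table inside each div.bdd-test and removes any further ones. */
    public static String keepFirstTablePerScenario(String html) {
        Document doc = Jsoup.parse(html);
        for (Element scenario : doc.select("div.bdd-test")) {
            Elements tables = scenario.select("table.runtime-table");
            // Start at 1 so the first table survives; a block with 0 or 1 tables is left alone.
            for (int i = 1; i < tables.size(); i++) {
                tables.get(i).remove(); // detach from the document, not just from the list
            }
        }
        return doc.html();
    }

    /** Counts runtime-table elements in an HTML string (helper for checking the result). */
    public static int countTables(String html) {
        return Jsoup.parse(html).select("table.runtime-table").size();
    }

    public static void main(String[] args) {
        String block = "<div class='bdd-test'>"
            + "<table class='runtime-table table-striped table'></table>"
            + "<table class='runtime-table table-striped table'></table></div>";
        String cleaned = keepFirstTablePerScenario(block + block);
        System.out.println(countTables(cleaned) + " tables remain");
    }
}
```

The loop never indexes past the end of the list, so a report with no duplicate table passes through unchanged.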

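【Comments】:

Thanks for the information. But I think this is the same logic as what you described; I tried it and still could not save the change to the file.

No, I commented out your code and replaced it with element1.get(1).remove();. There is a difference between element1.remove(1); and element1.get(1).remove();. With my code above, the change is now saved in the HTML file and it works. Please check it.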
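The answer writes to a second file, filename1.html. If the goal is to replace the original report instead, one cautious approach is to write the new HTML to a temporary file in the same directory and then move it over the original, so a failed write does not leave a truncated report behind. This is only a sketch using java.nio.file; SaveHtml and overwrite are hypothetical names. Opening a PrintWriter on the report before parsing it, as the commented-out line in the question would, truncates the file before jsoup gets to read it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveHtml {
    /** Writes html to a temp file next to target, then replaces target with it. */
    public static void overwrite(Path target, String html) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "report-", ".tmp");
        try {
            // Write with an explicit charset that matches the one used for parsing.
            Files.writeString(tmp, html, StandardCharsets.UTF_8);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the write or move failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the answer's variables, a call might look like SaveHtml.overwrite(htmlFile.toPath(), document.html()); made after the element has been removed from the document.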
