Parsing HTML Data in Java (Using the Third-Party Library Jsoup)
Requirements:
To build an API on top of a web service, the information embedded in its pages has to be parsed out.
Project address: https://github.com/hwding/LibXDUQuery
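Preparation:
- Download the third-party library Jsoup (an excellent HTML parser): https://jsoup.org/download
- Read the Jsoup API Reference: https://jsoup.org/apidocs/
- Look through the relevant code
- Learn how Java parses XML (the approach is very similar): http://www.cnblogs.com/hwding/p/5519713.html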
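The similarity to XML parsing mentioned in the last item can be seen in a small sketch that uses the JDK's built-in DOM parser. The class name `XmlDomSketch`, the method `hrefs`, and the `SAMPLE_XML` fragment are hypothetical names made up for illustration; only the two `href` values are taken from the page's menu. This is a sketch, not the code from the linked article.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class XmlDomSketch {

    // Hypothetical, well-formed fragment used only for this illustration.
    static final String SAMPLE_XML =
            "<menu>"
            + "<a href=\"student.aspx\">home</a>"
            + "<a href=\"select.aspx\">selected</a>"
            + "</menu>";

    // Parses the XML string into a DOM tree and collects the href of every <a> element.
    static List<String> hrefs(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList links = doc.getElementsByTagName("a");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            result.add(((Element) links.item(i)).getAttribute("href"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String href : hrefs(SAMPLE_XML)) {
            System.out.println(href);
        }
    }
}
```

The workflow (parse into a tree, select elements, read attributes or text) is the same one Jsoup offers. The difference is that a strict XML parser rejects markup that is not well-formed, such as unclosed `<br>` tags or unquoted attribute values, whereas Jsoup is designed to cope with real-world HTML.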
Web page source code:
<HTML>
<HEAD>
<title>物理实验网络选课系统</title>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=gb2312">
<meta content="Microsoft Visual Studio .NET 7.1" name="GENERATOR">
<meta content="C#" name="CODE_LANGUAGE">
<meta content="javascript" name="vs_defaultClientScript">
<meta content="http://schemas.microsoft.com/intellisense/ie5" name="vs_targetSchema">
<link href="../style.css" type="text/css" rel="stylesheet">

</HEAD>
<body>

<TABLE cellSpacing="0" cellPadding="0" width="100%" border="0">
<TR>
<TD width="560" nowrap><img src="/PhyEws/img/banleft.gif"></TD>
<td width="258" align="left" nowrap>
<table cellSpacing="0" cellPadding="0" width="100%" border="0">
<tr>
<TD nowrap background="/PhyEws/img/banback.gif" valign="bottom" align=left height=90 width="258" >
<br>
<br>
<div style="FONT-WEIGHT:bold;FONT-SIZE:18pt;COLOR:#666666;FONT-FAMILY:宋体">
某某大学物理实验教学示范中心

</div>
<br>
</TD>
</tr>
</table>
</td>
<td align="center" valign="bottom">
</td>
</TR>
</TABLE>
<table width="100%" border="0" cellspacing="0" cellpadding="0" align="center" class="localtoolbar">
<tr style="PADDING-TOP: 2px" height="30" align="center" class="lt0">
<td class="lt0" nowrap>
<font color="#000099">
2016年5月25日 第13周 星期三
</font>
<a href="student.aspx" class="menu">首页</a> |
<a href="select.aspx" class="menu"> 查询已选实验</a> |
<a href="addexpe.aspx" class="menu">开始选课</a> |
<a href="del.aspx" class="menu">取消选课</a> |
<a href="statsel.aspx" class="menu">查询可选单元</a> |
<a href="msg.aspx" class="menu">给教师留言</a> |
<a href="course.aspx" class="menu">课程安排</a> |
<a href="expeinfo.aspx" class="menu">实验项目查询</a> |
<a href="chgstupwd.aspx" class="menu">修改密码</a> |
<a href="modperinfo.aspx" class="menu">修改个人信息</a> |
<a href="../logoff.aspx" class="menu">退出</a>
</td>
</tr>
</table>

<TABLE id="Table1" cellSpacing="0" cellPadding="0" width="100%" border="0" height="72%">
<TR>
<TD valign="top" nowrap>
<form name="Form1" method="post" action="select.aspx" id="Form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUJODg3NTE5NTY0ZGTrVCA59FM5ZwvQdjZCh3bbd3Y15Q==" />
</div>

<div>

<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="CF4774E0" />
</div>
<br>
<span style='MARGIN-LEFT:50px'><a href='detgrade.aspx' >查询各实验的具体成绩及出勤情况</a></span><br/>

<br>
<div align="center">
某某某
同学,
<span id="Introscore" style="color:Red;">您的绪论课成绩为:尚未录入</span>
<BR>
您已选的实验如下:<br>
(注:“归一成绩”栏为期末计算实验综合成绩时,对各实验成绩进行归一化处理后所得成绩)
<br>
<br>
<span id="Orders"><table id="Orders_ctl00" cellspacing="0" rules="all" border="1" style="width:900px;border-collapse:collapse;">
<tr>
<th class="tableHeaderText" align="left" style="height:25px;width:15px;">序号</th><th class="tableHeaderText" align="left" style="height:25px;width:280px;">实验项目</th><th class="tableHeaderText" align="left" style="height:25px;width:30px;">实验周次</th><th class="tableHeaderText" align="left" style="height:25px;width:100px;">实验时间</th><th class="tableHeaderText" align="left" style="height:25px;width:80px;">实验日期</th><th class="tableHeaderText" align="left" style="height:25px;width:35px;">上课教室</th><th class="tableHeaderText" align="center" style="height:25px;width:80px;">讲义出处</th><th class="tableHeaderText" align="center" style="height:25px;width:70px;">实验成绩</th><th class="tableHeaderText" align="center" style="height:25px;width:35px;">归一成绩</th><th class="tableHeaderText" align="center" style="height:25px;">备注</th>
</tr><tr>
<td class="forumRow" style="height:25px;"><span>1</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">绪论F322(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>2</span></td><td class="forumRow" style="height:25px;"><span>星期五晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>3/11/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F322</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new"></a></td><td class="forumRow" style="height:25px;"><span>95</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span></span></td>
</tr><tr>
<td class="forumRow" style="height:25px;"><span>2</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">利用牛顿环测量平凸透镜曲率半径(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>4</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>3/24/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F321</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">综合设计性物理实验</a></td><td class="forumRow" style="height:25px;"><span>95</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第89页</span></td>
</tr><tr>
<td class="forumRow" style="height:25px;"><span>3</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">复摆测量重力加速度实验(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>5</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>3/31/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F323</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">基础物理实验</a></td><td class="forumRow" style="height:25px;"><span>85</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第49页</span></td>
</tr><tr>
<td class="forumRow" style="height:25px;"><span>4</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">组装式直流双臂电桥测量低电阻(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>7</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>4/14/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F220</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">基础物理实验</a></td><td class="forumRow" style="height:25px;"><span>80</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第136页</span></td>
</tr><tr>
<td class="forumRow" style="height:25px;"><span>5</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">线性、非线性电阻及二极管伏安特性的测定(3学时)</a></td><td class="forumRow" align="center" style="height:25px;"><span>8</span></td><td class="forumRow" style="height:25px;"><span>星期四晚上18:30-21:30</span></td><td class="forumRow" style="height:25px;"><span>4/21/2016</span></td><td class="forumRow" align="center" style="height:25px;"><span>F220</span></td><td class="forumRow" style="height:25px;"><a class="linkSmallBold" target="_new">基础物理实验</a></td><td class="forumRow" style="height:25px;"><span>70</span></td><td class="forumRow" style="height:25px;"><span></span></td><td class="forumRow" style="height:25px;"><span>第141页</span></td>
</tr><tr>
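The data of interest in this page sits in the table with id `Orders_ctl00`, where every data cell carries the class `forumRow`. As a minimal sketch of how such a table might be read with Jsoup (not necessarily how LibXDUQuery does it), the example below selects the rows and collects the text of each cell. The class name `OrdersTableSketch`, the method `parseRows`, and the shortened `SAMPLE_HTML` are hypothetical; the sample keeps only four columns of the first two rows and uses English header labels so that it stays small.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class OrdersTableSketch {

    // Hypothetical, shortened stand-in for the page: same table id and cell class,
    // but only four columns (number, date, classroom, score) of the first two rows.
    static final String SAMPLE_HTML =
            "<span id=\"Orders\"><table id=\"Orders_ctl00\">"
            + "<tr><th>No.</th><th>Date</th><th>Room</th><th>Score</th></tr>"
            + "<tr><td class=\"forumRow\"><span>1</span></td>"
            + "<td class=\"forumRow\"><span>3/11/2016</span></td>"
            + "<td class=\"forumRow\"><span>F322</span></td>"
            + "<td class=\"forumRow\"><span>95</span></td></tr>"
            + "<tr><td class=\"forumRow\"><span>2</span></td>"
            + "<td class=\"forumRow\"><span>3/24/2016</span></td>"
            + "<td class=\"forumRow\"><span>F321</span></td>"
            + "<td class=\"forumRow\"><span>95</span></td></tr>"
            + "</table></span>";

    // Returns one list of cell texts per data row; the header row has no td cells and is skipped.
    static List<List<String>> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : doc.select("table#Orders_ctl00 tr")) {
            Elements cells = tr.select("td.forumRow");
            if (cells.isEmpty()) {
                continue; // header row (th cells only) or an empty trailing row
            }
            List<String> row = new ArrayList<>();
            for (Element td : cells) {
                row.add(td.text()); // text() also returns text nested in <span> or <a>
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<String> row : parseRows(SAMPLE_HTML)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

For the real page, which declares `charset=gb2312`, the bytes would presumably need to be decoded with that charset, for example through the `Jsoup.parse(InputStream, String charsetName, String baseUri)` overload, before the same selectors are applied.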