Python BeautifulSoup4: a problem retrieving script nodes
Posted by 我们分头打钱!
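While scraping station names from 12306, I found that BeautifulSoup could not find the node containing station_version.
The reason is that the script tags come after </html>. With the 'lxml' parser that part of the document is ignored, while with html5lib it is kept.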
...
<!-- 购物车 -->
<div style="display: none;" class="buy-cart"><div class="cart-hd"><span class="num">0</span>
</div>
<div class="cart-bd" style="display: none;"><div class="cart-bd-top"><h3><span id="hbTrainDate">候补购票需求列表</span>
<a id="hbClear" href="javascript:void(0)" shape="rect">[清空]</a>
</h3>
<a href="javascript:void(0)" class="close" shape="rect">×</a>
</div>
<div class="cart-bd-con"><ul class="cart-tlist"></ul>
</div>
<div class="cart-bd-ft"><p class="cart-ft-tips">1、候补订单需求中可包含2个相邻乘车日期,每个乘车日期可包含2个不同“车次+席别”的组合需求。</p>
<p class="cart-ft-tips">2、排位是指您的订单在待兑现订单中的位置。当前排位仅供参考,实际排位以支付成功后为准。</p>
<a id="hbSubmit" href="javascript:void(0)" class="btn72 fr" shape="rect">添加乘客</a>
</div>
</div>
</div>
</body>
</html>  # the soup produced by 'lxml' ends here
<script type="text/javascript" src="/otn/resources/js/framework/station_name.js?station_version=1.9115" xml:space="preserve"></script>
<script type="text/javascript" src="/otn/resources/js/framework/favorite_name.js" xml:space="preserve"></script>
<script type="text/javascript" src="/otn/resources/merged/queryLeftTicket_end_js.js?scriptVersion=1.9158" xml:space="preserve"></script>
...
>>> import re
>>> import requests
>>> from bs4 import BeautifulSoup as bs
>>> url = "https://kyfw.12306.cn/otn/leftTicket/init?linktypeid=dc&fs=%E4%B8%87%E5%B7%9E,WYW&ts=%E8%A5%BF%E5%AE%89,XAY&date=2019-11-05&flag=N,N,Y"
>>> response = requests.get(url, timeout=10)
>>> response.encoding = 'utf-8'
>>> # Parse the same response text with both parsers
>>> lxml = bs(response.text, 'lxml')
>>> html5lib = bs(response.text, 'html5lib')
>>> response.close()
>>> # The 'lxml' soup has no tag whose src contains station_version
>>> lxml.find_all(src=re.compile(".*station_version.*"))
[]
>>> html5lib.find_all(src=re.compile(".*station_version.*"))
[<script src="/otn/resources/js/framework/station_name.js?station_version=1.9115" type="text/javascript" xml:space="preserve"></script>]
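If installing html5lib is not an option, one possible workaround is to skip tree building and read the script tags straight from the markup. The sketch below is an alternative to the approach in the post, not a replacement for it. It uses only the standard library: `html.parser.HTMLParser` reports tags as a flat stream of events, so a script tag placed after </html> is seen like any other. The names `ScriptSrcCollector` and `find_station_version` and the shortened `sample` string are hypothetical, introduced here for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag, wherever it appears."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tags arrive as events, independent of where </html> was seen.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_station_version(html_text):
    """Return the station_version query value, or None if it is absent."""
    collector = ScriptSrcCollector()
    collector.feed(html_text)
    collector.close()
    for src in collector.sources:
        values = parse_qs(urlsplit(src).query).get("station_version")
        if values:
            return values[0]
    return None


# Hypothetical, shortened sample modelled on the page excerpt in the post.
sample = (
    '<html><body><div class="buy-cart"></div></body></html>\n'
    '<script type="text/javascript" '
    'src="/otn/resources/js/framework/station_name.js?station_version=1.9115"></script>\n'
    '<script type="text/javascript" '
    'src="/otn/resources/js/framework/favorite_name.js"></script>\n'
)

print(find_station_version(sample))
```

For the live page, `response.text` from the session above would be passed in place of `sample`. The helper returns None when no script src carries a station_version parameter.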