Easy scraping and HTML parsing with PHP5 and XPath


This example uses file_get_contents() to retrieve remote HTML (which requires allow_url_fopen to be enabled), then parses it with PHP 5's DOMDocument and DOMXPath classes. XPath queries are easy to build using the Firefox extension "XPather".
<?php
// The URL you want to retrieve
$my_url = 'http://www.digg.com';
$html = file_get_contents($my_url);
if ($html === false) {
    exit("Could not retrieve $my_url\n");
}

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Put your XPath query here
$my_xpath_query = "/html/body/div[@id='container']/div[@id='contents']/div[@class='list' and @id='wrapper']/div[@class='main' and position()=1]/div[contains(@class, 'news-summary')]/div[@class='news-body']/h3";
$result_rows = $xpath->query($my_xpath_query);

// Loop through the results (a DOMNodeList of the matching h3 elements)
foreach ($result_rows as $result_object) {
    // Print the value of the first child node of each h3
    echo $result_object->childNodes->item(0)->nodeValue, "\n";
}
?>
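
The same DOMDocument and DOMXPath steps can be tried without a network request. The sketch below is only an illustration: the sample markup, the shorter relative query, and the $stories array are assumptions made up for this example, not the structure of any real page. It shows a query anchored on class names rather than on the full path from /html, which may survive layout changes better.

```php
<?php
// Hypothetical sample markup standing in for a downloaded page
$html = <<<HTML
<html><body>
  <div id="container">
    <div class="news-summary featured">
      <div class="news-body"><h3><a href="/story/1">First story</a></h3></div>
    </div>
    <div class="news-summary">
      <div class="news-body"><h3><a href="/story/2">Second story</a></h3></div>
    </div>
    <div class="sidebar"><h3>Not a story</h3></div>
  </div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Relative query: any news-summary block, then the link inside its h3
$query = "//div[contains(@class, 'news-summary')]/div[@class='news-body']/h3/a";

// Gather the link text and href of every match
$stories = array();
foreach ($xpath->query($query) as $link) {
    $stories[] = array(
        'title' => trim($link->textContent),
        'href'  => $link->getAttribute('href'),
    );
}

print_r($stories);
```

Here textContent returns the full text of the matched element, and getAttribute reads the link target, so each result carries both the headline and its URL.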
