Ganon web scraping example

Posted by swocn

Project page: http://code.google.com/p/ganon/
Documentation: http://code.google.com/p/ganon/w/list

This library is quite powerful: it locates DOM nodes with tag selectors similar to those used in JavaScript.

The Ganon library gives access to HTML/XML documents in a very simple object-oriented way. It eases modifying the DOM and makes finding elements easy with CSS3-like queries.

Ganon usage examples:

// Load the library (assumes ganon.php is on the include path)
include('ganon.php');

// Parse the Google Code website into a DOM
$html = file_get_dom('http://code.google.com/');

Access
Accessing elements is made easy through the CSS3-like selectors and the object model.

// Find all the paragraph tags with a class attribute and print the
// value of the class attribute
foreach ($html('p[class]') as $element) {
  echo $element->class, "<br>\n";
}

// Find the first div with ID "gc-header" and print the plain text of
// the parent element (plain text means no HTML tags, just the text)
echo $html('div#gc-header', 0)->parent->getPlainText();

// Find out how many tags there are which are "ns:tag" or "div", but not
// "a" and do not have a class attribute
echo count($html('(ns|tag, div + !a)[!class]'));
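To make the selectors above concrete without downloading Ganon, here is a minimal sketch of the same two lookups using PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Ganon. This is a stand-in technique, not Ganon's API: the sample markup and the variable names are assumptions for illustration, and the XPath expressions are only rough counterparts of the CSS-like selectors.

```php
<?php
// Hypothetical sample markup, standing in for a fetched page.
$markup = '<div id="gc-header"><p class="intro">Hello</p>'
        . '<p>Plain</p><p class="note">World</p></div>';

// Parse the markup with PHP's built-in DOM extension (not Ganon).
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Rough XPath counterpart of the selector p[class]:
// every <p> element that carries a class attribute.
$classes = [];
foreach ($xpath->query('//p[@class]') as $p) {
    $classes[] = $p->getAttribute('class');
}

// Rough XPath counterpart of div#gc-header, index 0:
// take the first match, then read the text of its parent element.
$header = $xpath->query('//div[@id="gc-header"]')->item(0);
$parentText = $header->parentNode->textContent;

echo implode(', ', $classes), "\n";
echo $parentText, "\n";
```

The Ganon versions are shorter because the selector call and the parent/plain-text accessors are built into its node objects; the sketch only shows what those selectors are matching.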

Modification
Elements can be easily modified after you've found them.

// Find all paragraph tags which are nested inside a div tag, change
// their ID attribute and print the new HTML code
foreach ($html('div p') as $index => $element) {
  $element->id = "id$index";
}
echo $html;

// Center all the links inside a document which start with "http://"
// and print out the new HTML
foreach ($html('a[href ^= "http://"]') as $element) {
  $element->wrap('center');
}
echo $html;

// Find all odd indexed "td" elements and change the HTML to make them links
foreach ($html('table td:odd') as $element) {
  $element->setInnerText('<a href="#">'.$element->getPlainText().'</a>');
}
echo $html;
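For comparison, the first modification (assigning IDs to paragraphs nested inside a div) can be sketched with PHP's bundled DOM extension rather than Ganon. The markup and variable names here are hypothetical, and `//div//p` is an approximate XPath counterpart of the `div p` descendant selector.

```php
<?php
// Hypothetical sample markup: two paragraphs inside a div, one outside.
$markup = '<div><p>One</p><p>Two</p></div><p>Outside</p>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Approximate counterpart of "div p": paragraphs that descend from a div.
$index = 0;
foreach ($xpath->query('//div//p') as $p) {
    $p->setAttribute('id', "id$index");
    $index++;
}

// Collect the IDs that were assigned, to check the result.
$ids = [];
foreach ($xpath->query('//p[@id]') as $p) {
    $ids[] = $p->getAttribute('id');
}

// Serialize the modified document back to HTML.
echo $doc->saveHTML();
```

Only the two nested paragraphs receive an ID; the paragraph outside the div is left untouched, which is the behavior the descendant selector implies.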

 

Beautify
Ganon can also help you beautify your code and format it properly.

// Beautify the old HTML code and print out the new, formatted code
dom_format($html, array('attributes_case' => CASE_LOWER));
echo $html;

 
