Trying out the Golang web crawler framework gocolly/colly
Posted by pu369
Reference: http://www.cnblogs.com/majianguo/p/8186429.html
The framework source is at github.com/gocolly/colly
The code is as follows (a demo from the source):
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	// Instantiate default collector
	c := colly.NewCollector(
		// Visit only domains: hackerspaces.org, wiki.hackerspaces.org
		colly.AllowedDomains("hackerspaces.org", "wiki.hackerspaces.org"),
	)

	// On every a element which has href attribute call callback
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		// Print link
		fmt.Printf("Link found: %q -> %s\n", e.Text, link)
		// Visit link found on page
		// Only those links are visited which are in AllowedDomains
		c.Visit(e.Request.AbsoluteURL(link))
	})

	// Before making a request print "Visiting ..."
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL.String())
	})

	// Start scraping on https://hackerspaces.org
	c.Visit("https://hackerspaces.org/")
}
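When I built this with Ctrl-B, I got a pile of errors along the lines of cannot find package "github.com/PuerkitoBio/goquery" in any of: and so on. Following the messages, I used gopm to download each missing dependency one by one. At moments like this I really wish I could just use go get.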