R Web Scraping Example
Posted by jessepeng
Scraping a Douban photo album
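# RCurl sends the HTTP requests; XML parses the returned HTML
library(RCurl)
library(XML)

# Browser-like headers sent with every request
myHttpheader <- c("User-Agent" = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.6) ",
                  "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                  "Accept-Language" = "en-us",
                  "Connection" = "keep-alive",
                  "Accept-Charset" = "GB2312,utf-8;q=0.7,*;q=0.7")

# Values of the "start" query parameter, one per album page
ye <- c(1, seq(18, 630, 18))
info <- NULL

# Collect the link to each photo page from every album page
for (i in ye) {
  url <- paste("https://www.douban.com/photos/album/50903114/?start=", i, sep = "")
  web <- getURL(url, httpheader = myHttpheader)
  # The empty error handler silences parser warnings about malformed HTML
  doc <- htmlTreeParse(web, encoding = "UTF-8", error = function(...) {}, useInternalNodes = TRUE, trim = TRUE)
  node <- getNodeSet(doc, "//div[@class='photo_wrap']/a")
  info <- c(info, sapply(node, xmlGetAttr, "href"))
}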
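The first loop depends on one XPath query to pull the href of every photo link out of a page. The query can be tried offline with a minimal sketch like the one below. The HTML fragment and the example.com URLs are made up for illustration and are not real Douban markup; only the class name photo_wrap is taken from the script.

```r
library(XML)

# Hypothetical HTML fragment standing in for one album page
html <- '<html><body>
<div class="photo_wrap"><a href="https://example.com/photo/1/"><img src="a.jpg"/></a></div>
<div class="photo_wrap"><a href="https://example.com/photo/2/"><img src="b.jpg"/></a></div>
<div class="other"><a href="https://example.com/ignored/">x</a></div>
</body></html>'

# asText = TRUE tells the parser the input is HTML text, not a file name
doc <- htmlParse(html, asText = TRUE)

# Select the <a> elements that are direct children of div.photo_wrap
nodes <- getNodeSet(doc, "//div[@class='photo_wrap']/a")

# Read the href attribute of each selected node
links <- sapply(nodes, xmlGetAttr, "href")
print(links)
```

Only the two links inside div.photo_wrap are returned; the link inside div.other does not match the query.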
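# Counter used to name the downloaded files
x <- 1
dir.create("./image1/")

# For each photo page: follow the link under div.photo-edit, then download
# the image found in td#pic-viewer on that page
for (urlweb in info) {
  web1 <- getURL(urlweb, httpheader = myHttpheader)
  doc1 <- htmlTreeParse(web1, encoding = "UTF-8", error = function(...) {}, useInternalNodes = TRUE, trim = TRUE)
  node1 <- getNodeSet(doc1, "//div[@class='photo-edit']/a")
  info1 <- sapply(node1, xmlGetAttr, "href")
  web2 <- getURL(info1, httpheader = myHttpheader)
  doc2 <- htmlTreeParse(web2, encoding = "UTF-8", error = function(...) {}, useInternalNodes = TRUE, trim = TRUE)
  node2 <- getNodeSet(doc2, "//td[@id='pic-viewer']/a/img")
  info2 <- sapply(node2, xmlGetAttr, "src")
  # paste0() avoids the spaces that paste() would insert into the file name
  y <- paste0("./image1/", x, ".jpg")
  tryCatch({
    download.file(info2, y, mode = "wb")
    x <- x + 1
  }, error = function(e) {
    # Report the failure and continue with the next photo
    cat("ERROR:", conditionMessage(e), "\n")
    print("loser")
  })
}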
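The download loop builds one file name per image and wraps the download in tryCatch so that a single failure does not stop the run. Two details are easy to get wrong here: paste() puts a space between its arguments by default, and the counter should advance only after a successful download. The sketch below shows both in plain base R. fake_download is a hypothetical stand-in for download.file() that touches neither the network nor the disk, and the source names are invented.

```r
# paste() separates its arguments with a space by default
spaced <- paste("./image1/", 1, ".jpg")   # "./image1/ 1 .jpg"
# paste0() concatenates without a separator
clean <- paste0("./image1/", 1, ".jpg")   # "./image1/1.jpg"

# Hypothetical stand-in for download.file(): fails on the input "bad"
fake_download <- function(src, dest) {
  if (src == "bad") stop("cannot open URL '", src, "'")
  invisible(0)
}

sources <- c("a", "bad", "b")
x <- 1                   # counter for successfully saved files
saved <- character(0)    # destination paths of successful downloads

for (src in sources) {
  dest <- paste0("./image1/", x, ".jpg")
  tryCatch({
    fake_download(src, dest)
    saved <- c(saved, dest)
    x <- x + 1           # only reached when the download did not fail
  }, error = function(e) {
    # Report the failure and continue with the next source
    cat("ERROR:", conditionMessage(e), "\n")
  })
}
print(saved)
```

Because the counter is incremented inside the tryCatch expression, the failed item leaves no gap in the numbering: the two successful items end up as 1.jpg and 2.jpg.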