Scraping the Qiushibaike site with AutoIt (au3)
Introduction: this article was compiled by the editors of 小常识网 (cha138.com). It shows how to scrape the Qiushibaike site with AutoIt (au3).
URL (built per page, where $pagenum is the page number): 'http://www.qiushibaike.com/8hr/page/' & $pagenum & '?s=4512150'
#include <IE.au3>
#include <File.au3>
#include <String.au3>
#include <Array.au3>
#include <Debug.au3>
#include <Date.au3>

; Collects Qiushibaike stories from qiushibaike.com into a dated text file.
Local $strUrl1 = "http://www.qiushibaike.com/8hr/page/2?s=4512150"

; Output file name: qiushibaike_<month><day>.txt
Local $filename1 = "qiushibaike"
$filename1 = $filename1 & '_' & @MON
$filename1 = $filename1 & @MDAY
$filename1 = $filename1 & '.txt'

; Temporary file that holds each downloaded page
Local $filesave = @TempDir & "\qb.html"
Local $pageindex
Local $startindex = 2
Local $endindex = 10
Local $sHTML
Local $storycount = 0

; Create the output file, then open it in append mode (1)
_FileCreate($filename1)
Local $file = FileOpen($filename1, 1)
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

For $pageindex = $startindex To $endindex Step 1
    $strUrl1 = MakeUpUrl($pageindex)

    ; Download in the background (forcing a reload) and poll until complete
    Local $hDownload = InetGet($strUrl1, $filesave, 1, 1)
    Do
        Sleep(250)
    Until InetGetInfo($hDownload, 2) ; index 2: download complete
    Local $nBytes = InetGetInfo($hDownload, 0) ; index 0: bytes read
    InetClose($hDownload)
    ConsoleWrite($pageindex & '/' & $endindex & " --- down bytes = " & $nBytes & @LF)
    Local $fsize = $nBytes
    ;ConsoleWrite($pageindex & '- filesize = ' & $fsize & @LF)

    ; Read the downloaded page into memory, then remove the temporary file
    Local $ftemp = FileOpen($filesave, 0)
    Local $getsize = FileGetSize($filesave)
    $sHTML = FileRead($ftemp, $getsize)
    FileClose($ftemp)
    FileDelete($filesave)

    ; Flag 3 returns every match: text on its own lines between <span> and </span>
    Local $aArray = StringRegExp($sHTML, '(?<=<span>)\n+[^/]+\n+(?=</span>)', 3)
    ConsoleWrite(" array size = " & UBound($aArray) & @CRLF)

    For $i = 0 To (UBound($aArray) - 1) Step 1
        Local $item = $aArray[$i]
        If StringLen($item) > 0 Then
            ; Write the story number on its own line
            Local $strnum = $storycount + 1
            $strnum = $strnum & "." & @CRLF
            FileWrite($file, $strnum)
            ; Strip the line feeds captured by the pattern, then write the story
            Local $storycontent = StringReplace($item, @LF, '')
            $storycontent = $storycontent & @CRLF
            FileWrite($file, $storycontent)
            $storycount = $storycount + 1
        EndIf
    Next
Next

FileClose($file)
MsgBox(0, "QSBK", "Complete, story count = " & $storycount & ', story=' & $filename1)
Exit

; Builds the listing URL for the given page number
Func MakeUpUrl($pagenum)
    Local $strUrl = 'http://www.qiushibaike.com/8hr/page/' & $pagenum & '?s=4512150'
    Return $strUrl
EndFunc
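The extraction step in the script depends on a single regular expression, so it can help to try that pattern on its own before running the full download loop. The sketch below applies the same pattern to a small hard-coded string. The sample markup and the names $sSample and $aStories are assumptions made for illustration; the real page structure may differ. StringStripWS is used here only to tidy the output and is not part of the original script.

```autoit
; Hypothetical sample markup (an assumption, not taken from the real site):
; each story sits on its own line between <span> and </span>.
Local $sSample = "<div class=""content"">" & @LF & _
        "<span>" & @LF & "First story text" & @LF & "</span>" & @LF & _
        "</div>" & @LF & _
        "<span>" & @LF & "Second story text" & @LF & "</span>"

; Same pattern as in the script; flag 3 returns an array of all matches.
Local $aStories = StringRegExp($sSample, '(?<=<span>)\n+[^/]+\n+(?=</span>)', 3)
If @error Then
    ; StringRegExp sets @error when nothing matches
    ConsoleWrite("No stories matched." & @CRLF)
Else
    For $i = 0 To UBound($aStories) - 1
        ; Flag 3 of StringStripWS strips leading and trailing whitespace,
        ; including the line feeds captured by the pattern
        ConsoleWrite(($i + 1) & ". " & StringStripWS($aStories[$i], 3) & @CRLF)
    Next
EndIf
```

Because the character class [^/]+ stops at the first slash, a story that itself contains "/" (a URL or a date, for example) would not be matched by this pattern. That is a limitation of the original approach and worth keeping in mind when the story count looks low.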