不能写出三个链接的所有结果,而只能写最后一个链接的结果
Posted
技术标签:
【中文标题】不能写出三个链接的所有结果,而只能写最后一个链接的结果【英文标题】:Can't write all the results from three links instead of the results from the last link only 【发布时间】:2022-01-15 18:06:45 【问题描述】:我编写了一个宏来从website 的三页中抓取一些字段。我使用Array()
来存储和写入结果,以使执行更快一点。
只要关注单个页面的内容,脚本就可以正常运行。但是,当我使用列表中的三个链接时出现问题。 To be specific, the script overwrites previous results
。例如,我应该在执行后得到 150 个结果。相反,我从最后一个链接得到 50 个结果。
到目前为止我已经写了:
Public Sub FetchData()
Dim Xhr As Object, html As HTMLDocument, Ws As Worksheet
Dim Link As Variant, Links As Variant, LeadInfo() As String
Dim I&, HtmlDoc As HTMLDocument, Listings As Object, Headers()
Dim URLS(), N As Variant
Links = Array( _
"https://***.com/questions/tagged/web-scraping?tab=newest&page=1&pagesize=50", _
"https://***.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50", _
"https://***.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50" _
)
Set Ws = ThisWorkbook.Worksheets("Sheet1")
Set Xhr = CreateObject("MSXML2.XMLHTTP")
Set Html = New HTMLDocument
Set HtmlDoc = New HTMLDocument
Headers = Array("Title", "URL", "User", "Asked")
For Each Link In Links
With Xhr
.Open "GET", Link, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
Html.body.innerHTML = .responseText
End With
Set Listings = Html.querySelectorAll(".summary")
ReDim LeadInfo(1 To Listings.Length, 1 To 4)
On Error Resume Next
For I = 0 To Listings.Length - 1
HtmlDoc.body.innerHTML = Listings.item(I).innerHTML
LeadInfo(I + 1, 1) = HtmlDoc.querySelector(".question-hyperlink").innerText
LeadInfo(I + 1, 2) = HtmlDoc.querySelector(".question-hyperlink").getAttribute("href")
LeadInfo(I + 1, 3) = HtmlDoc.querySelector(".user-details > a").innerText
LeadInfo(I + 1, 4) = HtmlDoc.querySelector(".user-action-time > span.relativetime").innerText
Next I
On Error GoTo 0
If IsEmpty(Ws.Cells(1, 1).Value) Then Ws.Cells(1, 1).Resize(1, UBound(Headers) + 1) = Headers
Ws.Cells(2, 1).Resize(UBound(LeadInfo, 1), UBound(LeadInfo, 2)) = LeadInfo
Next Link
End Sub
我怎样才能写出三个链接的所有结果,而不是只写最后一个链接的结果?
【问题讨论】:
使用Ws.Cells(2, 1)
中的变量,即Ws.Cells(k+2, 1)
。在 UBound(LeadInfo)
之后增加 k
。
请查看编辑@CDP1802。
我的评论看不到任何变化
你说的很对。你的建议确实解决了问题。感谢一万亿。
【参考方案1】:
您已经有了页面大小(即每页的最大结果)、页数和标题大小。只需对一个数组进行尺寸标注以存储结果并将其写出一次。比重复 ReDim 更有效,ReDim 复制一个数组,然后写出会产生 I/O。
使用变量来跟踪要填充到数组中的行。
将你的写作移到循环之外。
声明Listings As MSHTML.IHTMLDOMChildrenCollection
以便提供更新的 Excel 版本(至少向后兼容 2010)。
Option Explicit
Public Sub FetchData()
Dim Xhr As Object, Html As MSHTML.HTMLDocument, Ws As Worksheet
Dim Link As Variant, Links() As Variant, LeadInfo() As String
Dim I As Long, HtmlDoc As MSHTML.HTMLDocument, Listings As MSHTML.IHTMLDOMChildrenCollection
Dim Headers() As Variant
Links = Array( _
"https://***.com/questions/tagged/web-scraping?tab=newest&page=1&pagesize=50", _
"https://***.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50", _
"https://***.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50" _
)
Set Ws = ThisWorkbook.Worksheets("Sheet1")
Set Xhr = CreateObject("MSXML2.XMLHTTP")
Set Html = New HTMLDocument
Set HtmlDoc = New HTMLDocument
Headers = Array("Title", "URL", "User", "Asked")
ReDim LeadInfo(1 To (UBound(Links) + 1) * 50, 1 To UBound(Headers) + 1) 'size according to headers and page size
Dim rowNumber As Long
For Each Link In Links
With Xhr
.Open "GET", Link, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
.send
Html.body.innerHTML = .responseText
End With
Set Listings = Html.querySelectorAll(".summary")
On Error Resume Next
For I = 0 To Listings.Length - 1
rowNumber = rowNumber + 1
HtmlDoc.body.innerHTML = Listings.Item(I).innerHTML
LeadInfo(rowNumber, 1) = HtmlDoc.querySelector(".question-hyperlink").innerText
LeadInfo(rowNumber, 2) = Replace$(HtmlDoc.querySelector(".question-hyperlink").href, "about:", "https://***.com")
LeadInfo(rowNumber, 3) = HtmlDoc.querySelector(".user-details > a").innerText
LeadInfo(rowNumber, 4) = HtmlDoc.querySelector(".user-action-time > span.relativetime").innerText
Next I
On Error GoTo 0
Next Link
If IsEmpty(Ws.Cells(1, 1).Value) Then Ws.Cells(1, 1).Resize(1, UBound(Headers) + 1) = Headers
Ws.Cells(2, 1).Resize(UBound(LeadInfo, 1), UBound(LeadInfo, 2)) = LeadInfo
End Sub
【讨论】:
以上是关于不能写出三个链接的所有结果,而只能写最后一个链接的结果的主要内容,如果未能解决你的问题,请参考以下文章
请教大虾JSP链接数据库的代码怎样写,最好可以把代码写出来给我看看,我驱动已经连好了
不能只删除链接列表中的最后一个元素! (所有其他具有特定值的元素都会被删除)