VBA如何批量抓取数据
Posted
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了VBA如何批量抓取数据相关的知识,希望对你有一定的参考价值。
如题,某个名为“3月”的文件夹内有约400个excle文件,如何提取sheet名为“进料检验报表”中C3/E3/G3/C4/E4/C5这6个表格中的内容?并汇总到一个表格里面?
参考技术A Sub 按钮1_Click()Dim url, html
n = 1
url = "http://www.zjcredit.gov.cn:8000/ListQuery.aspx"
For j = 1 To 5 '这里控制查询的页数
pd = "sectionID=02" _
& "&sortField=CreditID" _
& "&sortDirection=1" _
& "&recordTotal=3151" _
& "&pageNo=" & j _
& "&pageLength=20" _
& "&isOpen=False&isIntermediary=False" _
& "&query_AreaCode=" _
& "&query_OrganizationCode=" _
& "&query_BusinessLicense=" _
& "&query_CorporationName=" _
& "&query_LegalRepresentative=" _
& "&query_BusinessScope=" _
& "&query_PromptSymbol=D"
pd1 = "query_AreaCode=&query_OrganizationCode=&query_BusinessLicense=&query_CorporationName=&query_LegalRepresentative=&query_BusinessScope=&query_PromptSymbol=d&queryTitle=&businessLicense=&actionType=&searchType=§ionID=02&hot=&returnFunction=parent.reset_queryTitles&query2_AreaCode=0&query2_BusinessLicense=&query2_CorporationName=&query2_OrganizationCode=&query2_LegalRepresentative=&validateTextbox="
Set html = CreateObject("htmlfile")
With CreateObject("msxml2.xmlhttp.6.0")
.Open "post", url, False
.setrequestheader "Content-Type", "application/x-www-form-urlencoded"
.send (pd)
html.body.innerhtml = .responsetext
Set tr = html.all.tags("tr")
For i = 0 To tr.Length - 1
If tr(i).bgcolor = "#ffffff" Or tr(i).bgcolor = "#f3f3f3" Then
n = n + 1
Cells(n, 1) = tr(i).ChildNodes(0).innertext
Cells(n, 2) = tr(i).ChildNodes(1).innertext
End If
Next
End With
Next
End Sub
Sub 按钮2_Click()
Range("a2:b65536").ClearContents
End Sub 参考技术B sub extractdata()
dim c3value,g3value,c4value,e4value,c5value
excelfilename = Dir("E:\3月\" & "*.xls")
Do While excelfilename <> ""
s = "E:\3月\" & excelfilename
Workbooks.Open Filename:=s
Workbooks(excelfilename).Activate
activeworkbook.sheets("进料检验报表").activate
c3value=activesheet(3,"C")
g3value=activesheet(3,"G")
c4value=activesheet(4,"C")
e4value=activesheet(4,"E")
c5value=activesheet(5,"C")
workbooks(合并到的excel).activate
activesheet.cells(1,1)="这是表头"
activesheet.cells(activesheet.usedrange.rows.count+1,1)=c3value
activesheet.cells(activesheet.usedrange.rows.count+1,2)=g3value
activesheet.cells(activesheet.usedrange.rows.count+1,3)=c4value
activesheet.cells(activesheet.usedrange.rows.count+1,4)=e4value
activesheet.cells(activesheet.usedrange.rows.count+1,5)=c5value
(excelfilename).Close savechanges:=false
excelfilename = Dir '第二次读入的时候不用写参数
Loop
end sub追问
说语法错误,什么情况啊。
使用vba中的动态数组函数从网站抓取数据
【中文标题】使用vba中的动态数组函数从网站抓取数据【英文标题】:Scraping data from website with dynamic array function in vba 【发布时间】:2022-01-14 19:39:56 【问题描述】:我想了解更多有关从网站抓取数据时如何应用数组函数的信息。我目前正在使用这个 vba 从网站复制数据。该代码可以抓取我想要的数据,但是在将数据复制到目标工作表时,它将所有数据复制到A1
单元格。由于这个 vba 是为我之前的项目开发的并且工作正常,我不确定哪个部分出了问题。
Sub CopyFromHKAB()
Dim ie As Object, btnmore As Object, tbl As Object
Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
ThisWorkbook.Sheets("data").UsedRange.Clear
Set ie = CreateObject("internetexplorer.application")
With ie
.Visible = True
.navigate "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0"
Do
DoEvents
Loop While .readyState <> 4 Or .Busy
Set tbl = .document.getElementsByClassName("etxtmed")(2)
End With
'get data from table
r = tbl.Rows.Length - 1
c = tbl.Rows(0).Cells.Length - 1
ReDim arr(0 To r, 0 To c)
Set rr = tbl.Rows
For i = 0 To r
Set cc = rr(i).Cells
For j = 0 To c
arr(i, j) = cc(j).innertext
Next
Next
ie.Quit
Application.ScreenUpdating = False
ThisWorkbook.Sheets("Sheet1").Cells(1, 1).Resize(r + 1, c + 1) = arr
With ThisWorkbook.Sheets("data")
.UsedRange.WrapText = False
.Columns.AutoFit
End With
End Sub
【问题讨论】:
【参考方案1】:鉴于它们是嵌套的,因此您需要选择正确的表,因此将索引更改为 3。否则,您将选择共享父项,因此所有列表实际上都在一个子元素中,因此是您当前的输出。
那么你需要调整你的代码来跳过第一行。
注意您实际上并不需要 IE,因为您想要的内容是静态的。您可以使用 XMLHTTP。而且您正在将数据写到与您结束格式不同的工作表中。
Sub CopyFromHKAB()
Dim ie As Object, btnmore As Object, tbl As Object
Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
ThisWorkbook.Sheets("data").UsedRange.Clear
Set ie = CreateObject("internetexplorer.application")
With ie
.Visible = True
.navigate "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0"
Do
DoEvents
Loop While .readyState <> 4 Or .Busy
Set tbl = .document.getElementsByClassName("etxtmed")(3)
End With
'get data from table
r = tbl.Rows.Length - 1
c = tbl.Rows(1).Cells.Length - 1
ReDim arr(0 To r, 0 To c)
Set rr = tbl.Rows
For i = 1 To r
Set cc = rr(i).Cells
For j = 0 To c
arr(i - 1, j) = cc(j).innertext
Next
Next
ie.Quit
'Application.ScreenUpdating = False
ThisWorkbook.Sheets("Sheet1").Cells(1, 1).Resize(r + 1, c + 1) = arr
With ThisWorkbook.Worksheets("data")
.UsedRange.WrapText = False
.Columns.AutoFit
End With
End Sub
我会考虑切换到 XHR 以避免浏览器开销,并使用 querySelectorAll
允许使用 css 选择器列表来仅定位感兴趣的节点
Option Explicit
Public Sub GetHKABInfo()
'tools > references > Microsoft HTML Object Library
Dim html As MSHTML.HTMLDocument, xhr As Object
Set xhr = CreateObject("MSXML2.XMLHTTP")
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0", False
.setRequestHeader "User-Agent", "Safari/537.36"
.send
html.body.innerHTML = .responseText
End With
Dim arr() As Variant, nodes As MSHTML.IHTMLDOMChildrenCollection, i As Long
Set nodes = html.querySelectorAll(".etxtmed .etxtmed td")
ReDim arr(1 To nodes.Length - 1)
For i = LBound(arr) To UBound(arr)
arr(i) = nodes.Item(i).innertext
Next
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(UBound(arr), 1) = Application.Transpose(arr)
End Sub
【讨论】:
感谢您的回答!我把arr(i - 1, j)
改回arr(i , j)
,否则代码会报Run-time error '9': Subscript out of range error
。然后代码运行成功。原来我刮错了桌子。一开始我以为是我用的数组函数错了
奇数。不会给我一个下标超出范围的错误,但也许您使用的是不同的 Excel 版本,并且不知何故我们有不同的计数。
顺便说一句,您提到不需要为此使用 IE,您的意思是 XMLHTTP
方法吗?
是的。根据底部代码版本 :-) 我已对其进行了编辑以使其更清晰。道歉。
刚刚测试了XMLHTTP
方法。它比 IE 方法快得令人难以置信!我对网络抓取很陌生,上一个项目涉及一些按钮点击动作的模拟,所以我选择了 IE 方法。这个练习让我熟悉了 IE 方法,对于XMLHTTP
,我是全新的。另外你能解释一下最后一行代码ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(UBound(arr), 1) = Application.Transpose(arr)
。我删除了Application.Transpose
,代码只复制了第一行数据。非常感谢!以上是关于VBA如何批量抓取数据的主要内容,如果未能解决你的问题,请参考以下文章