VBA如何批量抓取数据

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了VBA如何批量抓取数据相关的知识,希望对你有一定的参考价值。

如题,某个名为“3月”的文件夹内有约400个excle文件,如何提取sheet名为“进料检验报表”中C3/E3/G3/C4/E4/C5这6个表格中的内容?并汇总到一个表格里面?

参考技术A Sub 按钮1_Click()
Dim url, html
n = 1
url = "http://www.zjcredit.gov.cn:8000/ListQuery.aspx"
For j = 1 To 5 '这里控制查询的页数
pd = "sectionID=02" _
& "&sortField=CreditID" _
& "&sortDirection=1" _
& "&recordTotal=3151" _
& "&pageNo=" & j _
& "&pageLength=20" _
& "&isOpen=False&isIntermediary=False" _
& "&query_AreaCode=" _
& "&query_OrganizationCode=" _
& "&query_BusinessLicense=" _
& "&query_CorporationName=" _
& "&query_LegalRepresentative=" _
& "&query_BusinessScope=" _
& "&query_PromptSymbol=D"
pd1 = "query_AreaCode=&query_OrganizationCode=&query_BusinessLicense=&query_CorporationName=&query_LegalRepresentative=&query_BusinessScope=&query_PromptSymbol=d&queryTitle=&businessLicense=&actionType=&searchType=§ionID=02&hot=&returnFunction=parent.reset_queryTitles&query2_AreaCode=0&query2_BusinessLicense=&query2_CorporationName=&query2_OrganizationCode=&query2_LegalRepresentative=&validateTextbox="
Set html = CreateObject("htmlfile")
With CreateObject("msxml2.xmlhttp.6.0")
.Open "post", url, False
.setrequestheader "Content-Type", "application/x-www-form-urlencoded"
.send (pd)
html.body.innerhtml = .responsetext
Set tr = html.all.tags("tr")
For i = 0 To tr.Length - 1
If tr(i).bgcolor = "#ffffff" Or tr(i).bgcolor = "#f3f3f3" Then
n = n + 1
Cells(n, 1) = tr(i).ChildNodes(0).innertext
Cells(n, 2) = tr(i).ChildNodes(1).innertext
End If
Next
End With
Next
End Sub
Sub 按钮2_Click()
Range("a2:b65536").ClearContents
End Sub
参考技术B sub extractdata()
dim c3value,g3value,c4value,e4value,c5value
excelfilename = Dir("E:\3月\" & "*.xls")
Do While excelfilename <> ""
s = "E:\3月\" & excelfilename
Workbooks.Open Filename:=s
Workbooks(excelfilename).Activate
activeworkbook.sheets("进料检验报表").activate
c3value=activesheet(3,"C")
g3value=activesheet(3,"G")
c4value=activesheet(4,"C")
e4value=activesheet(4,"E")
c5value=activesheet(5,"C")
workbooks(合并到的excel).activate
activesheet.cells(1,1)="这是表头"
activesheet.cells(activesheet.usedrange.rows.count+1,1)=c3value
activesheet.cells(activesheet.usedrange.rows.count+1,2)=g3value
activesheet.cells(activesheet.usedrange.rows.count+1,3)=c4value
activesheet.cells(activesheet.usedrange.rows.count+1,4)=e4value
activesheet.cells(activesheet.usedrange.rows.count+1,5)=c5value
(excelfilename).Close savechanges:=false
excelfilename = Dir '第二次读入的时候不用写参数
Loop
end sub追问

说语法错误,什么情况啊。

本回答被提问者和网友采纳

使用vba中的动态数组函数从网站抓取数据

【中文标题】使用vba中的动态数组函数从网站抓取数据【英文标题】:Scraping data from website with dynamic array function in vba 【发布时间】:2022-01-14 19:39:56 【问题描述】:

我想了解更多有关从网站抓取数据时如何应用数组函数的信息。我目前正在使用这个 vba 从网站复制数据。该代码可以抓取我想要的数据,但是在将数据复制到目标工作表时,它将所有数据复制到A1 单元格。由于这个 vba 是为我之前的项目开发的并且工作正常,我不确定哪个部分出了问题。

Sub CopyFromHKAB()
    Dim ie As Object, btnmore As Object, tbl As Object
    Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
    
    ThisWorkbook.Sheets("data").UsedRange.Clear
    
    Set ie = CreateObject("internetexplorer.application")
    With ie
        .Visible = True
        .navigate "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0"
        
        Do
            DoEvents
        Loop While .readyState <> 4 Or .Busy
          
        
        Set tbl = .document.getElementsByClassName("etxtmed")(2)
            
    End With
    
    'get data from table
    r = tbl.Rows.Length - 1
    c = tbl.Rows(0).Cells.Length - 1
    
    ReDim arr(0 To r, 0 To c)
    
    Set rr = tbl.Rows
    For i = 0 To r
        Set cc = rr(i).Cells
        For j = 0 To c
            arr(i, j) = cc(j).innertext
        Next
    
    Next
    
    ie.Quit
  
    Application.ScreenUpdating = False
    ThisWorkbook.Sheets("Sheet1").Cells(1, 1).Resize(r + 1, c + 1) = arr
    
    With ThisWorkbook.Sheets("data")
        .UsedRange.WrapText = False
        .Columns.AutoFit
    End With
    
End Sub

【问题讨论】:

【参考方案1】:

鉴于它们是嵌套的,因此您需要选择正确的表,因此将索引更改为 3。否则,您将选择共享父项,因此所有列表实际上都在一个子元素中,因此是您当前的输出。

那么你需要调整你的代码来跳过第一行。

注意您实际上并不需要 IE,因为您想要的内容是静态的。您可以使用 XMLHTTP。而且您正在将数据写到与您结束格式不同的工作表中。

Sub CopyFromHKAB()
    Dim ie As Object, btnmore As Object, tbl As Object
    Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
    
    ThisWorkbook.Sheets("data").UsedRange.Clear
    
    Set ie = CreateObject("internetexplorer.application")
    With ie
        .Visible = True
        .navigate "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0"
        
        Do
            DoEvents
        Loop While .readyState <> 4 Or .Busy
          
        
        Set tbl = .document.getElementsByClassName("etxtmed")(3)
                
    End With
    
    'get data from table
    r = tbl.Rows.Length - 1
    c = tbl.Rows(1).Cells.Length - 1
    
    ReDim arr(0 To r, 0 To c)

    Set rr = tbl.Rows

    For i = 1 To r
    
        Set cc = rr(i).Cells
        For j = 0 To c
            arr(i - 1, j) = cc(j).innertext
        Next
    
    Next
    
    ie.Quit
  
    'Application.ScreenUpdating = False
    ThisWorkbook.Sheets("Sheet1").Cells(1, 1).Resize(r + 1, c + 1) = arr

    With ThisWorkbook.Worksheets("data")
        .UsedRange.WrapText = False
        .Columns.AutoFit
    End With
    
End Sub

我会考虑切换到 XHR 以避免浏览器开销,并使用 querySelectorAll 允许使用 css 选择器列表来仅定位感兴趣的节点

Option Explicit

Public Sub GetHKABInfo()
    'tools > references > Microsoft HTML Object Library
    Dim html As MSHTML.HTMLDocument, xhr As Object
     
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    Set html = New MSHTML.HTMLDocument
     
    With xhr
        .Open "GET", "https://www.hkab.org.hk/DisplayMemberAction.do?sectionid=4&subsectionid=0", False
        .setRequestHeader "User-Agent", "Safari/537.36"
        .send
        html.body.innerHTML = .responseText
    End With
    
    Dim arr() As Variant, nodes As MSHTML.IHTMLDOMChildrenCollection, i As Long
    
    Set nodes = html.querySelectorAll(".etxtmed .etxtmed td")
    
    ReDim arr(1 To nodes.Length - 1)
    
    For i = LBound(arr) To UBound(arr)
        arr(i) = nodes.Item(i).innertext
    Next
    
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(UBound(arr), 1) = Application.Transpose(arr)
    
End Sub

【讨论】:

感谢您的回答!我把arr(i - 1, j)改回arr(i , j),否则代码会报Run-time error '9': Subscript out of range error。然后代码运行成功。原来我刮错了桌子。一开始我以为是我用的数组函数错了 奇数。不会给我一个下标超出范围的错误,但也许您使用的是不同的 Excel 版本,并且不知何故我们有不同的计数。 顺便说一句,您提到不需要为此使用 IE,您的意思是 XMLHTTP 方法吗? 是的。根据底部代码版本 :-) 我已对其进行了编辑以使其更清晰。道歉。 刚刚测试了XMLHTTP 方法。它比 IE 方法快得令人难以置信!我对网络抓取很陌生,上一个项目涉及一些按钮点击动作的模拟,所以我选择了 IE 方法。这个练习让我熟悉了 IE 方法,对于XMLHTTP,我是全新的。另外你能解释一下最后一行代码ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(UBound(arr), 1) = Application.Transpose(arr)。我删除了Application.Transpose,代码只复制了第一行数据。非常感谢!

以上是关于VBA如何批量抓取数据的主要内容,如果未能解决你的问题,请参考以下文章

如何在vba中抓取web数据

使用 VBA 从 Web 抓取数据时无法获取准确的元素类表

使用 css 选择器 excel vba 从网站抓取数据

使用 VBA 抓取 Youtube 数据:拉取视图数据的问题,462 错误

用于在线抓取数据的 VBA 程序使我的笔记本电脑性能变慢

Python批量抓取商品数据