Unable to get the exact element class table when scraping data from web using VBA

Posted: 2022-01-20 08:55:53

Question:

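I want to scrape a table from a website. [image: the target table on the web page]

From the page source, the table appears to belong to the element class etxtmed, so I wrote the VBA below. After running it, I found that it only scrapes part of the data. [image: the data that was actually scraped]

I assumed this was because ("etxtmed")(0) refers to the first etxtmed table, so I tried several other numbers in place of (0). VBA first reported "Element not exist" and then, on the line r = tbl.Rows.Length - 1, raised Run-time error '91': Object variable or With block variable not set. Is this because I am scraping the wrong table class?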

Sub CopyRateFromHKAB()

    Dim ie As Object, btnmore As Object, tbl As Object
    Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
    
    ThisWorkbook.Sheets("Sheet2").UsedRange.Clear
    
    Set ie = CreateObject("internetexplorer.application")
    With ie
        '.Visible = True
        .navigate "https://www.hkab.org.hk/DisplayInterestSettlementRatesAction.do?lang=en"
        
        Do
            DoEvents
        Loop While .readyState <> 4 Or .Busy
          
    
        Set tbl = .document.getElementsByClassName("etxtmed")(0)
        
        If tbl Is Nothing Then
            MsgBox "Element not exist"
        End If
            
    End With
    
    'get data from table
    r = tbl.Rows.Length - 1
    c = tbl.Rows(0).Cells.Length - 1
    
    ReDim arr(0 To r, 0 To c)
    
    Set rr = tbl.Rows
    For i = 0 To r
        Set cc = rr(i).Cells
        For j = 0 To c
            arr(i, j) = cc(j).innertext
        Next
    
    Next
    
    ie.Quit
  
    Application.ScreenUpdating = False
    ThisWorkbook.Sheets("Sheet2").Cells(1, 1).Resize(r + 1, c + 1) = arr
    
    With ThisWorkbook.Sheets("Sheet2")
        .UsedRange.WrapText = False
        .Columns.AutoFit
    End With
    
End Sub


Answer 1:

The table you want is inside an IFRAME, so you need to request that page directly: <iframe src="/hibor/listRates.do?lang=en&Submit=Detail"

Option Explicit

Sub CopyRateFromHKAB()
    
    Const URL = "https://www.hkab.org.hk/hibor/listRates.do?lang=en&Submit=Detail"
    Dim HTMLDoc As Object, request As Object
    
    ' get web page
    Set HTMLDoc = CreateObject("HTMLfile")
    Set request = CreateObject("MSXML2.XMLHTTP")
    With request
        .Open "GET", URL, False
        .send
        HTMLDoc.body.innerHTML = .responseText
    End With
    
    ' parse html table
    Dim wb As Workbook, r As Long, c As Long, arr
    Dim tbl As Object, t As Object, tr As Object, td As Object
    
    Set wb = ThisWorkbook
    Set tbl = HTMLDoc.getElementsByClassName("etxtmed")
    
    If tbl Is Nothing Then
        MsgBox "No tables found", vbExclamation
        Exit Sub
    Else
        If tbl(2) Is Nothing Then
            MsgBox "Table not found", vbExclamation
            Exit Sub
        Else
            r = tbl(2).Rows.Length
            ReDim arr(1 To r, 1 To 3)
            r = 1
            For Each tr In tbl(2).Rows
               c = 1
               For Each td In tr.Cells
                   arr(r, c) = td.innerText
                   c = c + 1
               Next
               r = r + 1
            Next
        End If
                 
        'copy to sheet
        With wb.Sheets("Sheet2")
            .Cells(1, 1).Resize(UBound(arr), UBound(arr, 2)) = arr
            .UsedRange.WrapText = False
            .Columns.AutoFit
        End With
          
    End If
    MsgBox "Done", vbInformation
End Sub
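
The answer states that the table sits inside an IFRAME. One way to check this yourself is to download the outer page and list the src attribute of every iframe it contains. The following is only a sketch under assumptions: the procedure name ListIframeSources and the constant OUTER_URL are made up for illustration, it assumes the outer page still responds at the address used in the question, and the printed src value may come back resolved against about: rather than as the relative path shown above.

```vba
Option Explicit

' Hypothetical helper: prints the index and src of every IFRAME
' in the outer page to the Immediate window (Ctrl+G in the VBA editor).
Sub ListIframeSources()

    ' Assumption: the outer page URL from the question is still valid
    Const OUTER_URL As String = _
        "https://www.hkab.org.hk/DisplayInterestSettlementRatesAction.do?lang=en"

    Dim doc As Object, request As Object, iframes As Object
    Dim i As Long

    ' download the outer page synchronously
    Set request = CreateObject("MSXML2.XMLHTTP")
    request.Open "GET", OUTER_URL, False
    request.send

    If request.Status <> 200 Then
        Debug.Print "Request failed with HTTP status " & request.Status
        Exit Sub
    End If

    ' load the markup into a late-bound HTML document
    Set doc = CreateObject("HTMLfile")
    doc.body.innerHTML = request.responseText

    ' list every iframe so the inner page address can be read off
    Set iframes = doc.getElementsByTagName("iframe")
    If iframes.Length = 0 Then
        Debug.Print "No iframe elements found"
        Exit Sub
    End If

    For i = 0 To iframes.Length - 1
        Debug.Print i, iframes(i).getAttribute("src")
    Next i

End Sub
```

If an iframe is listed, its src is the address to pass to XMLHTTP, which is what the answer's URL constant does.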

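Comments:

Thanks for your help! I tested the code last night and it ran successfully. The website seems to be unavailable at the moment, so I will test it again later. I usually use the IE approach, and the XMLHTTP approach is still quite new to me. From what I have learned so far, you get all elements with the class etxtmed, and tbl(2) means the third etxtmed table. You then loop through all rows and columns of the table and store them in a dynamic array. For arr(1 To r, 1 To 3), does this mean you limit the columns of the array to 2? Could it be changed to arr(0 To r, 0 To c)?

@SunGuochen The page has nested tables that share the class etxtmed, so I iterated through them and found that tbl(2) is the one you want. You could use a dynamic array size, except that the table has merged cells in its first and last rows, so determining the maximum size is not straightforward.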
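
The sizing problem raised in the comments can be handled with two passes over the table: the first finds the widest row, the second copies the text. This is a sketch and not the answerer's code; the function name TableToArray is hypothetical, and it assumes tbl is an HTML table object such as tbl(2) from the answer.

```vba
Option Explicit

' Hypothetical helper: copies an HTML table object into a 1-based
' 2D array whose column count equals the widest row of the table.
' Returns Empty when the table has no rows or no cells.
Function TableToArray(ByVal tbl As Object) As Variant

    Dim tr As Object, td As Object
    Dim nRows As Long, nCols As Long
    Dim r As Long, c As Long
    Dim arr() As Variant

    nRows = tbl.Rows.Length
    If nRows = 0 Then Exit Function

    ' first pass: merged cells make rows uneven, so find the widest one
    For Each tr In tbl.Rows
        If tr.Cells.Length > nCols Then nCols = tr.Cells.Length
    Next tr
    If nCols = 0 Then Exit Function

    ReDim arr(1 To nRows, 1 To nCols)

    ' second pass: copy the text; positions missing from short rows stay Empty
    r = 1
    For Each tr In tbl.Rows
        c = 1
        For Each td In tr.Cells
            arr(r, c) = td.innerText
            c = c + 1
        Next td
        r = r + 1
    Next tr

    TableToArray = arr

End Function
```

A possible call is arr = TableToArray(tbl(2)), followed by If Not IsEmpty(arr) Then wb.Sheets("Sheet2").Cells(1, 1).Resize(UBound(arr, 1), UBound(arr, 2)).Value = arr. A merged cell counts as a single cell, so text from the first and last rows will not necessarily line up with the columns of the other rows. The lower bound is a free choice: arr(0 To r, 0 To c) works as well, provided the Resize call is given the element counts rather than the upper bounds.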
