Unable to get the exact element class table when scraping data from web using VBA
[Posted]: 2022-01-20 08:55:53
[Question]: I want to scrape a table from a website.
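From the page source, the table appears to belong to the element class etxtmed, so I wrote the VBA below. After running it, I found that it only scrapes some of the data, not the table I want.
I assumed this was because ("etxtmed")(0) refers to the first etxtmed table, so I tried several other numbers in place of (0). VBA first shows "Element not exist" and then raises Run-time error '91': Object variable or With block variable not set on the line r = tbl.Rows.Length - 1. Is it because I am scraping the wrong class for the table?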
Sub CopyRateFromHKAB()
    Dim ie As Object, btnmore As Object, tbl As Object
    Dim rr As Object, cc As Object, r As Integer, c As Integer, i As Integer, j As Integer
    ThisWorkbook.Sheets("Sheet2").UsedRange.Clear
    Set ie = CreateObject("internetexplorer.application")
    With ie
        '.Visible = True
        .navigate "https://www.hkab.org.hk/DisplayInterestSettlementRatesAction.do?lang=en"
        Do
            DoEvents
        Loop While .readyState <> 4 Or .Busy
        Set tbl = .document.getElementsByClassName("etxtmed")(0)
        If tbl Is Nothing Then
            MsgBox "Element not exist"
        End If
    End With
    'get data from table
    r = tbl.Rows.Length - 1
    c = tbl.Rows(0).Cells.Length - 1
    ReDim arr(0 To r, 0 To c)
    Set rr = tbl.Rows
    For i = 0 To r
        Set cc = rr(i).Cells
        For j = 0 To c
            arr(i, j) = cc(j).innertext
        Next
    Next
    ie.Quit
    Application.ScreenUpdating = False
    ThisWorkbook.Sheets("Sheet2").Cells(1, 1).Resize(r + 1, c + 1) = arr
    With ThisWorkbook.Sheets("Sheet2")
        .UsedRange.WrapText = False
        .Columns.AutoFit
    End With
End Sub
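The Run-time error '91' follows from the flow of the code above: when the index is out of range, tbl is Nothing, the MsgBox is shown, and execution then continues to r = tbl.Rows.Length - 1 with tbl still unset. A minimal sketch of an early-exit guard is shown here. ReadEtxtmedTable and its idx parameter are hypothetical names used only for illustration, and the sketch assumes ie is an Internet Explorer instance that has already finished loading the page.

```vba
Option Explicit

' Hypothetical helper (not part of the original post).
' Assumes ie is an InternetExplorer.Application object with a loaded page.
Sub ReadEtxtmedTable(ie As Object, idx As Long)
    Dim tbl As Object

    ' An out-of-range index leaves tbl as Nothing, as reported in the question.
    Set tbl = ie.document.getElementsByClassName("etxtmed")(idx)

    If tbl Is Nothing Then
        MsgBox "Element not exist"
        ie.Quit
        Exit Sub    ' stop here; reading tbl.Rows would raise error 91
    End If

    ' Reached only when tbl refers to an element.
    Debug.Print "Rows found: " & tbl.Rows.Length
End Sub
```

The guard only prevents the secondary error. It does not make the wanted table reachable from this page.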
[Comments]:
[Answer 1]: The table you want is inside an IFRAME, so you need to request the page that the iframe loads directly: <iframe src="/hibor/listRates.do?lang=en&Submit=Detail"
Option Explicit
Sub CopyRateFromHKAB()
    Const URL = "https://www.hkab.org.hk/hibor/listRates.do?lang=en&Submit=Detail"
    Dim htmlDoc As Object, request As Object

    ' get web page (synchronous GET, no browser instance needed)
    Set htmlDoc = CreateObject("HTMLfile")
    Set request = CreateObject("MSXML2.XMLHTTP")
    With request
        .Open "GET", URL, False
        .send
        htmlDoc.body.innerHTML = .responseText
    End With

    ' parse html table
    Dim wb As Workbook, r As Long, c As Long, arr
    Dim tbl As Object, t As Object, tr As Object, td As Object
    Set wb = ThisWorkbook
    ' collection of every element with class etxtmed
    Set tbl = htmlDoc.getElementsByClassName("etxtmed")
    If tbl Is Nothing Then
        MsgBox "No tables found", vbExclamation
        Exit Sub
    Else
        ' tbl(2) is the third etxtmed element (zero-based index)
        If tbl(2) Is Nothing Then
            MsgBox "Table not found", vbExclamation
            Exit Sub
        Else
            r = tbl(2).Rows.Length
            ' one array row per table row, fixed at 3 columns
            ReDim arr(1 To r, 1 To 3)
            r = 1
            For Each tr In tbl(2).Rows
                c = 1
                For Each td In tr.Cells
                    arr(r, c) = td.innerText
                    c = c + 1
                Next
                r = r + 1
            Next
        End If

        'copy to sheet
        With wb.Sheets("Sheet2")
            .Cells(1, 1).Resize(UBound(arr), UBound(arr, 2)) = arr
            .UsedRange.WrapText = False
            .Columns.AutoFit
        End With
    End If
    MsgBox "Done", vbInformation
End Sub
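The index in tbl(2) depends on how many elements on the page carry the class etxtmed. One way to find a suitable index is to list every match with its position, tag name, and the start of its text. The sketch below reuses the same URL and objects as the code above. ListEtxtmedElements is a hypothetical name, the 40-character preview length is arbitrary, and the output depends on the live page, which may change or be unavailable.

```vba
Option Explicit

' Hypothetical diagnostic routine (not part of the original answer).
' Prints one line per etxtmed element to the Immediate window.
Sub ListEtxtmedElements()
    Const URL = "https://www.hkab.org.hk/hibor/listRates.do?lang=en&Submit=Detail"
    Dim htmlDoc As Object, request As Object, items As Object
    Dim i As Long

    ' Fetch the page the same way as the answer's code.
    Set htmlDoc = CreateObject("HTMLfile")
    Set request = CreateObject("MSXML2.XMLHTTP")
    request.Open "GET", URL, False
    request.send
    htmlDoc.body.innerHTML = request.responseText

    ' Zero-based collection of every element with class etxtmed.
    Set items = htmlDoc.getElementsByClassName("etxtmed")
    For i = 0 To items.Length - 1
        ' Index, tag name, and the first 40 characters of the text.
        Debug.Print i, items(i).tagName, Left$(items(i).innerText, 40)
    Next i
End Sub
```

Running it and reading the Immediate window (Ctrl+G in the VBA editor) shows which index holds the table of interest.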
[Discussion]:
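Thanks for your help! I tested the code last night and it ran successfully. The website seems to be unavailable at the moment, so I will test it again later. I usually use the IE approach, and the XMLHTTP approach is still new to me. From what I understand so far, you fetch all elements with the class etxtmed, and tbl(2) is the third etxtmed table. You then loop over every row and cell of that table and store the values in a dynamic array. With arr(1 To r, 1 To 3), does that mean you are limiting the array to 3 columns? Could it be changed to arr(0 To r, 0 To c)?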
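@SunGuochen The page has nested tables that share the class etxtmed, so I iterated through them and found that tbl(2) is the one you want. You could size the array dynamically, except that the table has merged cells in its first and last rows, so determining the maximum size is not straightforward.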
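One possible way to avoid the fixed column count, sketched under the assumption that the widest row defines the number of columns: scan the rows once to find the largest cell count, then size the array from that. TableToArray is a hypothetical helper, not part of the answer. A merged cell still counts as a single cell, so rows containing merged cells fill fewer columns and their values sit further left than they appear on the page.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer).
' Copies an HTML table element into a 1-based 2-D Variant array whose
' column count is the largest number of cells found in any row.
' Returns Empty when the table has no rows or no cells.
Function TableToArray(tbl As Object) As Variant
    Dim tr As Object, td As Object
    Dim nRows As Long, nCols As Long, r As Long, c As Long
    Dim arr() As Variant

    nRows = tbl.Rows.Length
    If nRows = 0 Then Exit Function

    ' First pass: find the widest row.
    For Each tr In tbl.Rows
        If tr.Cells.Length > nCols Then nCols = tr.Cells.Length
    Next tr
    If nCols = 0 Then Exit Function

    ' Second pass: copy the cell text; shorter rows leave trailing cells Empty.
    ReDim arr(1 To nRows, 1 To nCols)
    For Each tr In tbl.Rows
        r = r + 1
        c = 0
        For Each td In tr.Cells
            c = c + 1
            arr(r, c) = td.innerText
        Next td
    Next tr

    TableToArray = arr
End Function
```

With this helper, the copy step could read arr = TableToArray(tbl(2)) followed by .Cells(1, 1).Resize(UBound(arr, 1), UBound(arr, 2)) = arr, after checking that arr is not Empty. A zero-based array such as arr(0 To r, 0 To c) would also work, provided the Resize arguments are each increased by one.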