VBA - XMLHTTP 和 WinHttp 请求速度

Posted

技术标签:

【中文标题】VBA - XMLHTTP 和 WinHttp 请求速度【英文标题】:VBA - XMLHTTP and WinHttp request speed 【发布时间】:2017-05-22 06:00:45 【问题描述】:

下面是我在宏中实现的 3 个请求的声明变量。我列出了他们在 cmets 中使用的库及其后期绑定:

Dim XMLHTTP As New MSXML2.XMLHTTP 'Microsoft XML, v6.0 'Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
Dim ServerXMLHTTP As New MSXML2.ServerXMLHTTP 'Microsoft XML, v6.0 'Set ServerXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim http As New WinHttpRequest 'Microsoft WinHttp Services, version 5.1 'Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

我有一些使用 Internet Explorer 自动化的旧网页抓取宏。我想清理编码并通过这些请求加快它们的速度。

不幸的是,我注意到,MSXML2.ServerXMLHTTPWinHttpRequest 在网上商店的 20 种产品测试(34 和 35 秒)上比带有图片和活动脚本关闭的 IE 自动化(24 秒)慢! MSXML2.XMLHTTP 在 18 秒内执行。我曾经看到这 3 个请求中的一些请求比其他请求快/慢 2-3 倍的情况,所以我总是测试哪一个执行得最好,但以前从未有任何请求丢失到 IE 自动化。

有结果的主页在下面,一页上的所有结果,1500+,所以请求需要一些时间(如果粘贴到 MS Word,6500 页):

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

然后我从主结果页面打开单个链接:

http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/

我想知道这 3 个请求是否都是我必须在没有浏览器自动化的情况下从网站获取数据的选项。另外 - 浏览器自动化怎么可能击败其中一些请求?

更新

我已经使用 Robin Mackenzie 在回答中提供的程序测试了主要结果页面,在运行之前清除了 IE 缓存。至少在这个特定页面上,缓存似乎没有明显的好处,因为随后的请求产生了类似的结果。 IE 禁用了活动脚本,并且没有加载图像。

IE automation method, Document length: 7593346 chars, Processed in: 8 seconds

WinHTTP method,  Document length: 7824059 chars, Processed in: 29 seconds

XML HTTP method, Document length: 7830217 chars, Processed in: 4 seconds

Server XML HTTP method, Document length: 7823958 chars, Processed in: 26 seconds

URL download file method, Document length: 7830346 chars, Processed in: 7 seconds

让我非常惊讶的是这些方法返回的字符数量的差异。

【问题讨论】:

您不能使用相同的链接来比较基准,除非您清除它们之间的 Internet 缓存 - 否则您将网络检索所需的时间与 本地缓存检索。 所有这些请求都使用缓存吗?是否有任何易于清除缓存或防止缓存的方法?我现在看到几乎所有时间我都会比较缓存链接和缓存链接。 您始终可以手动清除缓存(删除临时 Internet 文件)并单独运行基准测试。 IIR,MSXML2.XMLHTTP 具有可设置的缓存使用行为,但很难强制它始终使用新下载。我不记得有一种方法可以从 IE 中清空缓存。我很想知道您是否在没有缓存的情况下获得不同的基准。 赞成,因为我们绝对需要关于这个主题的规范问答。 【参考方案1】:

大部分时间都花在等待服务器的响应上。因此,如果您想提高执行时间,请并行发送请求。

我也会使用“Msxml2.ServerXMLHTTP.6.0”对象/接口,因为它没有实现任何缓存。

这是一个工作示例:

Sub TestRequests()
  GetUrls _
    "http://***.com/questions/34880012", _
    "http://***.com/questions/34880013", _
    "http://***.com/questions/34880014", _
    "http://***.com/questions/34880015", _
    "http://***.com/questions/34880016", _
    "http://***.com/questions/34880017"

End Sub

Private Sub OnRequest(url, xhr)
  xhr.Open "GET", url, True
  xhr.setRequestHeader "Content-Type", "text/html; charset=UTF-8"
  xhr.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
  xhr.Send
End Sub

Private Sub OnResponse(url, xhr)
  Debug.Print url, Len(xhr.ResponseText)
End Sub

Public Function GetUrls(ParamArray urls())
    Const WORKERS = 10

    ' create http workers
    Dim wkrs(0 To WORKERS * 2 - 1), i As Integer
    For i = 0 To UBound(wkrs) Step 2
      Set wkrs(i) = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    Next

    ' send the requests in parallele
    Dim index As Integer, count As Integer, xhr As Object
    While count <= UBound(urls)
      For i = 0 To UBound(wkrs) Step 2
        Set xhr = wkrs(i)

        If xhr.readyState And 3 Then  ' if busy
          xhr.waitForResponse 0.01    ' wait 10ms
        ElseIf Not VBA.IsEmpty(wkrs(i + 1)) And xhr.readyState = 4 Then
          OnResponse urls(wkrs(i + 1)), xhr
          count = count + 1
          wkrs(i + 1) = Empty
        End If

        If VBA.IsEmpty(wkrs(i + 1)) And index <= UBound(urls) Then
          wkrs(i + 1) = index
          OnRequest urls(index), xhr
          index = index + 1
        End If
      Next
    Wend
End Function

【讨论】:

【参考方案2】:

除了你提到的方法之外:

IE 自动化 WinHTTPRequest XMLHTTP ServerXMLHTTP

您可以考虑其他 2 种方法:

使用MSHTML.HTMLDocument对象的CreateDocumentFromUrl方法 使用 Windows API 函数URLDownloadToFileA

我忽略了其他一些 Windows API,例如 InternetOpenInternetOpenUrl 等,因为猜测响应长度、缓冲响应等的复杂性会超过潜在的性能。

CreateDocumentFromUrl

使用CreateDocumentFromUrl 方法时,您的示例网站会出现问题,因为它会尝试在不允许出现以下错误的框架中创建HTMLDocument

禁止装帧

为帮助保护您输入本网站的信息的安全性,此内容的发布者不允许将其显示在框架中。

所以我们不应该使用这种方法。

URLDownloadToFileA

我认为您需要 php 等效于 file_get_contents 并找到了这个方法。它很容易使用(检查this link),并且在用于大请求时优于其他方法(例如,当您使用 >2000 个棒球棒时尝试它)。 XMLHTTP 也使用了 URLMon 库,所以我想这种方式只是减少了一些中间人逻辑,显然有一个缺点,因为你必须做一些文件系统处理。

Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

Sub TestUrlDownloadFile(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strTempFileName As String
    Dim strResponse As String
    Dim objFso As FileSystemObject

    On Error GoTo ExitFunction

    dteStart = Now
    strTempFileName = "D:\foo.txt"
    DownloadFile strUrl, strTempFileName
    Set objFso = New FileSystemObject
    With objFso.OpenTextFile(strTempFileName, ForReading)
        strResponse = .ReadAll
        .Close
    End With
    objFso.DeleteFile strTempFileName
    dteFinish = Now

    Debug.Print "URL download file method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If

End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
  Dim lngRetVal As Long
  lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
  If lngRetVal = 0 Then DownloadFile = True
End Function

使用 URLDownloadToFileA 下载示例 URL 大约需要 1-2 秒,而使用 XMLHTTP 方法需要 4-5 秒(完整代码如下)。

网址:

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

这是输出:

Testing...


XML HTTP method
Document length: 7869753 chars
Processed in: 4 seconds


URL download file method
Document length: 7869753 chars
Processed in: 1 seconds

代码

这包括所有讨论的方法,例如IE自动化、WinHTTPRequest、XMLHTTP、ServerXMLHTTP、CreateDocumentFromURL和URLDownloadFile。

您需要项目中的所有这些引用:

这里是:

Option Explicit

Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

Sub Test()

    Dim strUrl As String

    strUrl = "http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400"

    Debug.Print "Testing..."
    Debug.Print VBA.vbNewLine

    'TestIE strUrl
    'TestWinHHTP strUrl
    TestXMLHTTP strUrl
    'TestServerXMLHTTP strUrl
    'TestCreateDocumentFromUrl strUrl
    TestUrlDownloadFile strUrl

End Sub

Sub TestIE(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objIe As InternetExplorer
    Dim objHtml As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objIe = New SHDocVw.InternetExplorer
    With objIe
        .navigate strUrl
        .Visible = False
        While .Busy Or .readyState <> READYSTATE_COMPLETE
           DoEvents
        Wend
        Set objHtml = .document
        strResponse = objHtml.DocumentElement.outerHTML
        .Quit
    End With
    dteFinish = Now

    Debug.Print "IE automation method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    If Not objIe Is Nothing Then
        objIe.Quit
    End If
    Set objIe = Nothing

End Sub

Sub TestWinHHTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objHttp As WinHttp.WinHttpRequest
    Dim objDoc As HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objHttp = New WinHttp.WinHttpRequest
    With objHttp
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        .WaitForResponse
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "WinHTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objHttp = Nothing

End Sub

Sub TestXMLHTTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.XMLHTTP60
    Dim objDoc As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.XMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objXhr = Nothing

End Sub

Sub TestServerXMLHTTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.ServerXMLHTTP60
    Dim objDoc As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.ServerXMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "Server XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objXhr = Nothing

End Sub

Sub TestUrlDownloadFile(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strTempFileName As String
    Dim strResponse As String
    Dim objFso As FileSystemObject

    On Error GoTo ExitFunction

    dteStart = Now
    strTempFileName = "D:\foo.txt"
    If DownloadFile(strUrl, strTempFileName) Then
        Set objFso = New FileSystemObject
        With objFso.OpenTextFile(strTempFileName, ForReading)
            strResponse = .ReadAll
            .Close
        End With
        objFso.DeleteFile strTempFileName
    Else
        Debug.Print "Error downloading file from URL: " & strUrl
        GoTo ExitFunction
    End If
    dteFinish = Now

    Debug.Print "URL download file method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If

End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
    Dim lngRetVal As Long
    lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
    If lngRetVal = 0 Then
        DownloadFile = True
    Else
        DownloadFile = False
    End If
End Function

Sub TestCreateDocumentFromUrl(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strResponse As String
    Dim objDoc1 As HTMLDocument
    Dim objDoc2 As HTMLDocument

    On Error GoTo ExitFunction

    dteStart = Now
    Set objDoc1 = New HTMLDocument
    Set objDoc2 = objDoc1.createDocumentFromUrl(strUrl, "null")
    While objDoc2.readyState <> "complete"
        DoEvents
    Wend
    strResponse = objDoc2.DocumentElement.outerHTML
    Debug.Print strResponse
    dteFinish = Now

    Debug.Print "HTML Document Create from URL method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc2 = Nothing
    Set objDoc1 = Nothing

End Sub

【讨论】:

感谢您的回答,明天我将进行测试。我从来没有使用readyState &lt;&gt; 4 循环来处理 XMLHTTP 请求,也没有使用.WaitForResponse 来处理WinHttp.WinHttpRequest 并且从来没有遇到过无法检索数据的问题,真的需要吗? readyState 值 4 表示加载完成:msdn.microsoft.com/en-us/library/ms753800(v=vs.85).aspx。 WaitForResponse 只是一个内置方法,呃..等待响应! msdn.microsoft.com/en-us/library/windows/desktop/…。所有测试代码都是同步的,因此可能不需要。不会影响性能 imo 请注意,如果您需要 Unicode 支持,请使用 URLDownloadToFileW @Robin:对我来说,HTMLDocument.createDocumentFromUrl 让我大开眼界,这使得文件可以与任何网络抓取过程分开下载。谢谢你。

以上是关于VBA - XMLHTTP 和 WinHttp 请求速度的主要内容,如果未能解决你的问题,请参考以下文章

在VBA WinHttpRequest上捕获超时

Msxml2.ServerXMLHTTP 和 WinHttp.WinHttpRequest 之间的区别?

VBA WinHTTP 从受密码保护的 https 网站下载文件

使用 vba 和 xmlhttp 自动提交网站上的帖子表单

Excel VBA中的XmlHttp Post不更新网站表单

VBA XMLHTTP 清除身份验证?