Excel VBA Redshift 查询性能改进

Posted

技术标签:

【中文标题】Excel VBA Redshift 查询性能改进【英文标题】:Excel VBA Redshift Query Performance Improvements 【发布时间】:2021-05-22 11:58:06 【问题描述】:

我有一个启用了 excel 宏的工作簿,它为用户提供了输入一些参数以在查询中用作过滤 (WHERE) 子句的选项。这又被提供给查询。我有大约 3 个不使用过滤器的查询和 4 个或 5 个查询,具体取决于选择使用过滤器运行的过滤器。查询复杂度各不相同。

查询是针对 Redshift 集群运行的。 (所有数据都是保密的,RS只是内部连接,所以我不能给出整个查询或任何东西,只是例子)

3 个小查询是 1-2 行。 其余5条中的3条或4条约为40行 5th 大约是 100。

直接在没有过滤器的集群上运行时:返回 ~42400 行和 23 列

3 个小查询在不到 3 秒左右的时间内运行并加载到 excel 文件中

中等查询 1:在集群上 - ~1 秒 中等查询 2:在集群上 ~5 秒 中等查询 3:在集群上 - ~9 秒 大型查询 1:在集群上 - ~24 秒

现在问题出在这里,当我在 vba 中运行这些查询时,对每个查询使用以下命令来更新列表对象(示例代码)需要 980.59(~16.4 分钟)秒

CS = "ODBC;Driver=Amazon Redshift (x64);SERVER=RS1.us-east-1.redshift.amazonaws.com;PORT=8192;DATABASE=db1;UID=user;PASSWORD=fakepasswrod;sslmode=require"

With Sheet2.ListObjects.Add(SourceType:=0, Source:=CS, Destination:=Sheet2.Range("$A$1")).QueryTable
    .CommandText = Sql
    .RefreshStyle = xlInsertDeleteCells
    .AdjustColumnWidth = True
    .ListObject.DisplayName = "Name_of_LO_1"
    .Refresh BackgroundQuery:=False
End With

此外,我必须让用户能够执行通配符、逗号分隔列表和过滤器的单个条目。从单元格值构建该部分不需要很长时间。

我必须使用类似于以下的大型 if 语句构建过滤器

'Filter Fields
C_List = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D1").Value)
S_List = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D2").Value)
F_List = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D3").Value)
s_year = ThisWorkbook.Sheets(Sheet1.Name).Range("D4").Value
Scen = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D5").Value & "'"
prior_s_year_1 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D6").Value & "'"
prior_Scen_1 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D7").Value & "'"
prior_s_year_2 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D8").Value & "'"
prior_Scen_2 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D9").Value & "'"
cat = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D10").Value)
subcat = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D11").Value)


If Site_List = "" And Cluster_List = "" And FBN_List = "" Then
    response = MsgBox("You have chosen no Site, Cluster or FBN filters, this will pull all data and may take some time" & vbNewLine & "Do you wish to continue?", vbYesNo)
    If response = vbNo Then
        Call MsgBox("Exiting data retrieval, please enter Site, Cluster or FBN filters and restart", vbOKOnly)
        Call DeleteConnections
        Exit Sub
    End If

ElseIf C_List = "ALL" Then
    UserDefinedFilters = " bd.reg IN ( SELECT DISTINCT c FROM att_1 ) "
    
ElseIf S_List <> "" And C_List <> "" And F_List <> "" Then
    S_List = Replace(S_List, ", ", ",")
    C_List = Replace(C_List, ", ", ",")
    F_List = Replace(F_List, ", ", ",")
    UserDefinedFilters = UserDefinedFilters & " UPPER(s) in ('" & Replace(S_List, ",", "','") & "')" & _
    vbNewLine & " AND UPPER(reg) in ('" & Replace(C_List, ",", "','") & "')" & _
    vbNewLine & " AND UPPER(f) in ('" & Replace(F_List, ",", "','") & "')"
    
ElseIf S_List <> "" And C_List <> "" And F_List = "" Then
    S_List = Replace(S_List, ", ", ",")
    Cluster_List = Replace(C_List, ", ", ",")
    UserDefinedFilters = UserDefinedFilters & " UPPER(s) in ('" & Replace(S_List, ",", "','") & "')" & _
    vbNewLine & " AND UPPER(reg) in ('" & Replace(C_List, ",", "','") & "')"
    
ElseIf S_List <> "" And C_List = "" And F_List = "" Then
    S_List = Replace(S_List, ", ", ",")
    UserDefinedFilters = UserDefinedFilters & " UPPER(s) in ('" & Replace(S_List, ",", "','") & "')"
    
ElseIf S_List = "" And C_List <> "" And F_List = "" Then
    C_List = Replace(C_List, ", ", ",")
    UserDefinedFilters = UserDefinedFilters & " UPPER(reg) in ('" & Replace(C_List, ",", "','") & "')"
    
ElseIf S_List = "" And C_List = "" And F_List <> "" Then
    If InStr(1, F_List, ",") > 0 Then
        F_List = Replace(F_List, ", ", ",")
        UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & Replace(UCase(F_List), ",", "','") & "')"
    ElseIf InStr(1, F_List, "*") > 0 Then
        UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) LIKE '%" & Replace(UCase(F_List), "*", "") & "%'"
    ElseIf InStr(1, F_List, "ABC") > 0 Then
        UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) LIKE '%" & UCase(Left(F_List, 12)) & "%'"
    Else
        UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & UCase(F_List) & "')"
    End If
    
ElseIf S_List = "" And C_List <> "" And F_List <> "" Then
    If InStr(1, F_List, ",") > 0 Then
        F_List = Replace(F_List, ", ", ",")
        UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & Replace(UCase(F_List), ",", "','") & "')"
    ElseIf InStr(1, F_List, "*") > 0 Then
        UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) LIKE '%" & Replace(UCase(F_List), "*", "") & "%'"
    Else
        UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & UCase(F_List) & "')"
    End If
End If

'Cat and SubCat Filters
If cat <> "" And subcat <> "" Then
    cat = Replace(cat, ",", "','")
    subcat = Replace(subcat, ",", "','")
    BCSFilters = BCSFilters & " AND UPPER(sca.cat) IN ('" & cat & "')" & _
    vbNewLine & "AND UPPER(sca.subcat) in ('" & subcat & "')"
    
ElseIf cat <> "" And subcat = "" Then
    cat = Replace(cat, ",", "','")
    BCSFilters = BCSFilters & " AND UPPER(sca.cat) IN ('" & cat & "')"
   
ElseIf cat = "" And subcat <> "" Then
    subcat = Replace(subcat, ",", "','")
    BCSFilters = BCSFilters & " AND UPPER(sca.subcat) IN ('" & subcat & "')"
End If

以上只是两组,但它应该让您了解我必须为构建 where 子句做什么。

我找不到使用 ADODB 让记录集工作的方法,我不确定这是否会更快。如果可能的话,我需要做这个 DSNless,因为该文件被广泛的用户使用。任何人都可以想到的任何事情都可能有助于减少查询中的大量时间?

编辑:

添加我为记录集尝试的代码:

Dim conn As Object
Dim rs As Object

Set conn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")
CS = "Driver=Amazon Redshift (x64);DATA SOURCE=RS1.us-east-1.redshift.amazonaws.com;PORT=8192;DATABASE=db1;UID=user;PWD=fakepasswrod;sslmode=require"

conn.Open CS

Set RegAtt = ThisWorkbook.Sheets(Sheet6.Name)
RegAtt.Cells.Clear
RegSql = "SELECT cl,reg,curr FROM schema.table1"

rs.Open RegSql

With RegAtt.ListObjects.Add(xlSrcQuery, rs, Destination:=RegAtt.Range("$A$1")).QueryTable
    '.CommandText = RegSql
    .RefreshStyle = xlInsertDeleteCells
    .AdjustColumnWidth = True
    .ListObject.DisplayName = "LO_2"
    .Refresh BackgroundQuery:=False
End With

那个连接字符串我得到一个找不到驱动程序的错误。

这个CS = "Driver=Amazon Redshift (x64);SERVER=RS1.us-east-1.redshift.amazonaws.com;PORT=8192;DATABASE=db1;ID=user;PASSWORD=fakepasswrod;sslmode=require"

我得到 3709 - 连接不能用于执行此操作。在这种情况下,它要么是关闭的,要么是无效的。

【问题讨论】:

仅供参考 ThisWorkbook.Sheets(Sheet1.Name)Sheet1 相同 I cannot find a way to get recordsets working using ADODB 如果这是主要问题,那么您尝试了什么以及遇到了什么错误?您的“集群上”时间是最后一条记录的时间,还是只是返回初始结果的时间?这两次可能完全不同...... @TimWilliams 我更新了记录集尝试。 您是在 32 位还是 64 位的 Excel 中运行的?驱动的位数是否与 Excel 匹配? 64 是的,它匹配。 【参考方案1】:

这不会改变性能,但您可能会发现采用更加面向对象的方法来构建查询会带来好处。例如,如果你定义一个类模块来保存参数和逻辑,那么构建脚本就会变成这样;

Sub BuildFilters()

    Dim wb As Workbook, ws As Worksheet, response
    Set wb = ThisWorkbook
    Set ws = wb.Sheets(1)
    
    Const msg1 = "You have chosen no Site, Cluster or FBN filters," & _
                 "this will pull all data and may take some time" & vbNewLine & _
                 "Do you wish to continue?"
    Const msg2 = "Exiting data retrieval, please enter Site, Cluster or FBN filters and restart"

    Dim Qb As New QueryBuilder
    Qb.Init ws ' get parameters

    If Qb.hasNone Then
        response = MsgBox(msg1, vbYesNo)
        If response = vbNo Then
            Call MsgBox(msg2, vbOKOnly)
            'Call DeleteConnections
        End If
    Else
        ' build SQL
        Qb.BuildUDFilter
        Qb.BuildBCSFilters

        ' dump to sheet to check result
        ws.Range("D13") = Qb.UDFilter
        ws.Range("D15") = Qb.BCSFilters
    End If
End Sub

类模块查询构建器

Option Explicit

Public BCSFilters As String
Public UDFilter As String

Dim C_List As String, hasC As Boolean
Dim S_List As String, hasS As Boolean
Dim F_List As String, hasF As Boolean
Dim s_year As String
Dim Scen As String
Dim prior_s_year_1 As String
Dim prior_Scen_1 As String
Dim prior_s_year_2 As String
Dim prior_Scen_2 As String
Dim cat As String, hasCat As Boolean
Dim subcat As String, hasSubcat As Boolean
Dim count As Integer, hasAny As Boolean

' Initialise Object from Sheet
Sub Init(ws As Worksheet)
    With ws
        C_List = .Cells(1, 4) ' D1
        S_List = .Cells(2, 4)
        F_List = .Cells(3, 4)
        s_year = Cells(4, 4)
        Scen = quoted(.Cells(5, 4))
        prior_s_year_1 = quoted(.Cells(6, 4))
        prior_Scen_1 = quoted(.Cells(7, 4))
        prior_s_year_2 = quoted(.Cells(8, 4))
        prior_Scen_2 = quoted(.Cells(9, 4))
        cat = .Cells(10, 4)
        subcat = .Cells(11, 4)
    End With

    hasC = CBool(Len(C_List))
    hasS = CBool(Len(S_List))
    hasF = CBool(Len(F_List))  
    hasCat = CBool(Len(cat))
    hasSubcat = CBool(Len(subcat))
End Sub

Function hasNone() As Boolean
    hasNone = Not (hasC Or hasS Or hasF)
End Function

Sub BuildUDFilter()
    Dim sql As String
    count = 0

    If UCase(C_List) = "ALL" Then
        sql = " bd.reg IN ( SELECT DISTINCT c FROM att_1 )"
    Else
        If hasC Then sql = BuildSelect("reg", C_List)
        If hasS Then sql = sql & BuildSelect("s", S_List)
        If hasF Then sql = sql & BuildSelect("f", F_List)
    End If
    UDFilter = sql
End Sub

Sub BuildBCSFilters()
    Dim sql As String
    count = 0

    If hasCat Then sql = BuildSelect("sca.cat", cat)
    If hasSubcat Then sql = sql & BuildSelect("sca.subcat", subcat)
    
    BCSFilters = sql
End Sub

Private Function BuildSelect(v As String, s As String)

    Dim ar As Variant, i As Integer, sql As String
    s = UCase(s)

    If CBool(InStr(s, "*")) Then
        s = Replace(s, "*", "")
        sql = " LIKE '%" & s & "%'"
    ElseIf CBool(InStr(1, s, "ABC")) Then
        s = Left(s, 12)
        sql = " LIKE '%" & s & "%'"
    Else
        ar = Split(s, ",")
        For i = 0 To UBound(ar)
            ar(i) = Trim(ar(i))
        Next
        If UBound(ar) = 0 Then
            sql = " = '" & ar(0) & "'"
        Else
            sql = " IN ('" & Join(ar, "','") & "')"
        End If
    End If
    
    sql = " UPPER(" & v & ")" & sql
    If count > 0 Then
        sql = vbNewLine & "AND" & sql
    End If
    count = count + 1
  
    BuildSelect = sql
End Function

Private Function quoted(s) As String
    quoted = "'" & s & "'"
End Function

【讨论】:

【参考方案2】:

可能是.AdjustColumnWidth = True 行导致性能下降? (因为它必须加载数据以确定自动宽度)。

您是否考虑过将Application.ScreenUpdating 设置为FalseApplication.Calculation 设置为xlCalculationManual 来执行大部分代码?

详情见https://www.microsoft.com/en-us/microsoft-365/blog/2009/03/12/excel-vba-performance-coding-best-practices/

这可能值得一试,看看它是否能提高性能。如果是这样:

您可以在屏幕更新被禁用期间将一些适当的用户显示消息放在适当的位置。 良好的做法是存储并恢复 ScreenUpdatingCalculation 的值,这样环境就会保持在子例程开始时的状态

【讨论】:

以上是关于Excel VBA Redshift 查询性能改进的主要内容,如果未能解决你的问题,请参考以下文章

EXCEL VBA combobox 模糊查询触发后 退格键功能改变

Excel 中动态 SQL 查询的性能问题

EXCEL VBA combobox 模糊查询触发后 退格键功能改变

Redshift - 如何识别查询中的低性能区域?

Redshift 查询性能以降低 CPU 利用率

UNION 查询 Redshift 性能不佳