Excel VBA Redshift 查询性能改进
Posted
技术标签:
【中文标题】Excel VBA Redshift 查询性能改进【英文标题】:Excel VBA Redshift Query Performance Improvements 【发布时间】:2021-05-22 11:58:06 【问题描述】:我有一个启用了 excel 宏的工作簿,它为用户提供了输入一些参数以在查询中用作过滤 (WHERE) 子句的选项。这又被提供给查询。我有大约 3 个不使用过滤器的查询和 4 个或 5 个查询,具体取决于选择使用过滤器运行的过滤器。查询复杂度各不相同。
查询是针对 Redshift 集群运行的。 (所有数据都是保密的,RS只是内部连接,所以我不能给出整个查询或任何东西,只是例子)
3 个小查询是 1-2 行。 其余5条中的3条或4条约为40行 5th 大约是 100。
直接在没有过滤器的集群上运行时:返回 ~42400 行和 23 列
3 个小查询在不到 3 秒左右的时间内运行并加载到 excel 文件中
中等查询 1:在集群上 - ~1 秒 中等查询 2:在集群上 ~5 秒 中等查询 3:在集群上 - ~9 秒 大型查询 1:在集群上 - ~24 秒现在问题出在这里,当我在 vba 中运行这些查询时,对每个查询使用以下命令来更新列表对象(示例代码)需要 980.59(~16.4 分钟)秒
CS = "ODBC;Driver=Amazon Redshift (x64);SERVER=RS1.us-east-1.redshift.amazonaws.com;PORT=8192;DATABASE=db1;UID=user;PASSWORD=fakepasswrod;sslmode=require"
With Sheet2.ListObjects.Add(SourceType:=0, Source:=CS, Destination:=Sheet2.Range("$A$1")).QueryTable
.CommandText = Sql
.RefreshStyle = xlInsertDeleteCells
.AdjustColumnWidth = True
.ListObject.DisplayName = "Name_of_LO_1"
.Refresh BackgroundQuery:=False
End With
此外,我必须让用户能够执行通配符、逗号分隔列表和过滤器的单个条目。从单元格值构建该部分不需要很长时间。
我必须使用类似于以下的大型 if 语句构建过滤器
'Filter Fields
C_List = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D1").Value)
S_List = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D2").Value)
F_List = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D3").Value)
s_year = ThisWorkbook.Sheets(Sheet1.Name).Range("D4").Value
Scen = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D5").Value & "'"
prior_s_year_1 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D6").Value & "'"
prior_Scen_1 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D7").Value & "'"
prior_s_year_2 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D8").Value & "'"
prior_Scen_2 = "'" & ThisWorkbook.Sheets(Sheet1.Name).Range("D9").Value & "'"
cat = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D10").Value)
subcat = UCase(ThisWorkbook.Sheets(Sheet1.Name).Range("D11").Value)
If Site_List = "" And Cluster_List = "" And FBN_List = "" Then
response = MsgBox("You have chosen no Site, Cluster or FBN filters, this will pull all data and may take some time" & vbNewLine & "Do you wish to continue?", vbYesNo)
If response = vbNo Then
Call MsgBox("Exiting data retrieval, please enter Site, Cluster or FBN filters and restart", vbOKOnly)
Call DeleteConnections
Exit Sub
End If
ElseIf C_List = "ALL" Then
UserDefinedFilters = " bd.reg IN ( SELECT DISTINCT c FROM att_1 ) "
ElseIf S_List <> "" And C_List <> "" And F_List <> "" Then
S_List = Replace(S_List, ", ", ",")
C_List = Replace(C_List, ", ", ",")
F_List = Replace(F_List, ", ", ",")
UserDefinedFilters = UserDefinedFilters & " UPPER(s) in ('" & Replace(S_List, ",", "','") & "')" & _
vbNewLine & " AND UPPER(reg) in ('" & Replace(C_List, ",", "','") & "')" & _
vbNewLine & " AND UPPER(f) in ('" & Replace(F_List, ",", "','") & "')"
ElseIf S_List <> "" And C_List <> "" And F_List = "" Then
S_List = Replace(S_List, ", ", ",")
Cluster_List = Replace(C_List, ", ", ",")
UserDefinedFilters = UserDefinedFilters & " UPPER(s) in ('" & Replace(S_List, ",", "','") & "')" & _
vbNewLine & " AND UPPER(reg) in ('" & Replace(C_List, ",", "','") & "')"
ElseIf S_List <> "" And C_List = "" And F_List = "" Then
S_List = Replace(S_List, ", ", ",")
UserDefinedFilters = UserDefinedFilters & " UPPER(s) in ('" & Replace(S_List, ",", "','") & "')"
ElseIf S_List = "" And C_List <> "" And F_List = "" Then
C_List = Replace(C_List, ", ", ",")
UserDefinedFilters = UserDefinedFilters & " UPPER(reg) in ('" & Replace(C_List, ",", "','") & "')"
ElseIf S_List = "" And C_List = "" And F_List <> "" Then
If InStr(1, F_List, ",") > 0 Then
F_List = Replace(F_List, ", ", ",")
UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & Replace(UCase(F_List), ",", "','") & "')"
ElseIf InStr(1, F_List, "*") > 0 Then
UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) LIKE '%" & Replace(UCase(F_List), "*", "") & "%'"
ElseIf InStr(1, F_List, "ABC") > 0 Then
UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) LIKE '%" & UCase(Left(F_List, 12)) & "%'"
Else
UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & UCase(F_List) & "')"
End If
ElseIf S_List = "" And C_List <> "" And F_List <> "" Then
If InStr(1, F_List, ",") > 0 Then
F_List = Replace(F_List, ", ", ",")
UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & Replace(UCase(F_List), ",", "','") & "')"
ElseIf InStr(1, F_List, "*") > 0 Then
UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) LIKE '%" & Replace(UCase(F_List), "*", "") & "%'"
Else
UserDefinedFilters = UserDefinedFilters & " UPPER(bd.f) in ('" & UCase(F_List) & "')"
End If
End If
'Cat and SubCat Filters
If cat <> "" And subcat <> "" Then
cat = Replace(cat, ",", "','")
subcat = Replace(subcat, ",", "','")
BCSFilters = BCSFilters & " AND UPPER(sca.cat) IN ('" & cat & "')" & _
vbNewLine & "AND UPPER(sca.subcat) in ('" & subcat & "')"
ElseIf cat <> "" And subcat = "" Then
cat = Replace(cat, ",", "','")
BCSFilters = BCSFilters & " AND UPPER(sca.cat) IN ('" & cat & "')"
ElseIf cat = "" And subcat <> "" Then
subcat = Replace(subcat, ",", "','")
BCSFilters = BCSFilters & " AND UPPER(sca.subcat) IN ('" & subcat & "')"
End If
以上只是两组,但它应该让您了解我必须为构建 where 子句做什么。
我找不到使用 ADODB 让记录集工作的方法,我不确定这是否会更快。如果可能的话,我需要做这个 DSNless,因为该文件被广泛的用户使用。任何人都可以想到的任何事情都可能有助于减少查询中的大量时间?
编辑:
添加我为记录集尝试的代码:
Dim conn As Object
Dim rs As Object
Set conn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")
CS = "Driver=Amazon Redshift (x64);DATA SOURCE=RS1.us-east-1.redshift.amazonaws.com;PORT=8192;DATABASE=db1;UID=user;PWD=fakepasswrod;sslmode=require"
conn.Open CS
Set RegAtt = ThisWorkbook.Sheets(Sheet6.Name)
RegAtt.Cells.Clear
RegSql = "SELECT cl,reg,curr FROM schema.table1"
rs.Open RegSql
With RegAtt.ListObjects.Add(xlSrcQuery, rs, Destination:=RegAtt.Range("$A$1")).QueryTable
'.CommandText = RegSql
.RefreshStyle = xlInsertDeleteCells
.AdjustColumnWidth = True
.ListObject.DisplayName = "LO_2"
.Refresh BackgroundQuery:=False
End With
那个连接字符串我得到一个找不到驱动程序的错误。
这个CS = "Driver=Amazon Redshift (x64);SERVER=RS1.us-east-1.redshift.amazonaws.com;PORT=8192;DATABASE=db1;ID=user;PASSWORD=fakepasswrod;sslmode=require"
我得到 3709 - 连接不能用于执行此操作。在这种情况下,它要么是关闭的,要么是无效的。
【问题讨论】:
仅供参考ThisWorkbook.Sheets(Sheet1.Name)
与 Sheet1
相同
I cannot find a way to get recordsets working using ADODB
如果这是主要问题,那么您尝试了什么以及遇到了什么错误?您的“集群上”时间是最后一条记录的时间,还是只是返回初始结果的时间?这两次可能完全不同......
@TimWilliams 我更新了记录集尝试。
您是在 32 位还是 64 位的 Excel 中运行的?驱动的位数是否与 Excel 匹配?
64 是的,它匹配。
【参考方案1】:
这不会改变性能,但您可能会发现采用更加面向对象的方法来构建查询会带来好处。例如,如果你定义一个类模块来保存参数和逻辑,那么构建脚本就会变成这样;
Sub BuildFilters()
Dim wb As Workbook, ws As Worksheet, response
Set wb = ThisWorkbook
Set ws = wb.Sheets(1)
Const msg1 = "You have chosen no Site, Cluster or FBN filters," & _
"this will pull all data and may take some time" & vbNewLine & _
"Do you wish to continue?"
Const msg2 = "Exiting data retrieval, please enter Site, Cluster or FBN filters and restart"
Dim Qb As New QueryBuilder
Qb.Init ws ' get parameters
If Qb.hasNone Then
response = MsgBox(msg1, vbYesNo)
If response = vbNo Then
Call MsgBox(msg2, vbOKOnly)
'Call DeleteConnections
End If
Else
' build SQL
Qb.BuildUDFilter
Qb.BuildBCSFilters
' dump to sheet to check result
ws.Range("D13") = Qb.UDFilter
ws.Range("D15") = Qb.BCSFilters
End If
End Sub
类模块查询构建器
Option Explicit
Public BCSFilters As String
Public UDFilter As String
Dim C_List As String, hasC As Boolean
Dim S_List As String, hasS As Boolean
Dim F_List As String, hasF As Boolean
Dim s_year As String
Dim Scen As String
Dim prior_s_year_1 As String
Dim prior_Scen_1 As String
Dim prior_s_year_2 As String
Dim prior_Scen_2 As String
Dim cat As String, hasCat As Boolean
Dim subcat As String, hasSubcat As Boolean
Dim count As Integer, hasAny As Boolean
' Initialise Object from Sheet
Sub Init(ws As Worksheet)
With ws
C_List = .Cells(1, 4) ' D1
S_List = .Cells(2, 4)
F_List = .Cells(3, 4)
s_year = Cells(4, 4)
Scen = quoted(.Cells(5, 4))
prior_s_year_1 = quoted(.Cells(6, 4))
prior_Scen_1 = quoted(.Cells(7, 4))
prior_s_year_2 = quoted(.Cells(8, 4))
prior_Scen_2 = quoted(.Cells(9, 4))
cat = .Cells(10, 4)
subcat = .Cells(11, 4)
End With
hasC = CBool(Len(C_List))
hasS = CBool(Len(S_List))
hasF = CBool(Len(F_List))
hasCat = CBool(Len(cat))
hasSubcat = CBool(Len(subcat))
End Sub
Function hasNone() As Boolean
hasNone = Not (hasC Or hasS Or hasF)
End Function
Sub BuildUDFilter()
Dim sql As String
count = 0
If UCase(C_List) = "ALL" Then
sql = " bd.reg IN ( SELECT DISTINCT c FROM att_1 )"
Else
If hasC Then sql = BuildSelect("reg", C_List)
If hasS Then sql = sql & BuildSelect("s", S_List)
If hasF Then sql = sql & BuildSelect("f", F_List)
End If
UDFilter = sql
End Sub
Sub BuildBCSFilters()
Dim sql As String
count = 0
If hasCat Then sql = BuildSelect("sca.cat", cat)
If hasSubcat Then sql = sql & BuildSelect("sca.subcat", subcat)
BCSFilters = sql
End Sub
Private Function BuildSelect(v As String, s As String)
Dim ar As Variant, i As Integer, sql As String
s = UCase(s)
If CBool(InStr(s, "*")) Then
s = Replace(s, "*", "")
sql = " LIKE '%" & s & "%'"
ElseIf CBool(InStr(1, s, "ABC")) Then
s = Left(s, 12)
sql = " LIKE '%" & s & "%'"
Else
ar = Split(s, ",")
For i = 0 To UBound(ar)
ar(i) = Trim(ar(i))
Next
If UBound(ar) = 0 Then
sql = " = '" & ar(0) & "'"
Else
sql = " IN ('" & Join(ar, "','") & "')"
End If
End If
sql = " UPPER(" & v & ")" & sql
If count > 0 Then
sql = vbNewLine & "AND" & sql
End If
count = count + 1
BuildSelect = sql
End Function
Private Function quoted(s) As String
quoted = "'" & s & "'"
End Function
【讨论】:
【参考方案2】:可能是.AdjustColumnWidth = True
行导致性能下降? (因为它必须加载数据以确定自动宽度)。
您是否考虑过将Application.ScreenUpdating
设置为False
和Application.Calculation
设置为xlCalculationManual
来执行大部分代码?
详情见https://www.microsoft.com/en-us/microsoft-365/blog/2009/03/12/excel-vba-performance-coding-best-practices/
这可能值得一试,看看它是否能提高性能。如果是这样:
您可以在屏幕更新被禁用期间将一些适当的用户显示消息放在适当的位置。 良好的做法是存储并恢复ScreenUpdating
和 Calculation
的值,这样环境就会保持在子例程开始时的状态
【讨论】:
以上是关于Excel VBA Redshift 查询性能改进的主要内容,如果未能解决你的问题,请参考以下文章
EXCEL VBA combobox 模糊查询触发后 退格键功能改变