比较两张纸并突出显示每张纸上的差异 - 循环是唯一的方法吗?
Posted
技术标签:
【中文标题】比较两张纸并突出显示每张纸上的差异 - 循环是唯一的方法吗?【英文标题】:Compare two sheets and highlight differences on each sheet - is looping the only way? 【发布时间】:2021-06-05 09:14:01 【问题描述】:我正在寻求有关提高大型数据集性能的建议(每张表上大约 175k 行和 39 列 A:AM(比较 sheet1 与 sheet2)。这些表是从 Access 导出的,我的 VBA 是用 Access 编写的. 我的编码采用“for”循环,逐个检查单元格并突出显示每个相关单元格中是否存在不匹配。
我的问题 - 使用数组或字典函数会加速这个过程吗?如果是,您能否提供面包屑以阐明如何执行?此代码目前大约需要 3 小时才能完成。从 Access 导出到 Excel 大约需要 2 分钟,其余时间代表循环和突出显示。
作为说明 - 我已经编写了条件格式的代码,并且运行速度非常快。主要问题是我无法将带有突出显示的单元格的工作表复制/粘贴到新工作表中,同时留下条件。我很想知道是否有人找到了操纵那个雷区的方法。
代码如下:
DoCmd.SetWarnings False
Dim xlapp As Excel.Application
Dim xlbook As Excel.Workbook
Dim xlSheet, xlSheetPre, xlSheetPost As Excel.Worksheet
Dim SQL As String
Dim rs1 As DAO.Recordset
Dim iSheet As Long, iRow As Long, iCol As Long, cols As Long
Dim MaxLastRow As Long, MaxLastCol As Long
Dim LastRow1 As Range, LastRow2 As Range
Dim LastCol1 As Range, LastCol2 As Range
Dim i As Integer
SQL = "SELECT * From Pre"
Set rs1 = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)
Set xlapp = Excel.Application
xlapp.Visible = True
Set xlbook = xlapp.Workbooks.Add
i = 1
Do
Set xlSheet = Sheets.Add(after:=Sheets(Sheets.Count))
i = i + 1
Loop Until i = 2 ' the number 2 represents how many sheets you want to add to the
workbook
Set xlSheet = xlbook.Worksheets(1) ' Finds worksheet (1) and begins loading data from SQL
table above
With xlSheet
.Name = "Pre" ' Name the worksheet
.Range("a1:am1").Font.Bold = True 'Converts headers in row 1 to
bold font
.Range("A2").CopyFromRecordset rs1 'Copies all data from selected
table (SQL)into your worksheet
.Range("a1").AutoFilter ' Adds filter to your columns
.Cells.Columns.AutoFit ' Adjust worksheet column width to autofit
your data
.Range("a1:am1").Interior.ColorIndex = 37 ' Changes color of cell
' This loop reads all headers in your access table and places
them on worksheet
For cols = 0 To rs1.Fields.Count - 1
.Cells(1, cols + 1).Value = rs1.Fields(cols).Name
Next
End With
SQL = "SELECT * From Post"
Set rs1 = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)
Set xlSheet = xlbook.Worksheets(2)
With xlSheet
.Name = "Post" ' Name the worksheet
.Range("a1:am1").Font.Bold = True 'Converts headers in row 1 to
bold font
.Range("A2").CopyFromRecordset rs1 'Copies all data from selected
table (SQL)into your worksheet
.Range("a1").AutoFilter ' Adds filter to your columns
.Cells.Columns.AutoFit ' Adjust worksheet column width to autofit
your data
.Range("a1:am1").Interior.ColorIndex = 37 ' Changes color of cell
' This loop reads all headers in your access table and places
them on worksheet
' This loop reads all headers in your access table and places them on worksheet
For cols = 0 To rs1.Fields.Count - 1
.Cells(1, cols + 1).Value = rs1.Fields(cols).Name
Next
End With
Set xlSheetPre = xlbook.Worksheets(1)
Set xlSheetPost = xlbook.Worksheets(2)
Set LastRow1 = xlSheetPre.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
Set LastRow2 = xlSheetPost.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
If Not LastRow1 Is Nothing Then
If Not LastRow2 Is Nothing Then
If LastRow1.Row > LastRow2.Row Then
MaxLastRow = LastRow1.Row
Else
MaxLastRow = LastRow2.Row
End If
Else
MaxLastRow = LastRow1.Row
End If
Else
MaxLastRow = LastRow2.Row
End If
Set LastCol1 = xlSheetPre.Cells.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious)
Set LastCol2 = xlSheetPost.Cells.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious)
If Not LastCol1 Is Nothing Then
If Not LastCol2 Is Nothing Then
If LastCol1.Column > LastCol2.Column Then
MaxLastCol = LastCol1.Column
Else
MaxLastCol = LastCol2.Column
End If
Else
MaxLastCol = LastCol1.Column
End If
Else
MaxLastCol = LastCol2.Column
End If
For iRow = 2 To MaxLastRow 'starting loop on row 2
For iCol = 4 To MaxLastCol 'starting loop on column 4
If xlSheetPre.Cells(iRow, iCol).Value <> xlSheetPost.Cells(iRow, iCol).Value Then
xlSheetPre.Cells(iRow, iCol).Interior.ColorIndex = 4
xlSheetPost.Cells(iRow, iCol).Interior.ColorIndex = 4
End If
Next iCol
Next iRow
SubExit:
On Error Resume Next
rs1.Close
Set rs1 = Nothing
DoCmd.SetWarnings True
Exit Sub
【问题讨论】:
您的代码只将标题放在 Excel 工作表中?数据在哪里?如果您将数据加载到数组中并比较数组,您的代码将运行得更快:逐个单元格的数据访问相对较慢。 也不清楚LastRow1
等可能是Nothing
- 您已经将标题写入工作表,所以这永远不会发生。
感谢您指出这一点。我遗漏了我的格式行,看起来我最初删除了太多。对于 LastRow1/2,我的想法是说如果 LastRow1 与 LastRow2 不同,请调整具有较少的工作表,以便工作表之间的所有比较都是 1 比 1。
pre 和 post 表是否共享一个公共主键?
是的,A 列在导出到 Excel 之前在 Access 中具有对齐的键(a1 on pre = a1 on post)。如果 Pre 有一个 post 没有的键,则 Pre 键将突出显示。
【参考方案1】:
尝试通过仅提取有差异的记录来减少必须比较的记录数量。在 SQL 中有几种方法可以做到这一点,但作为概念证明,这会依次比较每一列,创建一个临时键表,用于过滤提取的记录。
Option Compare Database
Option Explicit
Sub DumpToExcel()
Dim n As Integer, SQL As String, fname
' field names
fname = Array("", "F1", "F2", "F3", "F4", "F5", _
"F6", "F7", "F8", "F9", "F10")
' identify diff records
Debug.Print UBound(fname)
DoCmd.SetWarnings False
For n = 1 To UBound(fname)
If n = 1 Then ' create table
SQL = " SELECT post.ID, """ & n & """ AS Col INTO tmp"
Else
SQL = " INSERT INTO tmp" & _
" SELECT post.ID, """ & n & """ AS Col"
End If
SQL = SQL & _
" FROM Post LEFT JOIN pre ON Post.id = pre.id" & _
" WHERE NZ([pre].[" & fname(n) & "],"")<>NZ([post].[" & fname(n) & "],"");"
DoCmd.RunSQL SQL
Next
DoCmd.SetWarnings True
' extract data
Dim rs1 As DAO.Recordset
SQL = " SELECT * FROM pre" & _
" WHERE (((pre.[ID]) In " & _
" (SELECT DISTINCT(ID) FROM tmp )));"
Set rs1 = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)
' create excel
Dim xlapp As Excel.Application, xlBook As Excel.Workbook
Dim xlSheet As Excel.Worksheet
Set xlapp = Excel.Application
xlapp.Visible = True
Set xlBook = xlapp.Workbooks.Add
'add sheets as required
Do While xlBook.Sheets.Count < 2
xlBook.Sheets.Add
Loop
' copy recordset to sheet
xlBook.Sheets(1).Range("A2").CopyFromRecordset rs1
MsgBox "Done"
End Sub
【讨论】:
谢谢。我快速复制了编码,但弹出框显示“输入参数值 - “Post.ID”,然后是“pre.id”。我将查看这段代码,看看我是否能理解它并让你知道的。 @Erik 将 ID 更改为您的主键字段【参考方案2】:这是一个基于数组的比较。
已编译但未测试:
Sub Tester()
Dim xlapp As Excel.Application
Dim xlbook As Excel.Workbook
Dim xlSheet, wsPre As Excel.Worksheet, wsPost As Excel.Worksheet
Dim rowsPost As Long, rowsPre As Long, rowsMax As Long
Dim colsPre As Long, colsPost As Long, colsMax As Long, flag As Boolean
Dim r As Long, c As Long, rngPre As Range, rngPost As Range, arrPre, arrPost
DoCmd.SetWarnings False
Set xlapp = New Excel.Application 'forgot "New" here?
xlapp.Visible = True
Set xlbook = xlapp.Workbooks.Add()
Do While xlbook.Worksheets.Count < 2 'how many sheets you need in the Workbook
xlbook.Sheets.Add
Loop
Set wsPre = xlbook.Worksheets(1)
Set wsPost = xlbook.Worksheets(2)
PutInWorksheet "SELECT * From Pre", wsPre, "Pre"
PutInWorksheet "SELECT * From Post", wsPost, "Post"
Set rngPre = wsPre.Range("A1").CurrentRegion 'data ranges
Set rngPost = wsPost.Range("A1").CurrentRegion
arrPre = rngPre.Value 'read data to arrays
arrPost = rngPost.Value
rowsPre = UBound(arrPre, 1) 'compare array bounds...
rowsPost = UBound(arrPost, 1)
rowsMax = xlapp.Max(rowsPre, rowsPost)
colsPre = UBound(arrPre, 2)
colsPost = UBound(arrPost, 2)
colsMax = xlapp.Max(colsPre, colsPost)
For r = 2 To rowsMax
flag = (r > rowsPre) Or (r > rowsPost) 'flag whole row if have run out of data in one set...
If flag Then
FlagRanges rngPre.Cells(r, 1).Resize(1, colsMax), _
rngPost.Cells(r, 1).Resize(1, colsMax)
Else
'have two rows to compare
For c = 1 To colsMax
flag = (c > colsPre) Or (c > colsPost) 'run out of cols in one dataset?
If Not flag Then
flag = arrPre(r, c) <> arrPost(r, c) 'compare data
End If
If flag Then
'no data to compare, or data does not match
FlagRanges rngPre.Cells(r, c), rngPost.Cells(r, c)
End If
Next c
End If
Next r
End Sub
Sub FlagRanges(rng1 As Excel.Range, rng2 As Excel.Range)
Const CLR_INDX = 4
rng1.Interior.ColorIndex = CLR_INDX
rng2.Interior.ColorIndex = CLR_INDX
End Sub
'run a query and put the results on a worksheet starting at A1
Sub PutInWorksheet(SQL As String, ws As Excel.Worksheet, _
Optional newName As String = "")
Dim f, c As Excel.Range, rs As dao.Recordset
If Len(newName) > 0 Then ws.Name = newName
Set rs = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)
Set c = ws.Range("A1")
For Each f In rs.Fields
c.Value = f.Name
c.Font.Bold = True
Next f
ws.Range("A2").CopyFromRecordset rs
rs.Close
End Sub
【讨论】:
谢谢你,蒂姆。一个问题,当我编译时,我被标记为编译错误:“找不到方法或数据成员。”对于“rowsMax = Application.Max(rowsPre, rowsPost)”。这是一个工具>参考问题吗? 更具体地说,.Max 正在出错。 对不起,我是用 excel 写的,所以你需要 xlapp.Max 很好,xlApp.Max 解决了编译错误。看来我在 Sub PutInWorksheet 中又遇到了一个问题:“For each f In rs1.fields”表示需要一个对象。 对于 rs.fields 中的每个 f【参考方案3】:“我的问题 - 使用数组或字典函数会加快处理速度吗?”
根据经验,答案是:不,不会。原因是您必须首先读取工作表中的单元格以填充数组或字典,所以...循环是真的,您需要组织数据(通常通过对列表、表格、范围等)以尽量减少对匹配记录(行)的搜索,从而使您的循环运行得更快。
如果您在 Access 中,那么您可以直接使用记录集执行此操作,前提是您公司的网络安全不会干扰记录集对象的移动(我的确实会干扰,而且非常严重 - Tanium 是一个真正的威胁!)
【讨论】:
根据我的经验 - 如果您执行嵌套循环来比较记录 - 使用字典会快得多。加载字典确实有一些开销,但避免嵌套循环是非常值得的。 @TimWilliams:如何在字典中存储具有多个字段和多种数据类型的记录? 使用字典和数组有什么优势吗? @Erik_U -- 它们是不同的,这取决于您需要什么以及数据是什么;谷歌,请你自己看看。 您可以将任何数据类型存储在字典条目的 Value 中,但在这种“避免嵌套循环”场景中使用时,它通常只是指向数组中行号的行索引跨度>以上是关于比较两张纸并突出显示每张纸上的差异 - 循环是唯一的方法吗?的主要内容,如果未能解决你的问题,请参考以下文章