比较两张纸并突出显示每张纸上的差异 - 循环是唯一的方法吗?

Posted

技术标签:

【中文标题】比较两张纸并突出显示每张纸上的差异 - 循环是唯一的方法吗?【英文标题】:Compare two sheets and highlight differences on each sheet - is looping the only way? 【发布时间】:2021-06-05 09:14:01 【问题描述】:

我正在寻求有关提高大型数据集性能的建议(每张表上大约 175k 行和 39 列 A:AM(比较 sheet1 与 sheet2)。这些表是从 Access 导出的,我的 VBA 是用 Access 编写的. 我的编码采用“for”循环,逐个检查单元格并突出显示每个相关单元格中是否存在不匹配。

我的问题 - 使用数组或字典函数会加速这个过程吗?如果是,您能否提供面包屑以阐明如何执行?此代码目前大约需要 3 小时才能完成。从 Access 导出到 Excel 大约需要 2 分钟,其余时间代表循环和突出显示。

作为说明 - 我已经编写了条件格式的代码,并且运行速度非常快。主要问题是我无法将带有突出显示的单元格的工作表复制/粘贴到新工作表中,同时留下条件。我很想知道是否有人找到了操纵那个雷区的方法。

代码如下:

DoCmd.SetWarnings False

            Dim xlapp As Excel.Application
            Dim xlbook As Excel.Workbook
            Dim xlSheet, xlSheetPre, xlSheetPost As Excel.Worksheet
            Dim SQL As String
            Dim rs1 As DAO.Recordset
            Dim iSheet As Long, iRow As Long, iCol As Long, cols As Long
            Dim MaxLastRow As Long, MaxLastCol As Long
            Dim LastRow1 As Range, LastRow2 As Range
            Dim LastCol1 As Range, LastCol2 As Range
            Dim i As Integer
            

            SQL = "SELECT * From Pre"
            
            Set rs1 = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)
            Set xlapp = Excel.Application
                xlapp.Visible = True
            Set xlbook = xlapp.Workbooks.Add
                i = 1
            Do
                Set xlSheet = Sheets.Add(after:=Sheets(Sheets.Count))
                i = i + 1
            Loop Until i = 2 ' the number 2 represents how many sheets you want to add to the 
            workbook
            
            Set xlSheet = xlbook.Worksheets(1) ' Finds worksheet (1) and begins loading data from SQL 
            table above
            
            With xlSheet
            .Name = "Pre" ' Name the worksheet
            .Range("a1:am1").Font.Bold = True 'Converts headers in row 1 to 
             bold font
            .Range("A2").CopyFromRecordset rs1 'Copies all data from selected 
             table (SQL)into your worksheet
            .Range("a1").AutoFilter ' Adds filter to your columns
            .Cells.Columns.AutoFit ' Adjust worksheet column width to autofit 
             your data
            .Range("a1:am1").Interior.ColorIndex = 37 ' Changes color of cell
            ' This loop reads all headers in your access table and places 
             them on worksheet
            For cols = 0 To rs1.Fields.Count - 1
                .Cells(1, cols + 1).Value = rs1.Fields(cols).Name
            Next
            
            
            End With
            
            SQL = "SELECT * From Post"
            Set rs1 = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)
            Set xlSheet = xlbook.Worksheets(2)
            
            With xlSheet
            .Name = "Post" ' Name the worksheet
            .Range("a1:am1").Font.Bold = True 'Converts headers in row 1 to 
             bold font
            .Range("A2").CopyFromRecordset rs1 'Copies all data from selected 
             table (SQL)into your worksheet
            .Range("a1").AutoFilter ' Adds filter to your columns
            .Cells.Columns.AutoFit ' Adjust worksheet column width to autofit 
             your data
            .Range("a1:am1").Interior.ColorIndex = 37 ' Changes color of cell
            ' This loop reads all headers in your access table and places 
             them on worksheet
            ' This loop reads all headers in your access table and places them on worksheet
            For cols = 0 To rs1.Fields.Count - 1
            .Cells(1, cols + 1).Value = rs1.Fields(cols).Name
            Next

            
            End With
            
            Set xlSheetPre = xlbook.Worksheets(1)
            Set xlSheetPost = xlbook.Worksheets(2)
            
            Set LastRow1 = xlSheetPre.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
            Set LastRow2 = xlSheetPost.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious)
            
            If Not LastRow1 Is Nothing Then
                If Not LastRow2 Is Nothing Then
                    If LastRow1.Row > LastRow2.Row Then
                        MaxLastRow = LastRow1.Row
                    Else
                        MaxLastRow = LastRow2.Row
                    End If
                Else
                    MaxLastRow = LastRow1.Row
                End If
            Else
                MaxLastRow = LastRow2.Row
            End If
            
            Set LastCol1 = xlSheetPre.Cells.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious)
            Set LastCol2 = xlSheetPost.Cells.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious)
            
            If Not LastCol1 Is Nothing Then
                If Not LastCol2 Is Nothing Then
                    If LastCol1.Column > LastCol2.Column Then
                        MaxLastCol = LastCol1.Column
                    Else
                        MaxLastCol = LastCol2.Column
                    End If
                Else
                    MaxLastCol = LastCol1.Column
                End If
            Else
                MaxLastCol = LastCol2.Column
            End If
            
            For iRow = 2 To MaxLastRow 'starting loop on row 2
                For iCol = 4 To MaxLastCol 'starting loop on column 4
                    If xlSheetPre.Cells(iRow, iCol).Value <> xlSheetPost.Cells(iRow, iCol).Value Then
                    xlSheetPre.Cells(iRow, iCol).Interior.ColorIndex = 4
                    xlSheetPost.Cells(iRow, iCol).Interior.ColorIndex = 4
            
                    End If
                    
                Next iCol
            Next iRow
            
            SubExit:
            On Error Resume Next
            
            rs1.Close
            Set rs1 = Nothing
            DoCmd.SetWarnings True
            
            Exit Sub

【问题讨论】:

您的代码只将标题放在 Excel 工作表中?数据在哪里?如果您将数据加载到数组中并比较数组,您的代码将运行得更快:逐个单元格的数据访问相对较慢。 也不清楚LastRow1 等可能是Nothing - 您已经将标题写入工作表,所以这永远不会发生。 感谢您指出这一点。我遗漏了我的格式行,看起来我最初删除了太多。对于 LastRow1/2,我的想法是说如果 LastRow1 与 LastRow2 不同,请调整具有较少的工作表,以便工作表之间的所有比较都是 1 比 1。 pre 和 post 表是否共享一个公共主键? 是的,A 列在导出到 Excel 之前在 Access 中具有对齐的键(a1 on pre = a1 on post)。如果 Pre 有一个 post 没有的键,则 Pre 键将突出显示。 【参考方案1】:

尝试通过仅提取有差异的记录来减少必须比较的记录数量。在 SQL 中有几种方法可以做到这一点,但作为概念证明,这会依次比较每一列,创建一个临时键表,用于过滤提取的记录。

Option Compare Database
Option Explicit

Sub DumpToExcel()

    Dim n As Integer, SQL As String, fname
    ' field names
    fname = Array("", "F1", "F2", "F3", "F4", "F5", _
                  "F6", "F7", "F8", "F9", "F10")

    ' identify diff records
    Debug.Print UBound(fname)
    DoCmd.SetWarnings False
    For n = 1 To UBound(fname)
        If n = 1 Then ' create table
            SQL = " SELECT post.ID, """ & n & """ AS Col INTO tmp"
        Else
            SQL = " INSERT INTO tmp" & _
                  " SELECT post.ID, """ & n & """ AS Col"
        End If
        SQL = SQL & _
            " FROM Post LEFT JOIN pre ON Post.id = pre.id" & _
            " WHERE NZ([pre].[" & fname(n) & "],"")<>NZ([post].[" & fname(n) & "],"");"
        
        DoCmd.RunSQL SQL

    Next
    DoCmd.SetWarnings True

    ' extract data
    Dim rs1 As DAO.Recordset

    SQL = " SELECT * FROM pre" & _
          " WHERE (((pre.[ID]) In " & _
          " (SELECT DISTINCT(ID) FROM tmp )));"

    Set rs1 = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)

    ' create excel
    Dim xlapp As Excel.Application, xlBook As Excel.Workbook
    Dim xlSheet As Excel.Worksheet

    Set xlapp = Excel.Application
    xlapp.Visible = True
    Set xlBook = xlapp.Workbooks.Add
    
    'add sheets as required
    Do While xlBook.Sheets.Count < 2
        xlBook.Sheets.Add
    Loop

    ' copy recordset to sheet
    xlBook.Sheets(1).Range("A2").CopyFromRecordset rs1
    MsgBox "Done"

End Sub

【讨论】:

谢谢。我快速复制了编码,但弹出框显示“输入参数值 - “Post.ID”,然后是“pre.id”。我将查看这段代码,看看我是否能理解它并让你知道的。 @Erik 将 ID 更改为您的主键字段【参考方案2】:

这是一个基于数组的比较。

已编译但未测试:

Sub Tester()
    
    Dim xlapp As Excel.Application
    Dim xlbook As Excel.Workbook
    Dim xlSheet, wsPre As Excel.Worksheet, wsPost As Excel.Worksheet
    Dim rowsPost As Long, rowsPre As Long, rowsMax As Long
    Dim colsPre As Long, colsPost As Long, colsMax As Long, flag As Boolean
    Dim r As Long, c As Long, rngPre As Range, rngPost As Range, arrPre, arrPost
    
    DoCmd.SetWarnings False
    
    Set xlapp = New Excel.Application 'forgot "New" here?
    xlapp.Visible = True
    Set xlbook = xlapp.Workbooks.Add()
    Do While xlbook.Worksheets.Count < 2 'how many sheets you need in the Workbook
        xlbook.Sheets.Add
    Loop
    
    Set wsPre = xlbook.Worksheets(1)
    Set wsPost = xlbook.Worksheets(2)
    
    PutInWorksheet "SELECT * From Pre", wsPre, "Pre"
    PutInWorksheet "SELECT * From Post", wsPost, "Post"
    
    Set rngPre = wsPre.Range("A1").CurrentRegion   'data ranges
    Set rngPost = wsPost.Range("A1").CurrentRegion

    arrPre = rngPre.Value   'read data to arrays
    arrPost = rngPost.Value
    
    rowsPre = UBound(arrPre, 1) 'compare array bounds...
    rowsPost = UBound(arrPost, 1)
    rowsMax = xlapp.Max(rowsPre, rowsPost)
    
    colsPre = UBound(arrPre, 2)
    colsPost = UBound(arrPost, 2)
    colsMax = xlapp.Max(colsPre, colsPost)
    
    For r = 2 To rowsMax
        flag = (r > rowsPre) Or (r > rowsPost) 'flag whole row if have run out of data in one set...
        If flag Then
            FlagRanges rngPre.Cells(r, 1).Resize(1, colsMax), _
                       rngPost.Cells(r, 1).Resize(1, colsMax)
        Else
            'have two rows to compare
            For c = 1 To colsMax
                flag = (c > colsPre) Or (c > colsPost) 'run out of cols in one dataset?
                If Not flag Then
                    flag = arrPre(r, c) <> arrPost(r, c) 'compare data
                End If
                If flag Then
                    'no data to compare, or data does not match
                    FlagRanges rngPre.Cells(r, c), rngPost.Cells(r, c)
                End If
            Next c
        End If
    Next r
End Sub

Sub FlagRanges(rng1 As Excel.Range, rng2 As Excel.Range)
    Const CLR_INDX = 4
    rng1.Interior.ColorIndex = CLR_INDX
    rng2.Interior.ColorIndex = CLR_INDX
End Sub

'run a query and put the results on a worksheet starting at A1
Sub PutInWorksheet(SQL As String, ws As Excel.Worksheet, _
    Optional newName As String = "")
    Dim f, c As Excel.Range, rs As dao.Recordset
    
    If Len(newName) > 0 Then ws.Name = newName
    Set rs = CurrentDb.OpenRecordset(SQL, dbOpenSnapshot)
    Set c = ws.Range("A1")
    For Each f In rs.Fields
        c.Value = f.Name
        c.Font.Bold = True
    Next f
    ws.Range("A2").CopyFromRecordset rs
    rs.Close
End Sub

【讨论】:

谢谢你,蒂姆。一个问题,当我编译时,我被标记为编译错误:“找不到方法或数据成员。”对于“rowsMax = Application.Max(rowsPre, rowsPost)”。这是一个工具>参考问题吗? 更具体地说,.Max 正在出错。 对不起,我是用 excel 写的,所以你需要 xlapp.Max 很好,xlApp.Max 解决了编译错误。看来我在 Sub PutInWorksheet 中又遇到了一个问题:“For each f In rs1.fields”表示需要一个对象。 对于 rs.fields 中的每个 f【参考方案3】:

“我的问题 - 使用数组或字典函数会加快处理速度吗?”

根据经验,答案是:不,不会。原因是您必须首先读取工作表中的单元格以填充数组或字典,所以...循环是真的,您需要组织数据(通常通过对列表、表格、范围等)以尽量减少对匹配记录(行)的搜索,从而使您的循环运行得更快。

如果您在 Access 中,那么您可以直接使用记录集执行此操作,前提是您公司的网络安全不会干扰记录集对象的移动(我的确实会干扰,而且非常严重 - Tanium 是一个真正的威胁!)

【讨论】:

根据我的经验 - 如果您执行嵌套循环来比较记录 - 使用字典会快得多。加载字典确实有一些开销,但避免嵌套循环是非常值得的。 @TimWilliams:如何在字典中存储具有多个字段和多种数据类型的记录? 使用字典和数组有什么优势吗? @Erik_U -- 它们是不同的,这取决于您需要什么以及数据是什么;谷歌,请你自己看看。 您可以将任何数据类型存储在字典条目的 Value 中,但在这种“避免嵌套循环”场景中使用时,它通常只是指向数组中行号的行索引跨度>

以上是关于比较两张纸并突出显示每张纸上的差异 - 循环是唯一的方法吗?的主要内容,如果未能解决你的问题,请参考以下文章

在两张纸上匹配三列并从匹配中粘贴特定列

Word怎样在一张纸上放多张图片

在两张纸上找到匹配的行

循环显示2列并突出显示无序值

将每个唯一值从一张纸复制并粘贴到另一张纸上

我试图将一张纸上的错误打印在另一张纸上,但无法弄清楚如何