How to batch replace duplicates in Excel?
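【Title】: How to batch replace duplicates in Excel?
【Posted】: 2021-09-10 05:10:54
【Question】: How can I batch replace duplicate text strings in existing data?
I have an Excel file containing Order#, PersonName, and ID. Some (different) Order# values share the same PersonName and ID set, which must be replaced with another, unique PersonName and ID. (Rows with the same Order# may share the same PersonName and ID, but different Order# values must have different PersonName and ID.)
I have another existing data table that lists the PersonName and ID values to use for this replacement. How can I write a formula that uses this table to automatically replace all duplicate PersonName and ID values across different Order# values? Duplicates should not be deleted, only replaced with unique string values.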
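Sample data (all data is fictional):
Sample of vacant names and IDs:
The result should replace duplicates with the first available vacant name and ID (rows with the same Order# are assigned the same name and ID set):
I am really new to this. I spent a lot of time looking for similar questions and trying to work out a solution, but could not come up with one. :(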
【Comments】:
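The question is not very clear, especially without sample data and an explanation of the expected result. It is also unclear what you tried and where you got stuck.
What does "all data is fictional" mean? Can you state the question more concisely?
@sundqvist. When you post sample data, post it as text so that we do not have to retype it to test a solution.
You have tagged VBA, but I don't think VBA is required. Just filter the data on "David Erent", then copy the Order# values to a new sheet. Remove duplicates there, then copy those unique Order# values into the Vacant Names & ID table as its first column. You can then use VLOOKUP, starting from the second David Erent row of your data, to pull the person name and ID from the Vacant Names & ID table into your data sheet. Do not sort any of the sheets while doing this. You will then get Maria Baker against 551872, and so on.
【Answer 1】: Try the VBA code below. Column A = Order#, column B = PersonName, column C = ID, column E = vacant names, column F = vacant IDs.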
Sub SortData()
    Dim Arr01 As Variant
    Dim Arr02 As Variant
    Dim i01 As Long 'Row being checked
    Dim i02 As Long 'Earlier row it is compared against
    Dim i03 As Long 'Next unused row of the vacant list
    Dim i04 As Long 'Later row with the same Order#

    'Array of the "Order#, PersonName, ID"
    Arr01 = Range("A2", Range("C2").End(xlDown))
    'Array of the "Vacant name, Vacant ID"
    Arr02 = Range("E2", Range("F2").End(xlDown))

    i03 = 1
    For i01 = 2 To UBound(Arr01, 1)
        'Compare row i01 with every earlier row
        For i02 = 1 To i01 - 1
            'Same name on a different Order#: take the next vacant name and ID
            If Arr01(i02, 2) = Arr01(i01, 2) And Arr01(i02, 1) <> Arr01(i01, 1) Then
                Arr01(i01, 2) = Arr02(i03, 1)
                Arr01(i01, 3) = Arr02(i03, 2)
                i03 = 1 + i03
            End If
        Next i02
        'Give later rows of the same Order# the name and ID of row i01
        For i04 = i01 + 1 To UBound(Arr01, 1)
            If Arr01(i01, 1) = Arr01(i04, 1) Then
                Arr01(i04, 2) = Arr01(i01, 2)
                Arr01(i04, 3) = Arr01(i01, 3)
            End If
        Next i04
    Next i01

    'Rows with the same Order# and the same name also share the same ID
    For i01 = 1 To UBound(Arr01, 1)
        For i02 = 1 To UBound(Arr01, 1)
            If Arr01(i01, 1) = Arr01(i02, 1) And Arr01(i01, 2) = Arr01(i02, 2) Then
                Arr01(i01, 3) = Arr01(i02, 3)
            End If
        Next i02
    Next i01

    'Write the result back over the original data
    Range("A2", Range("C2").End(xlDown)) = Arr01
End Sub
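【Discussion】:
【Answer 2】: Batch replace duplicates (two dictionaries)
It is assumed that both tables (contiguous ranges with one row of headers) start in cell A1 on different worksheets.
Adjust the values in the constants section.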
Option Explicit

Sub BatchReplaceDupes()

    ' Source (read from)
    Const sName As String = "Sheet2"
    Const snCol As Long = 1 ' Name Column
    Const siCol As Long = 2 ' ID Column
    ' Destination, Result (written to (also read from))
    Const dName As String = "Sheet1"
    Const doCol As Long = 1 ' Order Column
    Const dnCol As Long = 2 ' Name Column
    Const diCol As Long = 3 ' ID Column
    ' Workbook
    Dim wb As Workbook: Set wb = ThisWorkbook ' workbook containing this code

    ' Write the values from the Source Range to the Source Array.
    Dim sws As Worksheet: Set sws = wb.Worksheets(sName)
    Dim srg As Range: Set srg = sws.Range("A1").CurrentRegion
    Dim sData As Variant: sData = srg.Value

    ' Write the values from the Destination Range to the Destination Array.
    Dim dws As Worksheet: Set dws = wb.Worksheets(dName)
    Dim drg As Range: Set drg = dws.Range("A1").CurrentRegion
    Dim dData As Variant: dData = drg.Value

    ' Create references to the Order and Name Dictionaries.
    Dim dictO As Object: Set dictO = CreateObject("Scripting.Dictionary")
    Dim dictN As Object: Set dictN = CreateObject("Scripting.Dictionary")

    ' Declare variables.
    Dim KeyO As Variant ' Order
    Dim KeyN As Variant ' Name
    Dim KeyI As Variant ' ID
    Dim dr As Long
    Dim dc As Long
    Dim sr As Long: sr = 1 ' 1 is headers

    ' (Over)Write results to the Destination Array.
    For dr = 2 To UBound(dData, 1) ' 2, because 1 is headers
        KeyO = dData(dr, doCol)
        KeyN = dData(dr, dnCol)
        KeyI = dData(dr, diCol)
        If Not dictO.Exists(KeyO) Then
            If dictN.Exists(KeyN) Then
                sr = sr + 1
                dictO(KeyO) = sData(sr, snCol)
                dictN(KeyN) = sData(sr, siCol)
            Else
                dictO(KeyO) = KeyN
                dictN(KeyN) = KeyI
            End If
        End If
        dData(dr, dnCol) = dictO(KeyO)
        dData(dr, diCol) = dictN(KeyN)
    Next dr

    ' Write values from the Destination Array to the Destination Range.
    drg.Value = dData

End Sub
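If the rows of one Order# are not adjacent in the data, a variant that works only on in-memory arrays may be easier to reason about and test. The following is a minimal sketch, not part of the answer above: the names ReplaceDupes and DemoReplaceDupes, the "name|ID" key format, and the sample values are all hypothetical. It assumes the vacant names are unique and not already present in the data, and it uses Scripting.Dictionary, which is normally available only in Excel for Windows.

```vba
Option Explicit

' Hypothetical helper (an illustrative sketch, not the answer's method).
' data:   2-D array without headers, columns = Order#, PersonName, ID
' vacant: 2-D array without headers, columns = PersonName, ID
' Returns a copy of data with duplicate name/ID sets replaced.
Function ReplaceDupes(ByVal data As Variant, ByVal vacant As Variant) As Variant
    ' Order# -> first row that carries that order
    Dim orderToRow As Object: Set orderToRow = CreateObject("Scripting.Dictionary")
    ' "name|ID" sets that already belong to some order
    Dim usedKeys As Object: Set usedKeys = CreateObject("Scripting.Dictionary")
    Dim r As Long, src As Long
    Dim v As Long: v = LBound(vacant, 1) ' next unused vacant row
    Dim key As String

    For r = LBound(data, 1) To UBound(data, 1)
        If orderToRow.Exists(data(r, 1)) Then
            ' Order seen before: reuse the name and ID of its first row
            src = orderToRow(data(r, 1))
            data(r, 2) = data(src, 2)
            data(r, 3) = data(src, 3)
        Else
            key = CStr(data(r, 2)) & "|" & CStr(data(r, 3))
            If usedKeys.Exists(key) Then
                ' Name/ID set already belongs to a different order
                If v > UBound(vacant, 1) Then
                    Err.Raise vbObjectError + 1, , "Vacant list exhausted"
                End If
                data(r, 2) = vacant(v, 1)
                data(r, 3) = vacant(v, 2)
                v = v + 1
                key = CStr(data(r, 2)) & "|" & CStr(data(r, 3))
            End If
            orderToRow(data(r, 1)) = r
            usedKeys(key) = True
        End If
    Next r

    ReplaceDupes = data
End Function

' Usage with made-up values: orders 1001 and 1002 are interleaved.
Sub DemoReplaceDupes()
    Dim data(1 To 4, 1 To 3) As Variant
    Dim vacant(1 To 2, 1 To 2) As Variant
    Dim result As Variant
    Dim r As Long

    For r = 1 To 4
        data(r, 1) = IIf(r Mod 2 = 1, 1001, 1002)
        data(r, 2) = "Name A"
        data(r, 3) = "ID-A"
    Next r
    vacant(1, 1) = "Name V1": vacant(1, 2) = "ID-V1"
    vacant(2, 1) = "Name V2": vacant(2, 2) = "ID-V2"

    result = ReplaceDupes(data, vacant)
    ' Rows 1 and 3 keep "Name A"; rows 2 and 4 now read "Name V1" / "ID-V1"
    For r = 1 To 4
        Debug.Print result(r, 1), result(r, 2), result(r, 3)
    Next r
End Sub
```

Keying the first dictionary on Order# alone means a later row of an already-seen order always copies that order's first row, so the outcome does not depend on the data being sorted. Raising an error when the vacant list runs out is one possible choice; leaving the remaining duplicates untouched would be another.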
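【Discussion】:
【Answer 3】: You can obtain the desired output with Power Query, available in Windows Excel 2010+ and Office 365 Excel.
Data => Get&Transform => From Table/Range
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the table name in line 2 of the code (and further down, in the line labeled Source2)
Replace the existing code with the M code below
Change the table names in the Source and Source2 lines of the pasted code to your "real" table names
Examine the comments and the Applied Steps window to better understand the algorithm and steps
Basic algorithm
Read in the sampleData table
Read in the vacantNames table for later use
Group by ID
For each subtable from the previous step, group by Order#
Then add an index column starting at 0 to each sub-subtable
Create a "lookup index" list that increments by 1 for each entry in the sampleData table whose index is not 0. We will use it to determine which vacant name to substitute
Create the ID and PersonName columns: keep the original values where the index is 0, otherwise take them from the second table
M code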
let
    //change Table name in next line to actual name in your workbook
    Source = Excel.CurrentWorkbook(){[Name="sampleData"]}[Content],

    //Read in the vacant name list
    //Again, change Table name in next line to actual name in your workbook
    Source2 = Excel.CurrentWorkbook(){[Name="vacantNames"]}[Content],

    //Group by ID
    #"Grouped Rows" = Table.Group(Source, {"ID"}, {
        {"All", each _, type table [#"Order#"=number, PersonName=text, ID=text]}
        }),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"ID"}),

    //Group each ID by Order#
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Custom", each Table.Group([All],{"Order#"},{{"All2", each _}})),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"All"}),

    //add index column to each subTable
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "Custom.1", each Table.AddIndexColumn([Custom],"IDX",0,1,Int64.Type)),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom1",{"Custom"}),

    //expand tables
    //we will replace names that do not have IDX=0
    #"Expanded Custom.1" = Table.ExpandTableColumn(#"Removed Columns2", "Custom.1",
        {"Order#", "All2", "IDX"}, {"Order#", "All2", "IDX"}),

    //create index into vacant table
    vacIDX = List.Generate(()=>[vIDX=0, i=0],
        each [i] < Table.RowCount(#"Expanded Custom.1"),
        each [vIDX = if Table.Column(#"Expanded Custom.1","IDX"){[i]}=0 then [vIDX] else [vIDX]+1, i = [i]+1],
        each [vIDX]),

    //Add index column to sample table
    #"Added Index1" = Table.AddIndexColumn(#"Expanded Custom.1", "Index", 0, 1, Int64.Type),

    //add Personal Names and ID
    //The Index column gives us the position in the vacIDX list that corresponds to the "next" entry for substitution
    // into the sample table
    #"Added Custom2" = Table.AddColumn(#"Added Index1", "PersonName",
        each if [IDX] = 0 then [All2][PersonName]{0}
            else Source2[PersonName]{vacIDX{[Index]}}, Text.Type),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "ID",
        each if [IDX] = 0 then [All2][ID]{0}
            else Source2[ID]{vacIDX{[Index]}}, Text.Type),

    //Remove unneeded columns and expand the table
    #"Removed Columns3" = Table.RemoveColumns(#"Added Custom3",{"Order#", "IDX", "Index"}),
    #"Expanded All2" = Table.ExpandTableColumn(#"Removed Columns3", "All2", {"Order#"}, {"Order#"})
in
    #"Expanded All2"
Result
【Discussion】: