I need to count instances of rows and delete duplicates based on multiple column values

【Posted】2018-01-18 01:11:29 【Question】:

Base Table

Finished Product

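I am trying to get a CSV export sorted into a format that lets me and the people in my department quickly copy and paste the information into an existing workbook. The existing workbook runs multiple formulas and code, so I can't simply create a new workbook in the format the CSV export produces automatically. Basically, I need to take multiple rows of information with multiple identifier columns, count/sum those rows, and eliminate the duplicates, while each remaining row keeps all of its corresponding information in the other columns. I have tried standard Excel formulas: I can get a subtotal, or I can delete duplicates and sum, but neither carries the rest of the information along.

So the final order of columns to check when matching duplicates would be SKU, Floor Lvl, Detail, Room, Label.

Thanks for any help you can provide!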

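【Comments】:

That's what pivot tables do. No formulas required.

If you can tell me how to get a pivot table to show the information in the format of the Finished Product link, great, but so far a pivot table doesn't do what I need. It sums all of the items for the room, or all of the items with that label. I need all of the items in that room with that label, and then a separate count for the next label in the same room. For example: 1x8 for room 3932 = 6 and 1x7 for room 3932 = 3, not room 3932 = 9 or 1x8 = 10.

【Answer 1】: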

As @teylyn suggested, a pivot table is the way to go:

Select your data, including the headers

Insert > Pivot Table

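Drop every field except "Count" into the "Row Labels" box, ordered with "Label" at the top, then "Style", then "SKU" ...

Drop the "Count" field into the "Values" box and set it to "Sum of Count"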

PivotTable Tools > Design > Report Layout > Show in Tabular Form

PivotTable Tools > Design > Report Layout > Repeat All Item Labels

PivotTable Tools > Design > Grand Totals > Off for Rows and Columns

PivotTable Tools > Design > Subtotals > Do Not Show Subtotals

The result I get is identical to your "Finished Product".
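For anyone who would rather script these steps than click through them, the macro below is a rough, untested sketch of the same pivot table. It assumes the data starts at cell A1 of the active sheet with a single header row, that the numeric column is headed exactly "Count", and that Excel 2010 or later is in use (needed for RepeatAllLabels). The names BuildCountPivot and GroupedCounts are made up for illustration. Row fields are added in the sheet's left-to-right column order, so rearrange the columns first if you want "Label" at the top.

```vba
Option Explicit

' Sketch only: rebuilds the pivot table from the steps above in code.
' Assumptions: data starts at A1 of the active sheet, one header row,
' and a numeric column whose header is exactly "Count".
Sub BuildCountPivot()
    Const COUNT_HEADER As String = "Count"   ' assumed header name

    Dim SourceRange As Range
    Set SourceRange = ActiveSheet.Range("A1").CurrentRegion

    Dim Cache As PivotCache
    Set Cache = ActiveWorkbook.PivotCaches.Create( _
        SourceType:=xlDatabase, SourceData:=SourceRange)

    Dim PivotSheet As Worksheet
    Set PivotSheet = ActiveWorkbook.Worksheets.Add

    Dim Pivot As PivotTable
    Set Pivot = Cache.CreatePivotTable( _
        TableDestination:=PivotSheet.Range("A3"), TableName:="GroupedCounts")

    ' Every column except Count becomes a row field, with subtotals switched off.
    Dim HeaderCell As Range
    Dim FieldPosition As Long
    For Each HeaderCell In SourceRange.Rows(1).Cells
        If CStr(HeaderCell.Value2) <> COUNT_HEADER Then
            FieldPosition = FieldPosition + 1
            With Pivot.PivotFields(CStr(HeaderCell.Value2))
                .Orientation = xlRowField
                .Position = FieldPosition
                .Subtotals = Array(False, False, False, False, False, False, _
                                   False, False, False, False, False, False)
            End With
        End If
    Next HeaderCell

    ' Count goes into the Values area as a sum.
    Pivot.AddDataField Pivot.PivotFields(COUNT_HEADER), "Sum of " & COUNT_HEADER, xlSum

    ' Tabular form, repeated item labels, no grand totals.
    Pivot.RowAxisLayout xlTabularRow
    Pivot.RepeatAllLabels xlRepeatLabels
    Pivot.RowGrand = False
    Pivot.ColumnGrand = False
End Sub
```

Run it with the exported data sheet active; the pivot table is placed on a newly added sheet.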

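【Discussion】:

AmBo, that worked! Thank you! I had tried using a pivot table before, but I couldn't figure out how to format it. Now I can show my crew!

【Answer 2】:

Based on the existing comments/answers, a pivot table is probably the way to go. But maybe the code below could also work for you (assuming it works). You will need to assign PathToCSV.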

Option Explicit

Sub GroupCSVbyColumns()

    Dim PathToCSV As String
    PathToCSV = "C:\New Folder\ff.csv" ' Replace with actual path.

    If Len(Dir(PathToCSV)) > 0 Then

        ' Read the whole file into a string. Assumes the file will fit in memory.
        Dim ContentsOfCSV As String
        Open PathToCSV For Binary Access Read As #1
        ContentsOfCSV = Space$(LOF(1))
        Get #1, 1, ContentsOfCSV
        Close #1

        ' Assumes rows are separated by the new line character(s).
        Dim RowsInCSV() As String
        RowsInCSV = Split(ContentsOfCSV, vbNewLine, -1, vbBinaryCompare)

        Const COMMA_DELIMITER As String = ","
        Dim RowIndex As Long

        Dim OutputList() As String
        Dim OutputCounts() As Long
        ReDim OutputList(LBound(RowsInCSV) To UBound(RowsInCSV))
        ReDim OutputCounts(LBound(RowsInCSV) To UBound(RowsInCSV))

        ' The question lists SKU, Floor Lvl, Detail, Room and Label as the columns to compare.
        ' Not sure if it makes a difference in your case, but the code below considers every
        ' column (apart from Count, assumed to be the last column) when determining
        ' duplicates, not just the ones you mentioned.

        Dim MatchResult As Variant
        ' Starting at LBound leaves the first element blank and reserved for the header row,
        ' because MatchesCount is incremented before it is used.
        Dim MatchesCount As Long: MatchesCount = LBound(OutputList)
        Dim MatchedIndex As Long
        Dim LastComma As Long
        Dim CurrentRowText As String
        Dim CurrentRowCount As Long

        For RowIndex = (LBound(RowsInCSV) + 1) To UBound(RowsInCSV) ' Skip row of headers

            If Len(RowsInCSV(RowIndex)) > 0 Then

                ' Everything before the last comma is the key; the last field is the count.
                LastComma = InStrRev(RowsInCSV(RowIndex), COMMA_DELIMITER, -1, vbBinaryCompare)
                CurrentRowText = Left$(RowsInCSV(RowIndex), LastComma - 1)
                CurrentRowCount = CLng(Mid$(RowsInCSV(RowIndex), LastComma + 1))

                ' Filter function might perform better than Match below.
                MatchResult = Application.Match(CurrentRowText, OutputList, 0)

                If IsNumeric(MatchResult) Then
                    ' Match returns a 1-based position, so convert it to an array index.
                    MatchedIndex = LBound(OutputList) + CLng(MatchResult) - 1
                    OutputCounts(MatchedIndex) = OutputCounts(MatchedIndex) + CurrentRowCount
                Else
                    MatchesCount = MatchesCount + 1
                    OutputList(MatchesCount) = CurrentRowText
                    OutputCounts(MatchesCount) = OutputCounts(MatchesCount) + CurrentRowCount
                End If

            End If

        Next RowIndex

        Dim TemporaryArray() As String
        Dim ColumnIndex As Long

        TemporaryArray = Split(RowsInCSV(LBound(RowsInCSV)), COMMA_DELIMITER, -1, vbBinaryCompare)

        ' Dim requires constant bounds, so the table is sized with ReDim instead.
        Dim OutputTable() As Variant
        ReDim OutputTable(1 To (MatchesCount + 1), 1 To (UBound(TemporaryArray) + 1))

        ' Assign all headers from header row; done outside of the loop below as all columns are looped through.
        For ColumnIndex = LBound(OutputTable, 2) To UBound(OutputTable, 2)
            OutputTable(1, ColumnIndex) = TemporaryArray(ColumnIndex - 1)
        Next ColumnIndex

        For RowIndex = (LBound(OutputTable, 1) + 1) To UBound(OutputTable, 1)

            TemporaryArray = Split(OutputList(RowIndex - 1), COMMA_DELIMITER, -1, vbBinaryCompare)

            For ColumnIndex = LBound(OutputTable, 2) To (UBound(OutputTable, 2) - 1)
                OutputTable(RowIndex, ColumnIndex) = TemporaryArray(ColumnIndex - 1)
            Next ColumnIndex

            ' The last column holds the summed count.
            OutputTable(RowIndex, UBound(OutputTable, 2)) = OutputCounts(RowIndex - 1)

        Next RowIndex

        Dim OutputSheet As Worksheet
        Set OutputSheet = ThisWorkbook.Worksheets.Add
        OutputSheet.Range("A1").Resize(UBound(OutputTable, 1), UBound(OutputTable, 2)).Value2 = OutputTable

    Else
        MsgBox "No file found at " & PathToCSV
    End If

End Sub

Untested, written on mobile.
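The code above looks up duplicates with Application.Match, which is limited to fairly short lookup strings and treats characters such as * and ? as wildcards. A commonly used alternative is a Scripting.Dictionary keyed on the row text. The following is only a sketch of that idea, not part of the original answer: SumCountsByKey and DemoSumCountsByKey are hypothetical names, the sample rows are invented to mirror the room 3932 example from the comments, and it assumes Windows (for Scripting.Dictionary), no quoted fields containing commas, and a whole-number count in the last column.

```vba
Option Explicit

' Sketch: sums the last comma-separated field of each row,
' grouped by everything before that field. The first row is treated as headers.
Function SumCountsByKey(ByRef CsvRows() As String) As Object
    Dim Totals As Object
    Set Totals = CreateObject("Scripting.Dictionary")   ' late bound, Windows only

    Dim i As Long
    Dim CutAt As Long
    Dim RowKey As String

    For i = LBound(CsvRows) + 1 To UBound(CsvRows)      ' skip the header row
        CutAt = InStrRev(CsvRows(i), ",")
        If CutAt > 0 Then
            RowKey = Left$(CsvRows(i), CutAt - 1)
            ' Reading a missing key yields Empty, which adds as zero.
            Totals(RowKey) = Totals(RowKey) + CLng(Mid$(CsvRows(i), CutAt + 1))
        End If
    Next i

    Set SumCountsByKey = Totals
End Function

' Hypothetical sample data; prints one line per distinct Room/Label pair.
Sub DemoSumCountsByKey()
    Dim SampleRows() As String
    SampleRows = Split("Room,Label,Count|3932,1x8,4|3932,1x7,3|3932,1x8,2", "|")

    Dim Totals As Object
    Set Totals = SumCountsByKey(SampleRows)

    Dim RowKey As Variant
    For Each RowKey In Totals.Keys
        Debug.Print RowKey & " -> " & Totals(RowKey)
    Next RowKey
    ' Prints:
    ' 3932,1x8 -> 6
    ' 3932,1x7 -> 3
End Sub
```

In GroupCSVbyColumns, the same function could be called with RowsInCSV, and the dictionary's keys and totals written out in place of OutputList and OutputCounts.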

【Discussion】:

Chillin, thank you for taking the time. AmBo got me sorted out!
