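VB.NET Group by two columns and write results to an array
Posted: 2021-03-31 04:19:18
Question: I need to group CSV data by column values and write it to a new CSV. I can only do this by one column, but unfortunately that is not enough, because I get duplicates and do not reach my goal. Here is a sample of my CSV. The real file has about 50 columns, and the last column shown here is column(29) in my input CSV: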
603;10453;2.12.2020;88,69
603;10453;2.12.2020;88,69
603;10453;4.12.2020;72,69
605;10441;3.12.2020;39,51
605;10441;8.12.2020;25,85
605;10441;9.12.2020;52,91
605;10441;10.12.2020;66,31
605;10441;10.12.2020;66,31
606;10453;11.12.2020;72,69
606;10453;11.12.2020;72,69
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;3.12.2020;165,29
607;11202;3.12.2020;165,29
607;11202;3.12.2020;165,29
607;11202;3.12.2020;165,29
607;11202;4.12.2020;75,87
607;11202;5.12.2020;123,24
607;11202;5.12.2020;123,24
607;11202;5.12.2020;123,24
607;11202;7.12.2020;88,69
607;11202;7.12.2020;88,69
Here is my code, where I group the values by the last column:
Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    Dim inputFile = "input.csv"
    Dim outputFile = "output.csv"
    IO.File.WriteAllLines(outputFile, IO.File.ReadLines(inputFile).
        Select(Function(x) x.Split(";"c)).
        GroupBy(Function(x) x(0), x(3)).
        Select(Function(x)
                   Return String.Format(
                       "{0};{1};{2};{3}",
                       x.Select(Function(y) y(0)).First,
                       x.Select(Function(y) y(1)).First,
                       x.Select(Function(y) y(2)).First,
                       x.Select(Function(y) y(3)).First)
               End Function).ToArray)
End Sub
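As you can see from the duplicate values in the last column, I need to group this file by two keys: the first is the column(0) or column(1) value, and the second is column(3). But I don't know how to do this with my code. The desired output file must look like this: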
603;10453;2.12.2020;88,69
603;10453;4.12.2020;72,69
605;10441;3.12.2020;39,51
605;10441;8.12.2020;25,85
605;10441;9.12.2020;52,91
605;10441;10.12.2020;66,31
606;10453;11.12.2020;72,69
607;11202;1.12.2020;250,98
607;11202;2.12.2020;274,02
607;11202;3.12.2020;165,29
607;11202;4.12.2020;75,87
607;11202;5.12.2020;123,24
607;11202;7.12.2020;88,69
In general, I have to remove duplicates whenever column(0) and column(2) match.
Thanks for your help!
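One way to express a two-column key in the LINQ chain from the question is an anonymous type whose properties are marked with Key, because VB.NET compares only Key properties when it tests anonymous-type instances for equality. The following is a minimal sketch under assumptions, not the asker's final code: the module and function names are hypothetical, and column indexes 0 and 3 are taken from the four-column sample (the real file would use its own indexes, such as 29).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module GroupByTwoKeysSketch
    ' Hypothetical helper: keeps the first row of every (column 0, column 3) pair.
    ' The column indexes are assumptions based on the four-column sample.
    Function DistinctByTwoColumns(lines As IEnumerable(Of String)) As String()
        Return lines.
            Where(Function(line) line.Length > 0).
            Select(Function(line) line.Split(";"c)).
            GroupBy(Function(cols) New With {Key .Id = cols(0), Key .Amount = cols(3)}).
            Select(Function(g) String.Join(";", g.First())).
            ToArray()
    End Function

    Sub Main()
        Dim sample = {
            "603;10453;2.12.2020;88,69",
            "603;10453;2.12.2020;88,69",
            "603;10453;4.12.2020;72,69",
            "606;10453;11.12.2020;72,69"
        }
        ' Writes one line per distinct (column 0, column 3) pair.
        For Each row In DistinctByTwoColumns(sample)
            Console.WriteLine(row)
        Next
    End Sub
End Module
```

Without the Key modifier the anonymous type would fall back to reference equality, so every row would end up in its own group. To read from and write to files, the same function could be wrapped as IO.File.WriteAllLines(outputFile, DistinctByTwoColumns(IO.File.ReadLines(inputFile))).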
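Comments:
Show what the output file should look like.
Answer 1:
I would use an object-oriented approach. Create a wrapper object around each line that does the parsing and exposes a property for each value, then group the results as needed (here again I chose an object-oriented approach, with an equality comparer and Distinct).
Since I don't know what the columns mean, I simply assumed some names: OrderNo, CustomerNo, OrderDate and Value.
Here is the code for the wrapper class: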
Private Class Record
    'Constructors
    Public Sub New(lineNo As Int32, line As String)
        Const expectedColumnCount As Int32 = 4
        Const delimiter As String = ";"
        If (lineNo < 1) Then Throw New ArgumentOutOfRangeException(NameOf(lineNo), lineNo, "The line number must be positive!")
        If (line Is Nothing) Then Throw New ArgumentNullException(NameOf(line))
        'Split into at most expectedColumnCount + 1 parts so that surplus columns are detected
        Dim tokens As String() = Split(line, delimiter, expectedColumnCount + 1, CompareMethod.Binary)
        If (tokens.Length <> expectedColumnCount) Then Throw New ArgumentException($"Line {lineNo}: Invalid data row! {expectedColumnCount} '{delimiter}'-delimited columns expected.")
        Me.Tokens = tokens
    End Sub

    'Public Properties
    Public ReadOnly Property OrderNo As String
        Get
            Return Tokens(0)
        End Get
    End Property

    Public ReadOnly Property CustomerNo As String
        Get
            Return Tokens(1)
        End Get
    End Property

    Public ReadOnly Property OrderDate As String
        Get
            Return Tokens(2)
        End Get
    End Property

    Public ReadOnly Property Value As String
        Get
            Return Tokens(3)
        End Get
    End Property

    'Private Properties
    Private ReadOnly Property Tokens As String()
End Class
Here is the comparer that does the grouping:
Private Class RecordComparer
    Implements IEqualityComparer(Of Record)

    Private Sub New()
    End Sub

    Public Shared ReadOnly Property Singleton As New RecordComparer()

    'Two records are equal when OrderNo, CustomerNo and Value all match (OrderDate is ignored)
    Public Overloads Function Equals(x As Record, y As Record) As Boolean Implements IEqualityComparer(Of Record).Equals
        If (Object.ReferenceEquals(x, y)) Then Return True
        If (x Is Nothing) OrElse (y Is Nothing) Then Return False
        Return Comparer.Equals(x.OrderNo, y.OrderNo) AndAlso Comparer.Equals(x.CustomerNo, y.CustomerNo) AndAlso Comparer.Equals(x.Value, y.Value)
    End Function

    'The hash code is built from the same three columns that Equals compares
    Public Overloads Function GetHashCode(obj As Record) As Integer Implements IEqualityComparer(Of Record).GetHashCode
        If (obj Is Nothing) Then Return 42
        Return Comparer.GetHashCode(obj.OrderNo) Xor Comparer.GetHashCode(obj.CustomerNo) Xor Comparer.GetHashCode(obj.Value)
    End Function

    Private Shared ReadOnly Comparer As IEqualityComparer(Of String) = StringComparer.Ordinal
End Class
And finally the usage:
'Requires Imports System.IO at the top of the file
'Convert input lines to simple objects
Dim i As Int32 = 1
Dim dataRows As New List(Of Record)()
For Each line As String In File.ReadLines(inputFile)
    Dim data As New Record(i, line)
    dataRows.Add(data)
    i += 1
Next
'Group by the 3 columns (the DateTime is kind of random, no guarantee which object wins)
Dim consolidatedRows As IEnumerable(Of Record) = dataRows.Distinct(RecordComparer.Singleton)
'Convert and export lines (the lambda parameter is named r so it cannot clash with an event handler's e)
Dim outputLines As IEnumerable(Of String) = consolidatedRows.Select(Function(r) $"{r.OrderNo};{r.CustomerNo};{r.OrderDate};{r.Value}")
File.WriteAllLines(outputFile, outputLines)
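The answer notes that there is no guarantee which row of a duplicate group survives. If the first occurrence should always win, an explicit loop with a HashSet makes that rule visible in the code. This is a sketch under assumptions rather than part of the answer: KeepFirstPerKey is a hypothetical name, and the key columns 0, 1 and 3 simply mirror the three columns the comparer above looks at.

```vbnet
Imports System
Imports System.Collections.Generic

Module FirstWinsSketch
    ' Hypothetical helper: returns the input lines in order, skipping every line
    ' whose (column 0, column 1, column 3) combination has been seen before.
    Function KeepFirstPerKey(lines As IEnumerable(Of String)) As List(Of String)
        ' Tuple compares its items by value, so it can serve directly as a set key.
        Dim seen As New HashSet(Of Tuple(Of String, String, String))()
        Dim kept As New List(Of String)()
        For Each line In lines
            Dim cols = line.Split(";"c)
            ' Rows with fewer than four columns are skipped in this sketch.
            If cols.Length < 4 Then Continue For
            ' HashSet.Add returns False when the key is already present.
            If seen.Add(Tuple.Create(cols(0), cols(1), cols(3))) Then kept.Add(line)
        Next
        Return kept
    End Function

    Sub Main()
        Dim sample = {
            "605;10441;10.12.2020;66,31",
            "605;10441;10.12.2020;66,31",
            "605;10441;9.12.2020;52,91"
        }
        For Each row In KeepFirstPerKey(sample)
            Console.WriteLine(row)
        Next
    End Sub
End Module
```

Compared with the wrapper class and comparer, this trades the named properties for brevity, which may be acceptable when the rows are only filtered and written back unchanged.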
Answer 2:
I got it working. For my purposes I used Christoph's example. In the end my code looks like this:
Public Class TempClass
    Public Property ID As String
    Public Property day As String
    Public Property OriginalStr As String
End Class

Public Class TempIDComparer
    Implements IEqualityComparer(Of TempClass)

    Private Function IEqualityComparer_Equals(x As TempClass, y As TempClass) As Boolean Implements IEqualityComparer(Of TempClass).Equals
        If ReferenceEquals(x, y) Then
            Return True
        End If
        If ReferenceEquals(x, Nothing) OrElse ReferenceEquals(y, Nothing) Then
            Return False
        End If
        Return x.ID = y.ID AndAlso x.day = y.day
    End Function

    Private Function IEqualityComparer_GetHashCode(obj As TempClass) As Integer Implements IEqualityComparer(Of TempClass).GetHashCode
        If obj Is Nothing Then Return 0
        Return obj.ID.GetHashCode()
    End Function
End Class

'Requires Imports System.IO at the top of the file (StreamReader, StreamWriter)
Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    Dim inputFile = "input.csv"
    Dim outputFile = "output.csv"
    Dim list As List(Of TempClass) = New List(Of TempClass)()
    Dim ls As List(Of String()) = New List(Of String())()
    Dim fileReader As StreamReader = New StreamReader(inputFile)
    Dim strLine As String = ""
    'Read the file line by line and keep the two key columns plus the original line
    While strLine IsNot Nothing
        strLine = fileReader.ReadLine()
        If strLine IsNot Nothing AndAlso strLine.Length > 0 Then
            Dim t As TempClass = New TempClass() With {
                .ID = strLine.Split(";"c)(0),
                .day = strLine.Split(";"c)(3),
                .OriginalStr = strLine
            }
            list.Add(t)
        End If
    End While
    fileReader.Close()
    'Drop rows whose ID and day both match an earlier row
    Dim tempList = list.Distinct(New TempIDComparer())
    Dim fileWriter As StreamWriter = New StreamWriter(outputFile, False, System.Text.Encoding.Default)
    For Each item In tempList.ToList()
        fileWriter.WriteLine(item.OriginalStr)
    Next
    fileWriter.Flush()
    fileWriter.Close()
End Sub
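The comparer in this answer hashes only ID. That is valid, since equal objects still get equal hash codes, but all rows sharing an ID land in the same hash bucket. A possible variation, shown here as a sketch and not as the poster's code, derives the hash from both compared fields. TempIdDayComparer is a hypothetical name, and TempClass is repeated only so the example stands alone.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Copy of the answer's data class, repeated so this sketch is self-contained.
Public Class TempClass
    Public Property ID As String
    Public Property day As String
    Public Property OriginalStr As String
End Class

' Hypothetical variant of the comparer: Equals and GetHashCode use the same two fields.
Public Class TempIdDayComparer
    Implements IEqualityComparer(Of TempClass)

    Private Function IEqualityComparer_Equals(x As TempClass, y As TempClass) As Boolean Implements IEqualityComparer(Of TempClass).Equals
        If ReferenceEquals(x, y) Then Return True
        If x Is Nothing OrElse y Is Nothing Then Return False
        ' String.Equals compares ordinally and tolerates Nothing on either side.
        Return String.Equals(x.ID, y.ID) AndAlso String.Equals(x.day, y.day)
    End Function

    Private Function IEqualityComparer_GetHashCode(obj As TempClass) As Integer Implements IEqualityComparer(Of TempClass).GetHashCode
        If obj Is Nothing Then Return 0
        ' Tuple combines the hash codes of both fields and copes with Nothing values.
        Return Tuple.Create(obj.ID, obj.day).GetHashCode()
    End Function
End Class

Module HashSketch
    Sub Main()
        Dim rows = New List(Of TempClass) From {
            New TempClass With {.ID = "603", .day = "88,69", .OriginalStr = "603;10453;2.12.2020;88,69"},
            New TempClass With {.ID = "603", .day = "88,69", .OriginalStr = "603;10453;2.12.2020;88,69"},
            New TempClass With {.ID = "603", .day = "72,69", .OriginalStr = "603;10453;4.12.2020;72,69"}
        }
        ' Writes the original line of every row that survives Distinct.
        For Each item In rows.Distinct(New TempIdDayComparer())
            Console.WriteLine(item.OriginalStr)
        Next
    End Sub
End Module
```

For a file of a few thousand rows the difference is unlikely to be noticeable, so this matters mainly when many rows share the same ID.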