VB.NET Group by two columns and write results to an array

Posted: 2021-03-31 04:19:18

Question:

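I need to group CSV data into a new CSV by column values. I can only do this by one column, but unfortunately that is not enough: I still get duplicates and do not reach my goal. Here is a sample of my CSV. The real file has about 50 columns, and the last column shown here is column(29) in my input CSV: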

603;10453;2.12.2020;88,69
603;10453;2.12.2020;88,69
603;10453;4.12.2020;72,69
605;10441;3.12.2020;39,51
605;10441;8.12.2020;25,85
605;10441;9.12.2020;52,91
605;10441;10.12.2020;66,31
605;10441;10.12.2020;66,31
606;10453;11.12.2020;72,69
606;10453;11.12.2020;72,69
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;1.12.2020;250,98
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;2.12.2020;274,02
607;11202;3.12.2020;165,29
607;11202;3.12.2020;165,29
607;11202;3.12.2020;165,29
607;11202;3.12.2020;165,29
607;11202;4.12.2020;75,87
607;11202;5.12.2020;123,24
607;11202;5.12.2020;123,24
607;11202;5.12.2020;123,24
607;11202;7.12.2020;88,69
607;11202;7.12.2020;88,69

Here is my code, in which I group the values by the last column:

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        Dim inputFile = "input.csv"
        Dim outputFile = "output.csv"


        IO.File.WriteAllLines(outputFile, IO.File.ReadLines(inputFile).
                        Select(Function(x) x.Split(";"c)).
                        GroupBy(Function(x) x(0), x(3)).
                    Select(Function(x)
                               Return String.Format(
                                "{0};{1};{2};{3}",
                                x.Select(Function(y) y(0)).First,
                                x.Select(Function(y) y(1)).First,
                                x.Select(Function(y) y(2)).First,
                                x.Select(Function(y) y(3)).First)
                               End Function).ToArray)
    End Sub

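As you can see, the last column contains duplicate values. I need to group this file by two keys: the first is the column(0) or column(1) value, and the second is column(3). But I don't know how to do that with my code. The desired output file must look like this: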

603;10453;2.12.2020;88,69
603;10453;4.12.2020;72,69
605;10441;3.12.2020;39,51
605;10441;8.12.2020;25,85
605;10441;9.12.2020;52,91
605;10441;10.12.2020;66,31
606;10453;11.12.2020;72,69
607;11202;1.12.2020;250,98
607;11202;2.12.2020;274,02
607;11202;3.12.2020;165,29
607;11202;4.12.2020;75,87
607;11202;5.12.2020;123,24
607;11202;7.12.2020;88,69

In general, I have to remove duplicates where column(0) and column(2) match.

Thanks for your help!

Comments:

Show what the output file should look like.

Answer 1:

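I would use an object-oriented approach. Create a wrapper object around each line that parses it and exposes a property for each value, then group the results as needed (here again I chose an object-oriented approach, using an equality comparer and Distinct).

Since I don't know what the columns mean, I just assumed some names: OrderNo, CustomerNo, OrderDate, Value.

Here is the code for the wrapper class: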

Private Class Record

    'Constructors
    Public Sub New(lineNo As Int32, line As String)
        Const expectedColumnCount As Int32 = 4
        Const delimiter As String = ";"
        If (lineNo < 1) Then Throw New ArgumentOutOfRangeException(NameOf(lineNo), lineNo, "The line number must be positive!")
        If (line Is Nothing) Then Throw New ArgumentNullException(NameOf(line))
        Dim tokens As String() = Split(line, delimiter, expectedColumnCount + 1, CompareMethod.Binary)
        If (tokens.Length <> expectedColumnCount) Then Throw New ArgumentException($"Line {lineNo}: Invalid data row! {expectedColumnCount} '{delimiter}'-delimited columns expected.")
        Me.Tokens = tokens
    End Sub

    'Public Properties

    Public ReadOnly Property OrderNo As String
        Get
            Return Tokens(0)
        End Get
    End Property

    Public ReadOnly Property CustomerNo As String
        Get
            Return Tokens(1)
        End Get
    End Property

    Public ReadOnly Property OrderDate As String
        Get
            Return Tokens(2)
        End Get
    End Property

    Public ReadOnly Property Value As String
        Get
            Return Tokens(3)
        End Get
    End Property

    'Private Properties

    Private ReadOnly Property Tokens As String()

End Class

Here is the comparer that does the grouping:

Private Class RecordComparer
    Implements IEqualityComparer(Of Record)

    Private Sub New()
    End Sub

    Public Shared ReadOnly Property Singleton As New RecordComparer()

    Public Function Equals(x As Record, y As Record) As Boolean Implements IEqualityComparer(Of Record).Equals
        If (Object.ReferenceEquals(x, y)) Then Return True
        If (x Is Nothing) OrElse (y Is Nothing) Then Return False
        Return Comparer.Equals(x.OrderNo, y.OrderNo) AndAlso Comparer.Equals(x.CustomerNo, y.CustomerNo) AndAlso Comparer.Equals(x.Value, y.Value)
    End Function

    Public Function GetHashCode(obj As Record) As Integer Implements IEqualityComparer(Of Record).GetHashCode
        If (obj Is Nothing) Then Return 42
        Return Comparer.GetHashCode(obj.OrderNo) Xor Comparer.GetHashCode(obj.CustomerNo) Xor Comparer.GetHashCode(obj.Value)
    End Function

    Private Shared ReadOnly Comparer As IEqualityComparer(Of String) = StringComparer.Ordinal

End Class

And finally, the usage:

    'Convert input lines to simple objects
    Dim i As Int32 = 1
    Dim dataRows As New List(Of Record)()
    For Each line As String In File.ReadLines(inputFile)
        Dim data As New Record(i, line)
        dataRows.Add(data)
        i += 1
    Next
    'Group by the 3 columns (the DateTime is kind of random, no guarantee which object wins)
    Dim consolidatedRows As IEnumerable(Of Record) = dataRows.Distinct(RecordComparer.Singleton)
    'Convert and export lines
    Dim outputLines As IEnumerable(Of String) = consolidatedRows.Select(Function(e) $"{e.OrderNo};{e.CustomerNo};{e.OrderDate};{e.Value}")
    File.WriteAllLines(outputFile, outputLines)
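
As an alternative to a custom comparer, the same de-duplication can probably be sketched with GroupBy on a composite key. In VB.NET, an anonymous type only takes part in equality comparisons through the properties marked with Key, so both key columns are declared that way. The module and function names (GroupByTwoColumns, Deduplicate) and the choice of columns 0 and 3 as keys are assumptions made for illustration; they are not part of the original answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module GroupByTwoColumns

    ' Hypothetical helper: keeps the first line of every (column 0, column 3) group.
    ' Assumes each line has at least four ";"-separated columns.
    Function Deduplicate(lines As IEnumerable(Of String)) As String()
        Return lines.
            Select(Function(line) line.Split(";"c)).
            GroupBy(Function(cols) New With {Key .Id = cols(0), Key .Value = cols(3)}).
            Select(Function(g) String.Join(";", g.First())).
            ToArray()
    End Function

    Sub Main()
        ' A few rows taken from the sample data in the question.
        Dim sample = {
            "603;10453;2.12.2020;88,69",
            "603;10453;2.12.2020;88,69",
            "603;10453;4.12.2020;72,69",
            "606;10453;11.12.2020;72,69"
        }
        For Each line In Deduplicate(sample)
            Console.WriteLine(line)
        Next
    End Sub

End Module
```

Without the Key modifier, two anonymous-type instances with the same values would not compare as equal, and every line would end up in its own group.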

Answer 2:

I got it working. I used Christoph's example for my purposes. In the end, my code looks like this:

Public Class TempClass
    Public Property ID As String
    Public Property day As String
    Public Property OriginalStr As String
End Class

Public Class TempIDComparer
    Implements IEqualityComparer(Of TempClass)

    Private Function IEqualityComparer_Equals(x As TempClass, y As TempClass) As Boolean Implements IEqualityComparer(Of TempClass).Equals
        If ReferenceEquals(x, y) Then
            Return True
        End If
        If ReferenceEquals(x, Nothing) OrElse ReferenceEquals(y, Nothing) Then
            Return False
        End If

        Return x.ID = y.ID AndAlso x.day = y.day
    End Function

    Private Function IEqualityComparer_GetHashCode(obj As TempClass) As Integer Implements IEqualityComparer(Of TempClass).GetHashCode
        If obj Is Nothing Then Return 0
        Return obj.ID.GetHashCode()
    End Function
End Class

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        Dim inputFile = "input.csv"
        Dim outputFile = "output.csv"


        Dim list As List(Of TempClass) = New List(Of TempClass)()
        Dim ls As List(Of String()) = New List(Of String())()
        Dim fileReader As StreamReader = New StreamReader(inputFile)
        Dim strLine As String = ""

        While strLine IsNot Nothing
            strLine = fileReader.ReadLine()

            If strLine IsNot Nothing AndAlso strLine.Length > 0 Then
                Dim t As TempClass = New TempClass() With {
                        .ID = strLine.Split(";"c)(0),
                        .day = strLine.Split(";"c)(3),
                        .OriginalStr = strLine
                    }
                list.Add(t)
            End If
        End While

        fileReader.Close()
        Dim tempList = list.Distinct(New TempIDComparer())
        Dim fileWriter As StreamWriter = New StreamWriter(outputFile, False, System.Text.Encoding.Default)
        For Each item In tempList.ToList()
            fileWriter.WriteLine(item.OriginalStr)
        Next
        fileWriter.Flush()
        fileWriter.Close()
End Sub
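
The same idea can be sketched more compactly, without a wrapper class or comparer, by remembering the key pairs already seen in a HashSet. System.Tuple compares by value, so a (column 0, column 3) pair can serve directly as the set key. The names StreamingDedup and DistinctByTwoColumns are hypothetical, and the code assumes every non-empty line has at least four columns.

```vbnet
Imports System
Imports System.Collections.Generic

Module StreamingDedup

    ' Hypothetical helper: lazily yields each line whose (column 0, column 3)
    ' pair has not been seen before; empty lines are skipped.
    Iterator Function DistinctByTwoColumns(lines As IEnumerable(Of String)) As IEnumerable(Of String)
        Dim seen As New HashSet(Of Tuple(Of String, String))()
        For Each line In lines
            If String.IsNullOrEmpty(line) Then Continue For
            Dim cols = line.Split(";"c)
            ' HashSet.Add returns False when the key is already present.
            If seen.Add(Tuple.Create(cols(0), cols(3))) Then
                Yield line
            End If
        Next
    End Function

    Sub Main()
        ' A few rows taken from the sample data in the question.
        Dim sample = {
            "605;10441;10.12.2020;66,31",
            "605;10441;10.12.2020;66,31",
            "606;10453;11.12.2020;72,69"
        }
        For Each line In DistinctByTwoColumns(sample)
            Console.WriteLine(line)
        Next
    End Sub

End Module
```

For files, the helper could be combined with the calls already used in the question, for example File.WriteAllLines(outputFile, DistinctByTwoColumns(File.ReadLines(inputFile))), so that the input is processed line by line.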
