Parsing with Regex out of text file [duplicate]

Posted 2023-04-13

【Title】: Parsing with Regex out of text file [duplicate] 【Posted】: 2019-10-31 06:54:06 【Question】:

I have a text file:

    ;   Message Number
    ;   |         Time Offset (ms)
    ;   |         |        Type
    ;   |         |        |        ID (hex)
    ;   |         |        |        |     Data Length
    ;   |         |        |        |     |   Data Bytes (hex) ...
    ;   |         |        |        |     |   |
    ;---+--   ----+----  --+--  ----+---  +  -+ -- -- -- -- -- -- --
         1)         2.0  Rx         0400  8  01 5A 01 57 01 D2 A6 02 
         2)         8.6  Rx         0500  8  02 C1 02 C9 02 BE 02 C2 
         3)        36.2  Rx         0401  8  01 58 01 59 01 01 01 01 
         4)        41.7  Rx         01C4  8  27 9C 64 8C 00 03 E8 08 
         5)        43.1  Rx         0501  8  02 C0 02 C1 02 C6 02 C0 
         6)        62.7  Rx         01C2  8  27 9C 60 90 00 0F 04 08

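I am trying to collect only the IDs from this file. I have this expression and have tested that it matches, but when I try to collect a list, it gives me the whole line instead of the ID.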

        var ofd = new OpenFileDialog
        {
            Filter = "TRC File (*.trc*)|*.trc*",
            Multiselect = true,
        };

        ofd.ShowDialog();

        string path = ofd.FileName;
        List<string> alllinesText = File.ReadAllLines(path).ToList();
        foreach (string id in alllinesText)
        {
            Regex rx = new Regex(@"\d\d[\d|\w][\d|\w]\s\s");
            Console.Write(id.ToString());
            MatchCollection matches1 = rx.Matches(id);
            Console.WriteLine(matches1);
        }

        foreach (string data in alllinesText)
        {
            Regex rx2 = new Regex(@"[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w]");
            Console.Write(data.ToString());
            MatchCollection matches2 = rx2.Matches(data);
        }

The output is:

     28817)    347963.1  Rx         01C2  8  01 00 00 00 00 00 00 6F System.Text.RegularExpressions.MatchCollection
     28818)    347966.3  Rx         04E2  8  64 04 10 15 F5 00 00 08 System.Text.RegularExpressions.MatchCollection
     28819)    347967.2  Rx         01C4  8  27 14 63 8C 00 03 E7 08 System.Text.RegularExpressions.MatchCollection
     28820)    348017.0  Rx         03C4  8  7F 8A 7F 80 7F FA 96 0F System.Text.RegularExpressions.MatchCollection
     28821)    348023.1  Rx         0405  8  01 57 01 58 01 DB 93 02 System.Text.RegularExpressions.MatchCollection
     28822)    348029.6  Rx         0505  8  02 BB 02 BC 02 BD 02 BF System.Text.RegularExpressions.MatchCollection

【Comments】:

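FYI: \d is already included in \w, so you can simplify your regex by replacing every [\w\d] with a plain \w. Also, if you want to match hexadecimal digits, use [a-fA-F0-9].

Console.Write(data.ToString()) writes the whole line, not the text matched by the expression. In effect, you are discarding the text that matched.

Do you need to use a regex? It looks like a fixed-width file (apart from the odd multi-line header). I think you could get away with taking a substring of each line.

Possible duplicate of "Read fixed-width record from text file".

To get the number after Rx, you can use Regex.Matches(s, @"\bRx\s+(\d+)").Cast<Match>().Select(x => x.Groups[1].Value)

【Answer 1】: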

My guess is that here we might just want to add a capturing group around a character class, maybe something like:

([A-Z0-9]{4})


Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // Capture any run of four uppercase letters or digits.
        string pattern = @"([A-Z0-9]{4})";
        string input = @" ;   Message Number
    ;   |         Time Offset (ms)
    ;   |         |        Type
    ;   |         |        |        ID (hex)
    ;   |         |        |        |     Data Length
    ;   |         |        |        |     |   Data Bytes (hex) ...
    ;   |         |        |        |     |   |
    ;---+--   ----+----  --+--  ----+---  +  -+ -- -- -- -- -- -- --
         1)         2.0  Rx         0400  8  01 5A 01 57 01 D2 A6 02 
         2)         8.6  Rx         0500  8  02 C1 02 C9 02 BE 02 C2 
         3)        36.2  Rx         0401  8  01 58 01 59 01 01 01 01 
         4)        41.7  Rx         01C4  8  27 9C 64 8C 00 03 E8 08 
         5)        43.1  Rx         0501  8  02 C0 02 C1 02 C6 02 C0 
         6)        62.7  Rx         01C2  8  27 9C 60 90 00 0F 04 08 ";
        RegexOptions options = RegexOptions.Multiline;

        // Print the matched text, not the MatchCollection object.
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

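A comment under the question wonders whether a regex is needed at all, since the file looks column-oriented. The excerpt does not pin down exact column offsets, so the sketch below splits each line on whitespace rather than taking fixed-width substrings. It assumes header lines start with `;` and that the ID is always the fourth whitespace-separated field; `TraceColumns` and `ExtractIds` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TraceColumns
{
    // Hypothetical helper: returns the ID column of every data line.
    // Assumed field order: "<number>) <offset> <type> <id> <length> <bytes...>".
    public static List<string> ExtractIds(IEnumerable<string> lines)
    {
        var ids = new List<string>();
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0 || trimmed[0] == ';')
                continue; // skip blank lines and the ';' header block

            // A null separator array makes Split break on any whitespace.
            string[] fields = trimmed.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length >= 4)
                ids.Add(fields[3]); // 0: number, 1: offset, 2: type, 3: ID
        }
        return ids;
    }

    public static void Main()
    {
        string[] sample =
        {
            "    ;   |         |        |        ID (hex)",
            "         1)         2.0  Rx         0400  8  01 5A 01 57 01 D2 A6 02 ",
            "         2)         8.6  Rx         0500  8  02 C1 02 C9 02 BE 02 C2 ",
            "         3)        36.2  Rx         0401  8  01 58 01 59 01 01 01 01 ",
        };

        // Prints: 0400, 0500, 0401
        Console.WriteLine(string.Join(", ", ExtractIds(sample)));
    }
}
```

In the question's code, the same helper could presumably be fed with `File.ReadAllLines(path)`. Unlike the bare four-character pattern, it does not depend on how many digits the message number or time offset has.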

【Comments】:

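Thanks! This helped. I ended up with slightly different code, but it gave me the list I needed. If I want the m.Value entries in the list to be combined with another list, what would be the best way to do that? If you don't mind me asking, @Emma.

Keep in mind that this approach only works as long as the message number has fewer than 4 digits. If the message number can reach 4 digits or more, you may need to use @"Rx *([A-Z0-9]{4})" and then m.Value.Substring(m.Value.Length - 4) to get the actual result you want.

@Terry No substring is needed; the value is captured, so access Groups[1] instead. As for the question, see my comment.

@WiktorStribiżew Thanks, I missed that.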
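The points raised in these comments (anchor on Rx, read the captured group rather than the whole match, and keep each ID together with its data) can be sketched in one pass. This is only an illustration under assumptions: every data line has the type Rx, the ID is exactly four hex digits, and the data bytes are two-digit hex values separated by whitespace. `TraceRegex`, `Parse`, and the group names `id` and `data` are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TraceRegex
{
    // Assumed line shape: "... Rx <4 hex digits> <length> <hex bytes...>".
    private static readonly Regex Row = new Regex(
        @"\bRx\s+(?<id>[0-9A-Fa-f]{4})\s+\d+\s+(?<data>[0-9A-Fa-f]{2}(?:\s+[0-9A-Fa-f]{2})*)");

    // Hypothetical helper: returns one (ID, data bytes) pair per matching line,
    // so the two values never need to be merged from separate lists.
    public static List<KeyValuePair<string, string>> Parse(IEnumerable<string> lines)
    {
        var rows = new List<KeyValuePair<string, string>>();
        foreach (string line in lines)
        {
            Match m = Row.Match(line);
            if (!m.Success)
                continue; // header lines and anything without "Rx" are ignored

            // Read the captured groups, not m.Value (which also contains "Rx").
            rows.Add(new KeyValuePair<string, string>(
                m.Groups["id"].Value, m.Groups["data"].Value));
        }
        return rows;
    }

    public static void Main()
    {
        string[] sample =
        {
            "    ;   |         |        |        ID (hex)",
            "     28817)    347963.1  Rx         01C2  8  01 00 00 00 00 00 00 6F",
            "     28819)    347967.2  Rx         01C4  8  27 14 63 8C 00 03 E7 08",
        };

        // Prints:
        // 01C2 -> 01 00 00 00 00 00 00 6F
        // 01C4 -> 27 14 63 8C 00 03 E7 08
        foreach (KeyValuePair<string, string> row in Parse(sample))
            Console.WriteLine("{0} -> {1}", row.Key, row.Value);
    }
}
```

Because the pattern is anchored on Rx, a five-digit message number or a long time offset earlier in the line cannot be mistaken for an ID.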
