How to read RegEx Captures in C#

Posted 2023-04-13

【Asked】: 2021-11-11 06:27:55 【Question】:

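I started working through a C# book and decided to throw RegEx into the mix to make the boring console exercises a little more interesting. What I want to do is ask the user for their phone number in the console, check it against a regular expression, and then capture the digits so I can format them the way I want. I have everything working except the RegEx capture part. How do I get the capture values into C# variables?

Feel free to correct any code formatting or variable naming issues as well.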

static void askPhoneNumber()
{
    String pattern = @"[(]?(\d{3})[)]?[ -.]?(\d{3})[ -.]?(\d{4})";

    System.Console.WriteLine("What is your phone number?");
    String phoneNumber = Console.ReadLine();

    while (!Regex.IsMatch(phoneNumber, pattern))
    {
        Console.WriteLine("Bad Input");
        phoneNumber = Console.ReadLine();
    }

    Match match = Regex.Match(phoneNumber, pattern);
    // This is the part that does not work: match.Groups has no Captures property.
    Capture capture = match.Groups.Captures;

    System.Console.WriteLine(capture[1].Value + "-" + capture[2].Value + "-" + capture[3].Value);
}

【Comments】:

【Answer 1】:

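The C# regex API can be quite confusing. There are groups and captures:

A group represents a capturing group; it is used to extract a substring from the text.
There can be several captures per group if the group appears inside a quantifier.

The hierarchy is:

Match
    Group
        Capture

(one match can have several groups, and each group can have several captures)

For example: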

Subject: aabcabbc
Pattern: ^(?:(a+b+)c)+$

In this example there is only one group: (a+b+). The group is inside a quantifier and is matched twice, so it generates two captures, aab and abb:

aabcabbc
^^^ ^^^
Cap1  Cap2

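When a group is not inside a quantifier, it generates exactly one capture. In your case you have 3 groups, and each of them captures once. You can use match.Groups[1].Value, match.Groups[2].Value and match.Groups[3].Value to extract the 3 substrings you are interested in, without resorting to the capture concept at all.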
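The difference between a group and its captures can be checked with a minimal sketch built on the subject and pattern from the example above. It assumes C# 9 or later for top-level statements (in older projects the body would go inside Main), and the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Subject and pattern taken from the example above.
Match match = Regex.Match("aabcabbc", @"^(?:(a+b+)c)+$");

// Groups[0] is the whole match; Groups[1] is the only capturing group, (a+b+).
Group group = match.Groups[1];

string first = group.Captures[0].Value;   // first iteration of the quantifier
string second = group.Captures[1].Value;  // second iteration of the quantifier
string last = group.Value;                // Group.Value is the last capture

Console.WriteLine($"Captures: {group.Captures.Count}");
Console.WriteLine($"Captures[0] = {first}, Captures[1] = {second}");
Console.WriteLine($"Groups[1].Value = {last}");
```

Because Group.Value only exposes the last capture, the Captures collection is needed only when a group sits inside a quantifier and every iteration matters.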

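【Comments】:

Wouldn't it be match.Groups[0].Value, 1, 2 because of zero-based indexing?

@CausingUnderflowsEverywhere The group at index 0 represents the whole match. Capturing groups start at index 1.

【Answer 2】:

Match results can be quite hard to understand. I wrote this code to help me understand what was found and where. The intention is that pieces of the output (from the lines marked with //**) can be copied into a program to make use of the values found in the match.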

public static void DisplayMatchResults(Match match)
{
    Console.WriteLine("Match has {0} captures", match.Captures.Count);

    // Summary: every group with its captures.
    int groupNo = 0;
    foreach (Group mm in match.Groups)
    {
        Console.WriteLine("  Group {0,2} has {1,2} captures '{2}'", groupNo, mm.Captures.Count, mm.Value);

        int captureNo = 0;
        foreach (Capture cc in mm.Captures)
        {
            Console.WriteLine("       Capture {0,2} '{1}'", captureNo, cc);
            captureNo++;
        }
        groupNo++;
    }

    // Copy-ready expressions for each group value.
    groupNo = 0;
    foreach (Group mm in match.Groups)
    {
        Console.WriteLine("    match.Groups[{0}].Value == \"{1}\"", groupNo, match.Groups[groupNo].Value); //**
        groupNo++;
    }

    // Copy-ready expressions for each capture value.
    groupNo = 0;
    foreach (Group mm in match.Groups)
    {
        int captureNo = 0;
        foreach (Capture cc in mm.Captures)
        {
            Console.WriteLine("    match.Groups[{0}].Captures[{1}].Value == \"{2}\"", groupNo, captureNo, match.Groups[groupNo].Captures[captureNo].Value); //**
            captureNo++;
        }
        groupNo++;
    }
}

A simple example of using this method, given this input:

Regex regex = new Regex("/([A-Za-z]+)/(\\d+)/");
String text = "some/directory/Pictures/Houses/12/apple/banana/"
            + "cherry/345/damson/elderberry/fig/678/gooseberry";
Match match = regex.Match(text);
DisplayMatchResults(match);

The output is:

Match has 1 captures
  Group  0 has  1 captures '/Houses/12/'
       Capture  0 '/Houses/12/'
  Group  1 has  1 captures 'Houses'
       Capture  0 'Houses'
  Group  2 has  1 captures '12'
       Capture  0 '12'
    match.Groups[0].Value == "/Houses/12/"
    match.Groups[1].Value == "Houses"
    match.Groups[2].Value == "12"
    match.Groups[0].Captures[0].Value == "/Houses/12/"
    match.Groups[1].Captures[0].Value == "Houses"
    match.Groups[2].Captures[0].Value == "12"

Suppose we want to find all matches of the above regex in the above text. Then we can use a MatchCollection in code such as:

MatchCollection matches = regex.Matches(text);
for (int ii = 0; ii < matches.Count; ii++)
{
    Console.WriteLine("Match[{0}]  // of 0..{1}:", ii, matches.Count-1);
    RegexMatchDisplay.DisplayMatchResults(matches[ii]);
}

The output from this is:

Match[0]  // of 0..2:
Match has 1 captures
  Group  0 has  1 captures '/Houses/12/'
       Capture  0 '/Houses/12/'
  Group  1 has  1 captures 'Houses'
       Capture  0 'Houses'
  Group  2 has  1 captures '12'
       Capture  0 '12'
    match.Groups[0].Value == "/Houses/12/"
    match.Groups[1].Value == "Houses"
    match.Groups[2].Value == "12"
    match.Groups[0].Captures[0].Value == "/Houses/12/"
    match.Groups[1].Captures[0].Value == "Houses"
    match.Groups[2].Captures[0].Value == "12"
Match[1]  // of 0..2:
Match has 1 captures
  Group  0 has  1 captures '/cherry/345/'
       Capture  0 '/cherry/345/'
  Group  1 has  1 captures 'cherry'
       Capture  0 'cherry'
  Group  2 has  1 captures '345'
       Capture  0 '345'
    match.Groups[0].Value == "/cherry/345/"
    match.Groups[1].Value == "cherry"
    match.Groups[2].Value == "345"
    match.Groups[0].Captures[0].Value == "/cherry/345/"
    match.Groups[1].Captures[0].Value == "cherry"
    match.Groups[2].Captures[0].Value == "345"
Match[2]  // of 0..2:
Match has 1 captures
  Group  0 has  1 captures '/fig/678/'
       Capture  0 '/fig/678/'
  Group  1 has  1 captures 'fig'
       Capture  0 'fig'
  Group  2 has  1 captures '678'
       Capture  0 '678'
    match.Groups[0].Value == "/fig/678/"
    match.Groups[1].Value == "fig"
    match.Groups[2].Value == "678"
    match.Groups[0].Captures[0].Value == "/fig/678/"
    match.Groups[1].Captures[0].Value == "fig"
    match.Groups[2].Captures[0].Value == "678"

Hence:

    matches[1].Groups[0].Value == "/cherry/345/"
    matches[1].Groups[1].Value == "cherry"
    matches[1].Groups[2].Value == "345"
    matches[1].Groups[0].Captures[0].Value == "/cherry/345/"
    matches[1].Groups[1].Captures[0].Value == "cherry"
    matches[1].Groups[2].Captures[0].Value == "345"

The same holds for matches[0] and matches[2].
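To move those values into ordinary variables rather than printing them, the matches can be looped over directly. The following is a rough sketch, not part of the original answer: it assumes the pattern ends in "/" (as the printed output with trailing slashes implies), uses C# 9 top-level statements, and the list name found and its tuple fields are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumed pattern: a name segment followed by a numeric segment and a closing slash.
var regex = new Regex(@"/([A-Za-z]+)/(\d+)/");
string text = "some/directory/Pictures/Houses/12/apple/banana/"
            + "cherry/345/damson/elderberry/fig/678/gooseberry";

// Collect group 1 (the name) and group 2 (the number) from every match.
var found = new List<(string Name, int Number)>();
foreach (Match m in regex.Matches(text))
{
    found.Add((m.Groups[1].Value, int.Parse(m.Groups[2].Value)));
}

foreach (var (name, number) in found)
{
    Console.WriteLine($"{name}: {number}");
}
```

Declaring the loop variable as Match keeps the code working on older frameworks where MatchCollection only exposes a non-generic enumerator.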

【Comments】:

【Answer 3】:

string pattern = @"[(]?(\d{3})[)]?[ -.]?(\d{3})[ -.]?(\d{4})";

System.Console.WriteLine("What is your phone number?");
string phoneNumber = Console.ReadLine();

while (!Regex.IsMatch(phoneNumber, pattern))
{
    Console.WriteLine("Bad Input");
    phoneNumber = Console.ReadLine();
}

var match = Regex.Match(phoneNumber, pattern);
// Groups[0] is the whole match, so three capturing groups give a count of 4.
if (match.Groups.Count == 4)
{
    System.Console.WriteLine("Number matched : "+match.Groups[0].Value);
    System.Console.WriteLine(match.Groups[1].Value + "-" + match.Groups[2].Value + "-" + match.Groups[3].Value);
}
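One possible variation, offered only as a sketch: run the regex once and test match.Success, instead of calling Regex.IsMatch and then Regex.Match on the same input. The helper name TryFormatPhoneNumber is hypothetical and does not come from any of the answers; the pattern is the one from the question, and C# 9 top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: matches once, then builds the formatted number from groups 1 to 3.
static bool TryFormatPhoneNumber(string input, out string formatted)
{
    Match match = Regex.Match(input, @"[(]?(\d{3})[)]?[ -.]?(\d{3})[ -.]?(\d{4})");
    formatted = match.Success
        ? $"{match.Groups[1].Value}-{match.Groups[2].Value}-{match.Groups[3].Value}"
        : string.Empty;
    return match.Success;
}

if (TryFormatPhoneNumber("(555) 123-4567", out string result))
{
    Console.WriteLine(result);
}
else
{
    Console.WriteLine("Bad Input");
}
```

In the console exercise, the same helper could be called inside the while loop until it returns true, which keeps the matching and the formatting in one place.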

【Comments】:
