.NET regex to replace duplicate occurrences of a pattern with a capture group

Posted: 2017-05-13 19:59:40

Question:

I have a SQL script like this:

DECLARE @MyVariable1 = 1
DECLARE @MyVariable1 = 10
DECLARE @MyVariable3 = 15
DECLARE @MyVariable2 = 20
DECLARE @MyVariable1 = 7
DECLARE @MyVariable2 = 4
DECLARE @MyVariable4 = 7
DECLARE @MyVariable2 = 4

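Of course, the real script has a lot of other statements in between, but I want to write a function that, given the input above, produces the following output: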

DECLARE @MyVariable1 = 1
@MyVariable1 = 10
DECLARE @MyVariable3 = 15
DECLARE @MyVariable2 = 20
@MyVariable1 = 7
@MyVariable2 = 4
DECLARE @MyVariable4 = 7
@MyVariable2 = 4

In other words, it removes the DECLARE keyword from every repeated declaration of a variable that has already been declared.

My current solution looks like this:

    Private Function RemoveDuplicateDeclarations(commandText As String) As String
        ' Each element is a single line without its CRLF
        Dim lines = commandText.Split(New String() {vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
        ' Lines are matched one at a time, so anchor with ^ and $ instead of
        ' expecting line breaks around the DECLARE statement
        Dim declarationRegex As New Regex("^ *DECLARE +(?<initialization>(?<varname>[^ ]+) *.*)$", RegexOptions.IgnoreCase)
        Dim declaredVariables As New List(Of String)
        Dim resultBuilder As New StringBuilder()

        For Each line In lines
            Dim matches = declarationRegex.Matches(line)
            If matches.Count > 0 Then
                Dim varname = matches(0).Groups("varname").Value
                If declaredVariables.Contains(varname) Then
                    ' Named groups are substituted with ${name}; "$initialization" would be copied literally
                    resultBuilder.AppendLine(declarationRegex.Replace(line, "${initialization}"))
                Else
                    declaredVariables.Add(varname)
                    resultBuilder.AppendLine(line)
                End If
            Else
                resultBuilder.AppendLine(line)
            End If
        Next

        Return resultBuilder.ToString()
    End Function
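Since C# answers are welcome, here is a rough C# sketch of the same line-by-line idea. It assumes one DECLARE per line and CRLF or LF line endings; the anchored regex and the HashSet<string> (whose Add returns false for a name already seen) are choices made for this sketch, not part of the original code. Names are compared case-sensitively, as List.Contains does in the VB version.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Anchored pattern for a single line: optional indentation, DECLARE, then the
// variable name and the rest of the line (assumption: one statement per line).
var declaration = new Regex(@"^\s*DECLARE\s+(?<initialization>(?<varname>@\w+).*)$",
                            RegexOptions.IgnoreCase);

string RemoveDuplicateDeclarations(string commandText)
{
    var declared = new HashSet<string>();
    var result = new StringBuilder();

    foreach (var line in commandText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))
    {
        Match m = declaration.Match(line);
        // HashSet.Add returns false if this variable was declared earlier,
        // in which case only the part after DECLARE is kept.
        if (m.Success && !declared.Add(m.Groups["varname"].Value))
            result.AppendLine(m.Groups["initialization"].Value);
        else
            result.AppendLine(line);
    }
    return result.ToString();
}

// Usage: the second line loses its DECLARE.
Console.Write(RemoveDuplicateDeclarations("DECLARE @x = 1\r\nDECLARE @x = 2"));
```

Note that the output uses Environment.NewLine after every line, so line endings may differ from the input.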

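It works fine for my scripts (and there won't be any new ones), but it seems overly complicated: since I can already match exactly what I want to replace, I'd like to know whether there is a way to just call Regex.Replace() with the right arguments and do it in a single line.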

C# solutions are welcome too.

-EDIT-

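To clarify what I'm trying to achieve: I'd like an answer in the following form, or an explanation of why it isn't possible (changing the regex is allowed).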

Private Function RemoveDuplicateDeclarations(commandText As String) As String
    Dim regex As New Regex("(\r|\n|\r\n) *DECLARE *(?<initialization>(?<varname>[^ ]*) *.*)" & vbCrLf , RegexOptions.Multiline Or RegexOptions.IgnoreCase)
    Return regex.Replace(commandText, "What do I put here???????")
End Function
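A replacement string alone cannot do this: substitutions such as ${initialization} are applied to each match independently and have no way of knowing whether the variable was declared earlier. The Regex.Replace overload that takes a MatchEvaluator can carry that state, so the whole job fits in one call. A minimal C# sketch, assuming each DECLARE starts its own line (optionally indented) and that variable names compare case-sensitively:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string RemoveDuplicateDeclarations(string commandText)
{
    // Names seen so far; the evaluator below runs once per match, left to right.
    var declared = new HashSet<string>();
    return Regex.Replace(
        commandText,
        // Group 1: indentation; varname: the @-prefixed name (\w+ is greedy,
        // so @MyVariable1 and @MyVariable10 stay distinct).
        @"^([ \t]*)DECLARE[ \t]+(?<varname>@\w+)",
        // First occurrence: keep the text as-is. Later ones: indentation + name only.
        m => declared.Add(m.Groups["varname"].Value)
            ? m.Value
            : m.Groups[1].Value + m.Groups["varname"].Value,
        RegexOptions.Multiline | RegexOptions.IgnoreCase);
}

Console.WriteLine(RemoveDuplicateDeclarations("DECLARE @x = 1\r\nDECLARE @x = 2"));
```

Because only the "DECLARE @name" prefix is matched, the rest of each line and all line endings pass through unchanged.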

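Comments:

You tagged your question with both C# and VB.NET, but your code is in VB.NET. Does that mean the C# tag can be removed?

You could, but I tagged it C# because I don't mind if someone answers in C#; regular expressions work the same way in both environments.

Answer 1: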

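You can use a fairly simple regex that finds a DECLARE line whose @-prefixed variable is declared again further down, keeps the first declaration, and strips DECLARE from the repeated one. Run it in a loop until it no longer changes the text.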

(?sm)(^DECLARE\s+(@\w+\b).*?)^DECLARE\s+\2

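Details

- (?sm) - enables the Multiline and Singleline (DOTALL) modes
- (^DECLARE\s+(@\w+\b).*?) - Group 1, capturing:
  - ^DECLARE - DECLARE at the start of a line
  - \s+ - one or more whitespace characters
  - (@\w+\b) - Group 2, capturing @ and one or more word characters up to a trailing word boundary
  - .*? - any zero or more characters, as few as possible, up to the first occurrence of...
- ^DECLARE - the DECLARE substring at the start of a line
- \s+ - one or more whitespace characters
- \2 - a backreference to the value stored in Group 2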
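To see what a single match covers, the C# sketch below runs the answer's pattern once on a shortened sample joined with \r\n (as vbCrLf does in the demo). Group 2 holds the variable name, Group 1 holds the first DECLARE line up to the start of the duplicate, and the replacement $1$2 therefore puts the name back without its DECLARE.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern, unchanged.
var rx = new Regex(@"(?sm)(^DECLARE\s+(@\w+\b).*?)^DECLARE\s+\2");
var sample = "DECLARE @MyVariable1 = 1\r\nDECLARE @MyVariable1 = 10\r\nDECLARE @MyVariable3 = 15";

Match first = rx.Match(sample);
// Group 2: the variable name, "@MyVariable1"
Console.WriteLine(first.Groups[2].Value);
// Group 1: the first line including its line break
Console.Write(first.Groups[1].Value);
// Whole match: Group 1 followed by the duplicate's "DECLARE @MyVariable1"
Console.WriteLine(first.Value);
// $1$2 keeps Group 1 and re-inserts the name without DECLARE
Console.WriteLine(rx.Replace(sample, "$1$2"));
```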

VB.NET demo:

Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Group 1: a DECLARE line plus everything up to the next DECLARE of the same variable
        ' Group 2: the variable name
        Dim rx As New Regex("(?sm)(^DECLARE\s+(@\w+\b).*?)^DECLARE\s+\2")
        Dim s As String = "DECLARE @MyVariable1 = 1" & vbCrLf & "DECLARE @MyVariable1 = 10" & vbCrLf & "DECLARE @MyVariable3 = 15" & vbCrLf & "DECLARE @MyVariable2 = 20" & vbCrLf & "DECLARE @MyVariable1 = 7" & vbCrLf & "DECLARE @MyVariable2 = 4" & vbCrLf & "DECLARE @MyVariable4 = 7" & vbCrLf & "DECLARE @MyVariable2 = 4"
        ' A match consumes the lines up to the duplicate, so repeat until the text stops changing
        Dim tmp As String = s
        Dim res As String = rx.Replace(s, "$1$2")
        While tmp <> res
            tmp = res
            res = rx.Replace(res, "$1$2")
        End While
        Console.WriteLine(res)
    End Sub
End Module

Output:

DECLARE @MyVariable1 = 1
@MyVariable1 = 10
DECLARE @MyVariable3 = 15
DECLARE @MyVariable2 = 20
@MyVariable1 = 7
@MyVariable2 = 4
DECLARE @MyVariable4 = 7
@MyVariable2 = 4
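The same repeat-until-stable idea can be sketched in C#. One addition that is not in the answer: a \b after \2, so that @MyVariable1 does not also match the beginning of a later @MyVariable10 (without it, that later line would lose its DECLARE). The do/while loop replaces the compare-and-repeat While loop but behaves the same way.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern plus a trailing \b after the backreference (an assumption
// added here to avoid matching a longer name that starts with the same text).
var dupRx = new Regex(@"(?sm)(^DECLARE\s+(@\w+\b).*?)^DECLARE\s+\2\b");

string RemoveDuplicates(string scriptText)
{
    string previous;
    string current = scriptText;
    do
    {
        previous = current;
        // A match consumes every line up to the duplicate, so DECLAREs inside that
        // span are only examined on a later pass; hence the loop.
        current = dupRx.Replace(previous, "$1$2");
    } while (current != previous);
    return current;
}

Console.WriteLine(RemoveDuplicates("DECLARE @x = 1\r\nDECLARE @x = 2"));
```

Each pass may scan from every DECLARE to the end of the script, so on very large scripts this can be slower than the line-by-line approach from the question.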


Answer 2:

If you prefer a LINQ solution:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input =
                "DECLARE @MyVariable1 = 1\n" +
                "DECLARE @MyVariable1 = 10\n" +
                "DECLARE @MyVariable3 = 15\n" +
                "DECLARE @MyVariable2 = 20\n" +
                "DECLARE @MyVariable1 = 7\n" +
                "DECLARE @MyVariable2 = 4\n" +
                "DECLARE @MyVariable4 = 7\n" +
                "DECLARE @MyVariable2 = 4\n";

            // name = the variable name without @, value = the assigned number
            string pattern = @"@(?'name'[^\s]+)\s+=\s+(?'value'\d+)";

            MatchCollection matches = Regex.Matches(input, pattern);

            // Note: only the matched declarations are emitted; other script lines are not kept.
            string[] lines = matches.Cast<Match>()
                // remember each declaration's original position
                .Select((x, i) => new { name = x.Groups["name"].Value, value = x.Groups["value"].Value, index = i })
                // GroupBy keeps the source order inside each group
                .GroupBy(x => x.name)
                // the first occurrence of a variable keeps DECLARE, later ones drop it
                .Select(x => x.Select((y, i) => new
                {
                    index = y.index,
                    s = i == 0
                        ? string.Format("DECLARE @{0} = {1}", x.Key, y.value)
                        : string.Format("@{0} = {1}", x.Key, y.value)
                }))
                .SelectMany(x => x)
                // restore the original order
                .OrderBy(x => x.index)
                .Select(x => x.s)
                .ToArray();

            foreach (string line in lines)
            {
                Console.WriteLine(line);
            }
            Console.ReadLine();
        }
    }
}
