C# Regex is replacing more than expected

[Posted]: 2021-11-01 02:45:12 [Question]:

I'm using EntityFrameworkCore v5.0.8 on .NET 5.

I have a SQL script that is read as an embedded resource. However, executing it directly is proving difficult. This is how it reads in:

IF EXISTS (SELECT * FROM sys.objects WHERE [name] = N'MyTrigger' AND [type] = 'TR')\r\nBEGIN\r\n      DROP TRIGGER [dbo].[MyTrigger];\r\nEND;\r\nGO\r\n\r\nCREATE TRIGGER [dbo].[MyTrigger] \r\n   ON  [dbo].[MyTable]\r\n   AFTER INSERT\r\nAS \r\nBEGIN\r\n\t--DECLARE @Cinfo VARBINARY(128) \r\n\t--SELECT @Cinfo = Context_Info() \r\n\t--IF @Cinfo = 0x55555 \r\n\t--RETURN \r\n\t-- SET NOCOUNT ON added to prevent extra result sets from\r\n\t-- interfering with SELECT statements.\r\n\tSET NOCOUNT ON;\r\n\tDECLARE @O AS INT\r\n\tDECLARE @D AS INT\r\n\tDECLARE @A AS REAL\r\n\tDECLARE @N AS REAL\r\n\tSET @O = (SELECT TOP 1 OtherId FROM inserted)\r\n\t ... (Removed for brevity) GO

The C# code that reads and runs it looks like this:

List<string> resourceNames = new List<string> { "MyTrigger.sql" };

try
{
    context.Database.OpenConnection();
    resourceNames.ForEach(resourceName =>
    {
        string script = ReadEmbeddedResource(resourceName);
        context.Database.ExecuteSqlRaw(script); // Exception thrown here
    });
}
finally
{
    context.Database.CloseConnection();
}

However, this produces the following error:

Exception has occurred: CLR/Microsoft.Data.SqlClient.SqlException
Exception thrown: 'Microsoft.Data.SqlClient.SqlException' in Microsoft.EntityFrameworkCore.Relational.dll: 'Incorrect syntax near 'GO'.
Incorrect syntax near 'GO'.
Incorrect syntax near 'GO'.
'CREATE TRIGGER' must be the first statement in a query batch.
Incorrect syntax near 'GO'.
Incorrect syntax near 'GO'.'
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean isAsync, Int32 timeout, Boolean asyncWrite)
   at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String methodName)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteNonQuery(RelationalCommandParameterObject parameterObject)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.ExecuteSqlRaw(DatabaseFacade databaseFacade, String sql, IEnumerable`1 parameters)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.ExecuteSqlRaw(DatabaseFacade databaseFacade, String sql, Object[] parameters)
   at MyLib.MyService.<>c__DisplayClass30_0.<MigrateScripts>b__0(String resourceName) in

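I'm assuming this has to do with how the script is executed, so my plan of attack is as follows:

    1. Remove all comments
    2. Replace all new lines, carriage returns, and tabs with spaces
    3. Split the script on "GO"
    4. Run each individual part of the SQL script as a separate transaction

The code I wrote for this is as follows: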

List<string> resourceNames = new List<string> { "MyTrigger.sql" };

DbContext context = GetDbContext(contextType);
try
{
    context.Database.OpenConnection();
    resourceNames.ForEach(resourceName =>
    {
        string script = ReadEmbeddedResource(resourceName);
        // remove comments
        script = Regex.Replace(script, "[-][-][^\\\\]*\\r\\n", " "); // <- This line is removing too much
        // remove new lines and tabs
        script = Regex.Replace(script, "\\.", " ");
        // split by GOs and run each section independently
        List<string> scriptParts = Regex.Split(script, "GO", RegexOptions.IgnoreCase).ToList();
        scriptParts.ForEach(scriptPart => context.Database.ExecuteSqlRaw(scriptPart));
    });
}
finally
{
    context.Database.CloseConnection();
}

However, after running script = Regex.Replace(script, "[-][-][^\\\\]*\\r\\n", " "); the script looks like this:

IF EXISTS (SELECT * FROM sys.objects WHERE [name] = N'MyTrigger' AND [type] = 'TR')\r\nBEGIN\r\n      DROP TRIGGER [dbo].[MyTrigger];\r\nEND;\r\nGO\r\n\r\nCREATE TRIGGER [dbo].[MyTrigger] \r\n   ON  [dbo].[MyTable]\r\n   AFTER INSERT\r\nAS \r\nBEGIN\r\n\t GO

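I.e., it removed everything after the first comment in the create trigger script. I want it to remove only the comments and nothing else. Every comment in the script starts with "--" and ends with "\r\n".

The regex for this is based on --[^\\]*\\r\\n, which I tested in VSCode's find and replace and on this site: https://www.regextester.com/. It appears to work correctly in both.

So my question is: why is this regex replacing more than I expect in C#, and how can I update it to get the result I'm looking for?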

[Answer 1]:

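my plan of attack is as follows:

    1. Remove all comments
    2. Replace all new lines, carriage returns, and tabs with spaces
    3. Split the script on "GO"

I would start with splitting on GO. C# SqlCommand doesn't care about comments, whitespace, and so on, and the parser in SQL Server throws them away anyway, so your 1. and 2. are doing work for no reason. This will execute just fine: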

var c = new SqlCommand(@"/* Rhett wrote it */
SELECT name --just need name for now
FROM   t");

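GO, however, is a "SQL Server Management Studio" thing. It is a batch separator that client tools such as SSMS and sqlcmd use to cut a script into batches; it is not a T-SQL statement and should not be sent to SQL Server.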
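
As a rough illustration of splitting on GO, here is a minimal sketch. SplitBatches is a hypothetical helper name, not something from the question, and it assumes GO always sits alone on its own line (the "GO 5" repeat-count form is not handled). Unlike a plain split on the text "GO", it does not cut inside words such as CATEGORY or inside comments that mention GO.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits a script into batches on lines that contain only GO.
// Multiline makes ^ and $ match at line boundaries; \r? absorbs the CR of a CRLF ending,
// because $ in .NET matches before \n, not before \r\n.
static string[] SplitBatches(string script) =>
    Regex.Split(script, @"^[ \t]*GO[ \t]*\r?$", RegexOptions.Multiline | RegexOptions.IgnoreCase)
         .Select(batch => batch.Trim())
         .Where(batch => batch.Length > 0)
         .ToArray();

// Shortened stand-in for the embedded script (an assumption for this example).
string script =
    "DROP TRIGGER [dbo].[MyTrigger];\r\n" +
    "GO\r\n" +
    "\r\n" +
    "CREATE TRIGGER [dbo].[MyTrigger] ON [dbo].[MyTable] AFTER INSERT AS\r\n" +
    "BEGIN\r\n" +
    "\tSET NOCOUNT ON; -- GO inside a comment is left alone\r\n" +
    "END;\r\n" +
    "go\r\n";

string[] batches = SplitBatches(script);
foreach (string batch in batches)
{
    Console.WriteLine("--- batch ---");
    Console.WriteLine(batch);
}
```

In the code from the question, each element of batches would then be passed to context.Database.ExecuteSqlRaw in turn, with comments and line breaks left in place.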


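Regarding your question, you may have misunderstood how * works. It is greedy: it consumes as much of the input as it can, then gives characters back one at a time until the rest of the pattern matches.

This means that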

the.*o

when applied to

the quick brown fox jumps over the lazy dog, yeah!

matches this:

the quick brown fox jumps over the lazy dog, yeah!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

not this:

the quick brown fox jumps over the lazy dog, yeah!
^^^^^^^^^^^^^

because it finds "the", then consumes the whole string, and then gives characters back while looking for a match:

the quick brown fox jumps over the lazy dog, yeah! - NO MATCH, ends in !, looking for o
the quick brown fox jumps over the lazy dog, yeah  - NO MATCH, ends in h, looking for o
the quick brown fox jumps over the lazy dog, yea   - NO MATCH, ends in a, looking for o
the quick brown fox jumps over the lazy dog, ye    ...
the quick brown fox jumps over the lazy dog, y
the quick brown fox jumps over the lazy dog, 
the quick brown fox jumps over the lazy dog,
the quick brown fox jumps over the lazy dog        - NO MATCH, ends in g, looking for o
the quick brown fox jumps over the lazy do         - MATCH

If you want the regex to crawl forward looking for the first opportunity to match, put ? after *

the.*?o

matches:

the quick brown fox jumps over the lazy dog, yeah!
^^^^^^^^^^^^^
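
The same behaviour can be checked in C#. The sketch below first runs the two fox patterns, then applies the idea to the comment pattern from the question. The shortened script and the replacement pattern --[^\r\n]* are assumptions made for this example, not part of the original post. The class [^\\] excludes only the backslash, so it also matches carriage returns and line feeds; the greedy * therefore runs to the end of the script and backs up to the last "\r\n".

```csharp
using System;
using System.Text.RegularExpressions;

string sentence = "the quick brown fox jumps over the lazy dog, yeah!";

// Greedy: .* takes as much as possible, then backs up to the last 'o'.
string greedy = Regex.Match(sentence, "the.*o").Value;
// Lazy: .*? takes as little as possible, stopping at the first 'o'.
string lazy = Regex.Match(sentence, "the.*?o").Value;

Console.WriteLine(greedy); // the quick brown fox jumps over the lazy do
Console.WriteLine(lazy);   // the quick bro

// Shortened stand-in for the trigger script, with real CR/LF characters.
string script =
    "BEGIN\r\n" +
    "\t-- first comment\r\n" +
    "\tSET NOCOUNT ON;\r\n" +
    "\t-- second comment\r\n" +
    "\tDECLARE @O AS INT\r\n" +
    "END";

// Pattern from the question: swallows everything from the first "--" to the last CRLF.
string tooMuch = Regex.Replace(script, "--[^\\\\]*\\r\\n", " ");

// One possible alternative: stop at the end of the line by excluding CR and LF.
string cleaned = Regex.Replace(script, @"--[^\r\n]*", "");

Console.WriteLine(tooMuch == "BEGIN\r\n\t END");            // True
Console.WriteLine(cleaned.Contains("SET NOCOUNT ON;"));      // True
Console.WriteLine(cleaned.Contains("--"));                   // False
```

Note that a pattern like this also strips "--" that appears inside a SQL string literal, which is one more reason to leave the comments alone and only split on GO.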

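Also, for some light bathroom reading, look at the Multiline and Singleline regex options: Singleline affects whether . matches newline characters, and Multiline affects whether ^ and $ match at the start/end of each line or only at the start/end of the whole input.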
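
A small sketch of what those two options change, using a two-line sample string invented for this example:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "first line\nsecond line";

// By default . does not match \n, so the match stays on the first line.
string defaultDot = Regex.Match(text, "first.*line").Value;
// With Singleline, . also matches \n, so the greedy .* runs across both lines.
string singleline = Regex.Match(text, "first.*line", RegexOptions.Singleline).Value;

// By default ^ matches only at the very start of the input.
int defaultAnchors = Regex.Matches(text, @"^\w+").Count;
// With Multiline, ^ also matches at the start of every line.
int multiline = Regex.Matches(text, @"^\w+", RegexOptions.Multiline).Count;

Console.WriteLine(defaultDot);      // first line
Console.WriteLine(singleline.Length); // 22 (both lines)
Console.WriteLine(defaultAnchors);  // 1
Console.WriteLine(multiline);       // 2
```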
