C# Regex is replacing more than expected
【Title】: C# Regex is replacing more than expected 【Posted】: 2021-11-01 02:45:12 【Question】: I am using EntityFrameworkCore v5.0.8 on .NET 5.
I have a SQL script that is read in as an embedded resource. However, executing it directly has proven difficult. It is read in like this:
IF EXISTS (SELECT * FROM sys.objects WHERE [name] = N'MyTrigger' AND [type] = 'TR')\r\nBEGIN\r\n DROP TRIGGER [dbo].[MyTrigger];\r\nEND;\r\nGO\r\n\r\nCREATE TRIGGER [dbo].[MyTrigger] \r\n ON [dbo].[MyTable]\r\n AFTER INSERT\r\nAS \r\nBEGIN\r\n\t--DECLARE @Cinfo VARBINARY(128) \r\n\t--SELECT @Cinfo = Context_Info() \r\n\t--IF @Cinfo = 0x55555 \r\n\t--RETURN \r\n\t-- SET NOCOUNT ON added to prevent extra result sets from\r\n\t-- interfering with SELECT statements.\r\n\tSET NOCOUNT ON;\r\n\tDECLARE @O AS INT\r\n\tDECLARE @D AS INT\r\n\tDECLARE @A AS REAL\r\n\tDECLARE @N AS REAL\r\n\tSET @O = (SELECT TOP 1 OtherId FROM inserted)\r\n\t ... (Removed for brevity) GO
The C# code that reads and runs it looks like this:
List<string> resourceNames = new List<string> { "MyTrigger.sql" };
try
{
    context.Database.OpenConnection();
    resourceNames.ForEach(resourceName =>
    {
        string script = ReadEmbeddedResource(resourceName);
        context.Database.ExecuteSqlRaw(script); // Exception thrown here
    });
}
finally
{
    context.Database.CloseConnection();
}
However, this produces the following error:
Exception has occurred: CLR/Microsoft.Data.SqlClient.SqlException
Exception thrown: 'Microsoft.Data.SqlClient.SqlException' in Microsoft.EntityFrameworkCore.Relational.dll: 'Incorrect syntax near 'GO'.
Incorrect syntax near 'GO'.
Incorrect syntax near 'GO'.
'CREATE TRIGGER' must be the first statement in a query batch.
Incorrect syntax near 'GO'.
Incorrect syntax near 'GO'.'
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at Microsoft.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean isAsync, Int32 timeout, Boolean asyncWrite)
at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String methodName)
at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteNonQuery(RelationalCommandParameterObject parameterObject)
at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.ExecuteSqlRaw(DatabaseFacade databaseFacade, String sql, IEnumerable`1 parameters)
at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.ExecuteSqlRaw(DatabaseFacade databaseFacade, String sql, Object[] parameters)
at MyLib.MyService.<>c__DisplayClass30_0.<MigrateScripts>b__0(String resourceName) in
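I assume this has to do with how the script is executed, so my plan of attack is as follows:
1. Remove all comments
2. Replace all newlines, carriage returns, and tabs with spaces
3. Split the script on "GO"
4. Run each individual part of the SQL script as a separate transaction
The code I wrote for this is as follows: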
List<string> resourceNames = new List<string> { "MyTrigger.sql" };
DbContext context = GetDbContext(contextType);
try
{
    context.Database.OpenConnection();
    resourceNames.ForEach(resourceName =>
    {
        string script = ReadEmbeddedResource(resourceName);
        // remove comments
        script = Regex.Replace(script, "[-][-][^\\\\]*\\r\\n", " "); // <- This line is removing too much
        // remove new lines and tabs
        script = Regex.Replace(script, "\\.", " ");
        // split by GOs and run each section independently
        List<string> scriptParts = Regex.Split(script, "GO", RegexOptions.IgnoreCase).ToList();
        scriptParts.ForEach(scriptPart => context.Database.ExecuteSqlRaw(scriptPart));
    });
}
finally
{
    context.Database.CloseConnection();
}
However, after running script = Regex.Replace(script, "[-][-][^\\\\]*\\r\\n", " ");
the script looks like this:
IF EXISTS (SELECT * FROM sys.objects WHERE [name] = N'MyTrigger' AND [type] = 'TR')\r\nBEGIN\r\n DROP TRIGGER [dbo].[MyTrigger];\r\nEND;\r\nGO\r\n\r\nCREATE TRIGGER [dbo].[MyTrigger] \r\n ON [dbo].[MyTable]\r\n AFTER INSERT\r\nAS \r\nBEGIN\r\n\t GO
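That is, it removed everything after the first comment in the CREATE TRIGGER script. I want it to remove only the comments and nothing else. Every comment in the script starts with "--" and ends with "\r\n".
The regex for this is based on --[^\\]*\\r\\n
, which I tested in VSCode's find-and-replace and on this site: https://www.regextester.com/. It appeared to work correctly in both.
So my question is: why does this regex replace more than I expect in C#, and how can I update it to get the result I am looking for?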
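【Answer 1】: You wrote: "my plan of attack is as follows:
1. Remove all comments
2. Replace all newlines, carriage returns, and tabs with spaces
3. Split the script on GO"
I would start by splitting on GO. C# SqlCommand does not care about comments, whitespace, and so on, and the parser in SQL Server discards them anyway, so your steps 1 and 2 are doing work for no reason. This will execute just fine: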
var c = new SqlCommand(@"/* Rhett wrote it */
SELECT name --just need name for now
FROM t");
GO, however, is a "SQL Server Management Studio" thing used to split a script into batches, and it should not be sent to SQL Server.
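As a rough illustration of splitting on GO before anything is sent to the server, the sketch below splits only on lines that contain nothing but GO. A bare "GO" pattern with IgnoreCase, as in the question, would also cut identifiers such as Category in half. SplitBatches is a hypothetical helper name, not part of EF Core or SqlClient, and the sketch assumes GO never appears alone on a line inside a string literal or block comment.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split a script into batches on lines that contain only GO.
// Assumption: GO never sits alone on a line inside a string literal or block comment.
static string[] SplitBatches(string script) =>
    Regex.Split(script, @"^\s*GO\s*$", RegexOptions.Multiline | RegexOptions.IgnoreCase)
         .Select(part => part.Trim())
         .Where(part => part.Length > 0)
         .ToArray();

// Made-up two-batch script; "Category" contains "go" but must not be split.
string demoScript =
    "DROP TRIGGER [dbo].[MyTrigger];\r\nGO\r\n\r\n" +
    "CREATE TRIGGER [dbo].[MyTrigger] ON [dbo].[MyTable] AFTER INSERT AS\r\n" +
    "SELECT Category FROM inserted\r\nGO";

foreach (string batch in SplitBatches(demoScript))
{
    Console.WriteLine("--- batch ---");
    Console.WriteLine(batch);
}
```

Each returned batch could then be passed to context.Database.ExecuteSqlRaw in turn, as in the loop from the question.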
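As for your question, you may have misunderstood how * works. It is greedy, so it consumes as much of the input as it can and then gives characters back one at a time if the rest of the pattern fails to match. In your pattern, [^\\]* matches any character other than a backslash, line breaks included, so the match runs all the way to the last \r\n in the script.
This means that
the.*o
when applied to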
the quick brown fox jumps over the lazy dog, yeah!
matches this:
the quick brown fox jumps over the lazy dog, yeah!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
not this:
the quick brown fox jumps over the lazy dog, yeah!
^^^^^^^^^^^^^
because it finds "the", then consumes the rest of the string, then gives characters back one at a time while looking for a match:
the quick brown fox jumps over the lazy dog, yeah! - NO MATCH, ends in !, looking for o
the quick brown fox jumps over the lazy dog, yeah - NO MATCH, ends in h, looking for o
the quick brown fox jumps over the lazy dog, yea - NO MATCH, ends in a, looking for o
the quick brown fox jumps over the lazy dog, ye ...
the quick brown fox jumps over the lazy dog, y
the quick brown fox jumps over the lazy dog,
the quick brown fox jumps over the lazy dog,
the quick brown fox jumps over the lazy dog - NO MATCH, ends in g, looking for o
the quick brown fox jumps over the lazy do - MATCH
If you want the regex to crawl forward and take the first opportunity to match, put ? after *:
the.*?o
matches:
the quick brown fox jumps over the lazy dog, yeah!
^^^^^^^^^^^^^
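The same comparison can be reproduced in C#. This is only a small sketch using the sentence and the two patterns above; the variable names are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string sentence = "the quick brown fox jumps over the lazy dog, yeah!";

// Greedy: .* grabs as much as possible, then backs off to the last 'o'.
string greedyMatch = Regex.Match(sentence, "the.*o").Value;

// Lazy: .*? grabs as little as possible and stops at the first 'o'.
string lazyMatch = Regex.Match(sentence, "the.*?o").Value;

Console.WriteLine(greedyMatch); // the quick brown fox jumps over the lazy do
Console.WriteLine(lazyMatch);   // the quick bro
```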
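Also, for some light bathroom reading, take a look at the Multiline and Singleline regex options (RegexOptions.Multiline and RegexOptions.Singleline). They affect how . handles newline characters and what ^ and $ treat as the start and end of the input.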
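Applied to the pattern from the question, a sketch of the difference might look like the following. The sample SQL string is shortened and made up for the illustration. The two alternative patterns are one possible fix, not the only one, and they assume "--" never occurs inside a string literal. Presumably the \r\n sequences shown in the debugger output are real control characters rather than backslashes, which is why [^\\]* never stops at them.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, made-up sample with two line comments.
string sql = "BEGIN\r\n\t--first comment \r\n\tSET NOCOUNT ON;\r\n" +
             "\t-- second comment\r\n\tDECLARE @O AS INT\r\nEND";

// Pattern from the question: runs from the first "--" to the LAST "\r\n".
string tooMuch = Regex.Replace(sql, @"[-][-][^\\]*\r\n", " ");

// Lazy quantifier: each match stops at the first "\r\n" after "--".
string lazyFix = Regex.Replace(sql, @"--[^\\]*?\r\n", " ");

// Alternative: exclude line-break characters so a match cannot leave its line.
// This variant keeps the line breaks and removes only the comment text.
string perLineFix = Regex.Replace(sql, @"--[^\r\n]*", "");

Console.WriteLine(tooMuch.Contains("SET NOCOUNT ON;"));    // False: statement was swallowed
Console.WriteLine(lazyFix.Contains("SET NOCOUNT ON;"));    // True
Console.WriteLine(perLineFix.Contains("SET NOCOUNT ON;")); // True
```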