LINQ: To Objects - How to Manipulate Strings
Posted by 反骨仔
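Opening remarks:
My previous post, "LINQ: Advanced - An Overview of the LINQ Standard Query Operators" (90+ likes), was well received, but I still felt something was missing. As the old line goes, "Knowledge from books always feels shallow; to truly understand something, you have to do it yourself." Exactly: hands-on practice! This time, let's look together at some techniques for manipulating strings. Perhaps they will help us approach problems from a different angle and break out of our blind spots!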
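Preface
LINQ can be used to query and transform strings and collections of strings. It is especially useful for semi-structured data in text files. LINQ queries can be combined with traditional string functions and regular expressions.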
Querying blocks of text
Querying semi-structured data in text format
1. How to count occurrences of a word in a string
```csharp
using System;
using System.Linq;

const string text = @"Historically, the world of data and the world of objects" +
    @" have not been well integrated. Programmers work in C# or Visual Basic" +
    @" and also in SQL or XQuery. On the one side are concepts such as classes," +
    @" objects, fields, inheritance, and .NET Framework APIs. On the other side" +
    @" are tables, columns, rows, nodes, and separate languages for dealing with" +
    @" them. Data types often require translation between the two worlds; there are" +
    @" different standard functions. Because the object world has no notion of query, a" +
    @" query can only be represented as a string without compile-time type checking or" +
    @" IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to" +
    @" objects in memory is often tedious and error-prone.";

const string searchTerm = "data";

// Split the string into an array of words
var source = text.Split(new[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries);

// Create the query, comparing case-insensitively
var matchQuery = from word in source
                 where string.Equals(word, searchTerm, StringComparison.InvariantCultureIgnoreCase)
                 select word;

// Count the matches
var wordCount = matchQuery.Count();
Console.WriteLine($"{wordCount} occurrence(s) of the search term \"{searchTerm}\" were found.");
```
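The query above counts one term at a time. As a variation (our own sketch, not part of the original example), the same split-then-query approach can count every word at once by grouping case-insensitively with GroupBy. The short sample text is an illustrative stand-in for the paragraph above, and the ordering (by count, then alphabetically) is an assumption chosen for readability.

```csharp
using System;
using System.Linq;

// Illustrative sample text (a stand-in for the longer paragraph above).
const string text = "Data types often require translation. Data, data everywhere; objects too.";

// Same separator set as the original example.
var separators = new[] { '.', '?', '!', ' ', ';', ':', ',' };

// Group words case-insensitively, count each group,
// then sort by count (descending) and by word (ascending).
var frequencies = text
    .Split(separators, StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .OrderByDescending(x => x.Count)
    .ThenBy(x => x.Word, StringComparer.OrdinalIgnoreCase)
    .ToList();

foreach (var f in frequencies)
{
    Console.WriteLine($"{f.Word}: {f.Count}");
}
```

The group key is the first spelling encountered, so "Data", "data" and "DATA" are reported together under whichever appeared first.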
2. How to query for sentences that contain a specified set of words

```csharp
using System;
using System.Linq;

const string text = @"Historically, the world of data and the world of objects " +
    @"have not been well integrated. Programmers work in C# or Visual Basic " +
    @"and also in SQL or XQuery. On the one side are concepts such as classes, " +
    @"objects, fields, inheritance, and .NET Framework APIs. On the other side " +
    @"are tables, columns, rows, nodes, and separate languages for dealing with " +
    @"them. Data types often require translation between the two worlds; there are " +
    @"different standard functions. Because the object world has no notion of query, a " +
    @"query can only be represented as a string without compile-time type checking or " +
    @"IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to " +
    @"objects in memory is often tedious and error-prone.";

// Split the block of text into sentences
var sentences = text.Split('.', '?', '!');

// Define the search terms; this list could also be built at run time
string[] wordsToMatch = { "Historically", "data", "integrated" };

var match = from sentence in sentences
            let t = sentence.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries)
            // Remove duplicates, intersect, and compare the size of the intersection
            where t.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Length
            select sentence;

foreach (var s in match)
{
    Console.WriteLine(s);
}
```
![](https://image.cha138.com/20210611/3a999e895d9541af947410dbf5be074f.jpg)
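When the query runs, it first splits the text into sentences, and then splits each sentence into an array of strings holding the individual words. For each of these arrays, the Distinct<TSource> method removes all duplicate words, and the query then performs an Intersect<TSource> operation on the word array and the wordsToMatch array. If the count of the intersection equals the length of the wordsToMatch array, all of the search words were found in the sentence, and the original sentence is returned.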
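To make the steps just described concrete, the sketch below applies them to a single sentence taken from the text above and keeps each intermediate result. It uses a reduced separator set, which is enough for this one sentence.

```csharp
using System;
using System.Linq;

// One sentence and the search list, mirroring the example above.
const string sentence = "Historically, the world of data and the world of objects have not been well integrated";
string[] wordsToMatch = { "Historically", "data", "integrated" };

// Step 1: split the sentence into words.
string[] words = sentence.Split(new[] { ' ', ',', ';', ':' }, StringSplitOptions.RemoveEmptyEntries);

// Step 2: remove duplicate words ("the", "world" and "of" each appear twice).
var distinctWords = words.Distinct().ToArray();

// Step 3: keep only the words that also appear in wordsToMatch.
var common = distinctWords.Intersect(wordsToMatch).ToArray();

// Step 4: the sentence matches only if every search word was found.
bool containsAll = common.Length == wordsToMatch.Length;

Console.WriteLine($"{words.Length} words, {distinctWords.Length} distinct, {common.Length} matched -> {containsAll}");
```

Both Distinct and Intersect use the default case-sensitive comparison here, so a sentence starting with "Data" would not match "data"; passing StringComparer.OrdinalIgnoreCase to Intersect is one way to relax that. Intersect already returns distinct elements, so the Distinct call mainly makes the intent explicit.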
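3. How to query for characters in a string
Because the String class implements the generic IEnumerable<T> interface (as IEnumerable<char>), any string can be queried as a sequence of characters. However, this is not a common use of LINQ. For complex pattern-matching operations, use the Regex class.
The following example queries a string to find the digits it contains, and then takes all of the characters before the first "-".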
```csharp
using System;
using System.Linq;

const string aString = "ABCDE99F-J74-12-89A";

// Select only the characters that are digits
var digits = from ch in aString
             where char.IsDigit(ch)
             select ch;

Console.Write("digit: ");

foreach (var n in digits)
{
    Console.Write($"{n} ");
}

Console.WriteLine();

// Select all characters before the first '-'
var query = aString.TakeWhile(x => x != '-');

foreach (var ch in query)
{
    Console.Write(ch);
}
```
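Because a string is already an IEnumerable<char>, other standard operators work on it directly. The sketch below (an illustrative addition, not from the original) counts the digits with Count instead of printing them, and uses SkipWhile plus Skip(1) to take the part after the first "-", the complement of the TakeWhile query above.

```csharp
using System;
using System.Linq;

const string aString = "ABCDE99F-J74-12-89A";

// A string is an IEnumerable<char>, so Count can take char.IsDigit as its predicate.
int digitCount = aString.Count(char.IsDigit);

// Everything after the first '-': skip up to the dash, then skip the dash itself.
string afterFirstDash = new string(aString.SkipWhile(ch => ch != '-').Skip(1).ToArray());

Console.WriteLine($"digits: {digitCount}");
Console.WriteLine($"after first '-': {afterFirstDash}");
```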
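4. How to combine LINQ queries with regular expressions
This example shows how to use the Regex class to create a regular expression for more complex matching in text strings. A LINQ query makes it easy to filter down to exactly the files you want to search with the regular expression, and to shape the results.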
```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Adjust the path for your installed version of Visual Studio
const string folder = @"C:\Program Files (x86)\Microsoft Visual Studio 14.0\";
var infoes = GetFiles(folder);

// Create a regular expression that finds every "Visual ..." product name
var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

// Search every ".html" file; the where clause keeps only files with at least one match.
// [Note] The range variable of the inner query must be typed explicitly (Match),
// because MatchCollection is a non-generic IEnumerable in .NET Framework.
var query = from fileInfo in infoes
            where fileInfo.Extension == ".html"
            let text = File.ReadAllText(fileInfo.FullName)
            let matches = searchTerm.Matches(text)
            where matches.Count > 0
            select new
            {
                name = fileInfo.FullName,
                matchValue = from Match match in matches select match.Value
            };

Console.WriteLine($"The term \"{searchTerm}\" was found in:");

foreach (var q in query)
{
    // Trim the search folder from the file path
    Console.WriteLine(q.name.Substring(folder.Length - 1));

    // Print each matched value
    foreach (var v in q.matchValue)
    {
        Console.WriteLine(v);
    }
}
```
```csharp
// Recursively collect every file under the given path as FileInfo objects
private static IList<FileInfo> GetFiles(string path)
{
    var files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);

    return files.Select(file => new FileInfo(file)).ToList();
}
```
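You can also query the MatchCollection object returned by a Regex search. In this example, only the value of each match is produced in the results, but you can also use LINQ to filter, sort, and group that collection in various ways.
[Note] In .NET Framework, MatchCollection is a non-generic IEnumerable collection, so the type of the range variable in the query must be declared explicitly. (In .NET Core 2.0 and later, MatchCollection also implements IEnumerable<Match>; the explicit type is then optional but still works.)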
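The sketch below shows both ways to get typed Match objects out of a MatchCollection, then groups and sorts the matches as described above. An invented in-memory string stands in for the .html files, so it runs without Visual Studio installed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Invented sample text standing in for the .html files of the example above.
const string text = "Visual Studio ships Visual C# and Visual Basic. Visual C# is popular; Visual C++ too.";
var searchTerm = new Regex(@"Visual (Basic|C#|C\+\+|J#|SourceSafe|Studio)");

MatchCollection matches = searchTerm.Matches(text);

// Option 1: query syntax with an explicitly typed range variable.
var values = from Match m in matches select m.Value;

// Option 2: the equivalent method syntax uses Cast<Match>().
// Group identical matches, count them, and sort by frequency, then by name.
var grouped = matches.Cast<Match>()
    .GroupBy(m => m.Value)
    .Select(g => new { Term = g.Key, Count = g.Count() })
    .OrderByDescending(x => x.Count)
    .ThenBy(x => x.Term, StringComparer.Ordinal)
    .ToList();

foreach (var g in grouped)
{
    Console.WriteLine($"{g.Term}: {g.Count}");
}
```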
5. How to find the set difference between two lists
names1.txt:

```text
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra
```
names2.txt:

```text
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi
```
```csharp
using System;
using System.IO;
using System.Linq;

// Create the data sources
var names1Text = File.ReadAllLines(@"names1.txt");
var names2Text = File.ReadAllLines(@"names2.txt");

// Create the query; method syntax is required here because query syntax has no Except clause
var query = names1Text.Except(names2Text);

// Execute the query
Console.WriteLine("The following lines are in names1.txt but not names2.txt");
foreach (var name in query)
{
    Console.WriteLine(name);
}
```
![](https://image.cha138.com/20210611/c679ab880ac043ab96acf99954c103a9.jpg)
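Two details of Except are easy to miss: it uses the default case-sensitive string comparison unless you pass a comparer, and, because it is a set operation, it also removes duplicates from the first sequence. A small in-memory sketch, with sample data adapted from the files above and a duplicate plus case variants added for illustration:

```csharp
using System;
using System.Linq;

// Small in-memory stand-ins for names1.txt and names2.txt (variants added for illustration).
string[] names1 = { "Bankov, Peter", "Potra, Cristina", "Potra, Cristina", "potra, cristina", "Holm, Michael" };
string[] names2 = { "Bankov, Peter", "HOLM, MICHAEL" };

// Default comparer: case-sensitive. The duplicate "Potra, Cristina" appears only once.
var caseSensitive = names1.Except(names2).ToList();

// Case-insensitive comparer: "HOLM, MICHAEL" now removes "Holm, Michael",
// and the two spellings of Potra collapse into the first one.
var caseInsensitive = names1.Except(names2, StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(string.Join(" | ", caseSensitive));
Console.WriteLine(string.Join(" | ", caseInsensitive));
```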
6. How to sort or filter text data by any word or field
scores.csv:

```text
111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91
```
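```csharp
using System;
using System.IO;

// Create the data source
var scores = File.ReadAllLines(@"scores.csv");
// Index of the column to sort by; any value from 0 to 4 works
const int sortField = 1;

// Demonstrates returning a query from a method:
// RunQuery returns the query variable, not the query results;
// the query is actually executed here, in the foreach loop
foreach (var score in RunQuery(scores, sortField))
{
    Console.WriteLine(score);
}
```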
```csharp
private static IEnumerable<string> RunQuery(IEnumerable<string> score, int num)
{
    // Split each line into fields so it can be sorted by the chosen field.
    // Note: fields[num] is compared as text, not as a number.
    var query = from line in score
                let fields = line.Split(',')
                orderby fields[num] descending
                select line;

    return query;
}
```
![](https://image.cha138.com/20210611/ffebf9bf63b14ec4944d72759e639f51.jpg)
This example also shows how to return a query variable from a method.
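One caveat with RunQuery: `orderby fields[num]` compares the fields as text. That happens to work for the two-digit scores in scores.csv, but a three-digit score would sort in the wrong place. The sketch below uses a hypothetical row containing 100 to compare the text ordering with a numeric one that parses the field first. ByText and ByNumber are illustrative names; like RunQuery, both return the query itself, and it is the ToList() call that executes it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows in the same "id, score, score, ..." layout as scores.csv,
// with one three-digit score added to expose the problem.
var scores = new List<string>
{
    "111, 97, 92, 81, 60",
    "112, 100, 84, 91, 39",
    "113, 88, 94, 65, 91",
};

// Text comparison: " 100" sorts below " 97" because '1' < '9'.
IEnumerable<string> ByText(IEnumerable<string> lines, int num) =>
    from line in lines
    let fields = line.Split(',')
    orderby fields[num] descending
    select line;

// Numeric comparison: parse the field first (int.Parse accepts the leading space).
IEnumerable<string> ByNumber(IEnumerable<string> lines, int num) =>
    from line in lines
    let fields = line.Split(',')
    orderby int.Parse(fields[num]) descending
    select line;

// ToList() is what actually runs each returned query.
var textOrder = ByText(scores, 1).ToList();
var numericOrder = ByNumber(scores, 1).ToList();

Console.WriteLine(string.Join(Environment.NewLine, numericOrder));
```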
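7. How to reorder the fields of a delimited file
A comma-separated values (CSV) file is a text file that is commonly used to store spreadsheet data or other tabular data represented by rows and columns. By using the Split method to separate the fields, you can easily query and manipulate CSV files with LINQ. In fact, the same technique can be used to reorder the parts of any structured line of text; it is not limited to CSV files.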
spreadsheet1.csv:

```text
Adams,Terry,120
Fakhouri,Fadi,116
Feng,Hanying,117
Garcia,Cesar,114
Garcia,Debra,115
Garcia,Hugo,118
Mortensen,Sven,113
O'Donnell,Claire,112
Omelchenko,Svetlana,111
Tucker,Lance,119
Tucker,Michael,122
Zabokritski,Eugene,121
```
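```csharp
using System;
using System.IO;
using System.Linq;

// Data source
var lines = File.ReadAllLines(@"spreadsheet1.csv");
// Move the field in column 2 to the front, followed by columns 1 and 0 in reverse order
var query = from line in lines
            let t = line.Split(',')
            orderby t[2]
            select $"{t[2]}, {t[1]} {t[0]}";

foreach (var q in query)
{
    Console.WriteLine(q);
}

// Write the results to a file (this enumerates, and therefore re-runs, the query a second time)
File.WriteAllLines("spreadsheet2.csv", query);
```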
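As noted above, the technique is not limited to CSV. The sketch below applies the same reordering to hypothetical pipe-delimited rows and calls ToList() so the query runs only once; in the example above, the query is enumerated twice (by the foreach loop and by File.WriteAllLines), so every line is split and sorted twice.

```csharp
using System;
using System.Linq;

// Hypothetical pipe-delimited rows: last name | first name | id.
string[] lines =
{
    "Garcia|Hugo|118",
    "Adams|Terry|120",
    "Feng|Hanying|117",
};

// Same reordering as above ("id, First Last"), sorted by id.
// ToList() runs the query once, so later uses reuse the same results.
var reordered = (from line in lines
                 let t = line.Split('|')
                 orderby t[2]
                 select $"{t[2]}, {t[1]} {t[0]}").ToList();

reordered.ForEach(Console.WriteLine);
```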
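8. How to combine and compare string collections
This example shows how to merge files that contain lines of text and then sort the results. Specifically, it shows how to perform a simple concatenation, a union, and an intersection on the two sets of text lines.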
names1.txt:

```text
Bankov, Peter
Holm, Michael
Garcia, Hugo
Potra, Cristina
Noriega, Fabricio
Aw, Kam Foo
Beebe, Ann
Toyoshima, Tim
Guy, Wey Yuan
Garcia, Debra
```
names2.txt:

```text
Liu, Jinghao
Bankov, Peter
Holm, Michael
Garcia, Hugo
Beebe, Ann
Gilchrist, Beth
Myrcha, Jacek
Giakoumakis, Leo
McLin, Nkenge
El Yassir, Mehdi
```
```csharp
using System;
using System.IO;
using System.Linq;

var names1Text = File.ReadAllLines(@"names1.txt");
var names2Text = File.ReadAllLines(@"names2.txt");

// Simple concatenation, then sort. Duplicates are preserved.
var concatQuery = names1Text.Concat(names2Text).OrderBy(x => x);
OutputQueryResult(concatQuery, "Simple concatenate and sort. Duplicates are preserved:");

// Union based on the default string comparer, which removes duplicate names.
var unionQuery = names1Text.Union(names2Text).OrderBy(x => x);
OutputQueryResult(unionQuery, "Union removes duplicate names:");

// Find the names that appear in both files
var intersectQuery = names1Text.Intersect(names2Text).OrderBy(x => x);
OutputQueryResult(intersectQuery, "Merge based on intersect:");

// Find the matching entries in each list, merge the two results with Concat,
// then sort them with the default string comparer
const string nameMatch = "Garcia";
var matchQuery1 = from name in names1Text
                  let t = name.Split(',')
                  where t[0] == nameMatch
                  select name;
var matchQuery2 = from name in names2Text
                  let t = name.Split(',')
                  where t[0] == nameMatch
                  select name;

var temp = matchQuery1.Concat(matchQuery2).OrderBy(x => x);
OutputQueryResult(temp, $"Concat based on partial name match \"{nameMatch}\":");
```
```csharp
private …
```
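The OutputQueryResult helper called in section 8 is cut off in this copy of the article. The sketch below is a hypothetical stand-in: its name and parameter order are inferred from the calls, while the body (printing the caption, each item and a count) is an assumption, not the author's code. It runs the concatenation, union and intersection on small in-memory lists so the different result sizes are easy to see.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Small in-memory stand-ins for names1.txt and names2.txt.
string[] names1Text = { "Bankov, Peter", "Holm, Michael", "Potra, Cristina" };
string[] names2Text = { "Liu, Jinghao", "Bankov, Peter", "Holm, Michael" };

var concat = names1Text.Concat(names2Text).OrderBy(x => x).ToList();       // duplicates kept
var union = names1Text.Union(names2Text).OrderBy(x => x).ToList();         // duplicates removed
var intersect = names1Text.Intersect(names2Text).OrderBy(x => x).ToList(); // names in both lists

OutputQueryResult(concat, "Simple concatenate and sort. Duplicates are preserved:");
OutputQueryResult(union, "Union removes duplicate names:");
OutputQueryResult(intersect, "Merge based on intersect:");

// Hypothetical helper (assumed body): prints a caption, each item, and the item count.
static void OutputQueryResult(IEnumerable<string> query, string message)
{
    Console.WriteLine(Environment.NewLine + message);
    var count = 0;
    foreach (var item in query)
    {
        Console.WriteLine(item);
        count++;
    }
    Console.WriteLine($"{count} names");
}
```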