LINQ: select rows where any word in a string starts with a certain character
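I want to extract from a table all rows where a (string) column contains at least one word that starts with a specified character. Example: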
Row 1: 'this is the first row'
Row 2: 'this is th second row'
Row 3: 'this is the third row'
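If the specified character is T, I would extract all 3 rows. If the specified character is S, I would extract only the second row.
Please help.
Assuming that by "word" you mean a space-delimited sequence of characters (including one that runs from the start of the string to a space, or from a space to the end), you can split on the delimiter and test each piece for a match: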
var src = new[] {
"this is the first row",
"this is th second row",
"this is the third row"
};
var findChar = 'S';
// char has no instance ToLower(); use the static char.ToLower instead
var lowerFindChar = char.ToLower(findChar);
var matches = src.Where(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Any(w => char.ToLower(w[0]) == lowerFindChar));
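The LINQ Enumerable.Any method tests a sequence to see whether any element matches a condition, so you can split each string into a sequence of words and check whether any word starts with the desired letter, lower-casing both sides to compensate for case.
Try this: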
rows.Where(r => Regex.IsMatch(r, " [Tt]"))
You can replace Tt with Ss (assuming you want to match both upper and lower case).
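One caveat with the pattern above: it requires a space before the letter, so a word at the very start of the string is never matched. A minimal sketch of an alternative, assuming the regex word boundary `\b` is close enough to your notion of a word. The class and method names are hypothetical and not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordStartFilter
{
    // Hypothetical helper: keeps the rows that contain a word starting with c.
    public static string[] RowsWithWordStartingWith(string[] rows, char c)
    {
        // \b matches at a word boundary, including the start of the string.
        // Regex.Escape protects against characters that have a meaning in a regex.
        var pattern = @"\b" + Regex.Escape(c.ToString());
        return rows
            .Where(r => Regex.IsMatch(r, pattern, RegexOptions.IgnoreCase))
            .ToArray();
    }
}
```

Note that `\b` treats digits and underscores as word characters too, so this only fits if that definition of a word is acceptable.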
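The question, of course, is: what is a "word"?
By your definition, is the character sequence "word" in the sentence above a word? It does not start with a space, nor with any other white space.
A definition of a word could be:
Define wordCharacter: something like A-Z, a-z.
Define word:
- a non-empty sequence of wordCharacters at the beginning of the string, followed by a non-wordCharacter
- or a non-empty sequence of wordCharacters at the end of the string, preceded by a non-wordCharacter
- or any non-empty sequence of wordCharacters in the string that is both preceded and followed by a non-wordCharacter
Define start of a word: the first character of the word.
String: "Some strange characters: 'A', 9, äll, B9 C $ X?"
- Words: Some, strange, characters, A
- Not words: 9, äll, B9, C $ X?
So you first have to specify exactly what you mean by a word; only then can you define your functions.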
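To illustrate how much the result depends on that choice, here is a small sketch that extracts maximal runs of "word characters" under two different definitions. `WordDefinitionDemo`, `IsAsciiLetter`, and `Words` are hypothetical names used only for this illustration:

```csharp
using System;
using System.Collections.Generic;

public static class WordDefinitionDemo
{
    // Assumption: a word character is an ASCII letter (A-Z, a-z) only.
    public static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Collects every maximal run of characters accepted by isWordChar.
    public static List<string> Words(string text, Func<char, bool> isWordChar)
    {
        var words = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            if (!isWordChar(text[i])) { i++; continue; }
            int start = i;
            while (i < text.Length && isWordChar(text[i])) i++;
            words.Add(text.Substring(start, i - start));
        }
        return words;
    }
}
```

For the input "äll B9", `Words(text, WordDefinitionDemo.IsAsciiLetter)` yields "ll" and "B", while `Words(text, char.IsLetter)` yields "äll" and "B". Neither matches the stricter reading in the example, where äll and B9 are not words at all, which is one reason to pin the definition down before writing the query.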
I would write it as an extension method on string that returns an IEnumerable<string>. Usage then looks similar to LINQ. See Extension Methods Demystified
using System;
using System.Collections.Generic;

static class StringExtensions
{
    // Placeholder following the "A-Z, a-z" suggestion above.
    // TODO: replace with your own definition of a word character.
    static bool IsWordCharacter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static IEnumerable<string> SplitIntoWords(this string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        int startIndex = 0;
        while (startIndex != text.Length)
        {   // not at end of string. Find the beginning of the next word:
            while (startIndex < text.Length && !IsWordCharacter(text[startIndex]))
            {
                ++startIndex;
            }
            // now startIndex points to the first character of the next word
            // or to the end of the text
            if (startIndex != text.Length)
            {   // found the beginning of a word.
                // the first character after the word is either the first non-word character,
                // or the end of the string
                int indexAfterWord = startIndex + 1;
                while (indexAfterWord < text.Length && IsWordCharacter(text[indexAfterWord]))
                {
                    ++indexAfterWord;
                }
                // all characters from startIndex to indexAfterWord-1 are word characters,
                // so together they form one word
                int wordLength = indexAfterWord - startIndex;
                yield return text.Substring(startIndex, wordLength);
                // continue after this word; without this step the loop would never end
                startIndex = indexAfterWord;
            }
        }
    }
}
Now that you have a procedure that splits any string into words according to your definition, your query is simple:
IEnumerable<string> texts = ...
char specifiedChar = 'T';
// keep only those texts that have at least one word that starts with specifiedChar:
var textsWithWordThatStartsWithSpecifiedChar = texts
    // split each text into words; keep the text if any word starts with specifiedChar.
    // The comparison ignores case, so 'T' also matches "this" in the example rows.
    .Where(text => text.SplitIntoWords()
        .Any(word => word.Length > 0
            && char.ToUpperInvariant(word[0]) == char.ToUpperInvariant(specifiedChar)));