Safely truncate a string that contains color tags
【English title】: Safe truncate string contains color tag 【Posted】: 2021-08-16 13:33:18
【Question】: I have a string that contains color tags:
var myString = "My name is <color=#FF00EE>ABCDE</color> and I love <color=#FFEE00>music</color>";
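My string renders as "My name is ABCDE *(pink)* and I love music *(yellow)*".
If the string reaches a maximum length, I want to truncate it but still keep the color tags: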
var myTruncateString = "My name is <color=#FF00EE>ABCDE</color> and I love <color=#FFEE00>mu</color>";
My string then renders as "My name is ABCDE *(pink)* and I love mu *(yellow)*".
Do you have any suggestions?
// Strip the color tags to measure the visible length.
// Strings are immutable, so the obsolete String.Copy is not needed.
var stringWithoutFormat = Regex.Replace(myString, "<color.*?>|</color>", "");
var maxLength = 20;
if (stringWithoutFormat.Length > maxLength)
{
    // What should I do next?
}
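The question's snippet stops right after the length check. One possible way to continue (a sketch, not taken from any of the answers below) is a single forward pass: copy each tag verbatim, count only the characters between tags, stop once `maxLength` visible characters have been emitted, and close a span the cut left open. It reuses the question's own regex to find the tags and assumes the only markup is non-nested `<color=...>` / `</color>` pairs. `TruncateVisible` and `AppendText` are hypothetical names, and the block is written as a C# 9+ top-level program so it runs as-is.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

var myString = "My name is <color=#FF00EE>ABCDE</color> and I love <color=#FFEE00>music</color>";
Console.WriteLine(TruncateVisible(myString, 30));
// prints My name is <color=#FF00EE>ABCDE</color> and I love <color=#FFEE00>mu</color>

// Hypothetical helper (sketch): keeps at most maxLength *visible* characters.
// Tags are copied verbatim and not counted; an opening tag is skipped once the
// budget is spent (so no empty span is emitted), and a span left open by the
// cut is closed. Assumes <color> tags are never nested.
static string TruncateVisible(string input, int maxLength)
{
    var sb = new StringBuilder();
    int visible = 0;   // visible characters emitted so far
    int pos = 0;       // start of the input not yet copied
    bool open = false; // true while an emitted <color> tag is still unclosed

    // Same pattern the question uses to strip the formatting
    foreach (Match tag in Regex.Matches(input, "<color.*?>|</color>"))
    {
        // Plain text between the previous tag and this one, limited to the budget
        visible += AppendText(sb, input, pos, tag.Index - pos, maxLength - visible);
        pos = tag.Index + tag.Length;

        bool closing = tag.Value == "</color>";
        if (!closing && visible >= maxLength)
            break; // budget spent: do not start a span that would stay empty

        sb.Append(tag.Value);
        open = !closing;
    }

    // Plain text after the last tag (appends nothing if the budget is already spent)
    AppendText(sb, input, pos, input.Length - pos, maxLength - visible);

    if (open)
        sb.Append("</color>");
    return sb.ToString();
}

// Hypothetical helper: appends up to `budget` characters of
// input[start .. start + length) and returns how many were appended.
static int AppendText(StringBuilder sb, string input, int start, int length, int budget)
{
    int take = Math.Max(0, Math.Min(length, budget));
    sb.Append(input, start, take);
    return take;
}
```

Because the visible text "My name is ABCDE and I love mu" is exactly 30 characters, a limit of 30 reproduces the question's `myTruncateString`. Skipping an opening tag once the budget is spent is what keeps an empty `<color=...></color>` pair out of the result.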
【Question comments】:
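So what exactly do you want? Do you just want to limit the number of characters? Then: int max = 300; var myTruncateString = mystring[..max];

@Foitn I want to truncate my string but still keep the color tags.

Not that easy, but doable! I would first check the string length. If it is too long, search backwards from the end for any <color> tag. If one is found, truncate its contents, or remove it entirely if needed. If the string doesn't end with a color tag, check where the last tag ends and see whether the text after it can be truncated, or whether its contents have to be truncated as well.

You may have to decode the XML structure, check the total length of the values inside the decoded tags, and truncate where needed (possibly removing whole tags or their child tags)... By the way, what do you want to get if the whole word "music" exceeds the maximum length?

"If the string reaches the maximum length, I want to truncate it but still keep the color tags": does the maximum length also count the tags?

【Answer 1】: Here is a relatively simple example, without much error handling, of what I think you're trying to accomplish:
- Don't count the color tags when checking the maximum length.
- Remove characters from the end, without breaking the color tags.
- If you end up with color tags that have no text between them, remove those tags.
Note: this code has not been thoroughly tested. Feel free to use it for whatever you want, but I would write a lot of unit tests for it. I'm especially worried about corner cases that could cause an infinite loop.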
// Requires: using System; using System.Collections.Generic; using System.Linq;
public static string Shorten(string input, int requiredLength)
{
    var tokens = Tokenize(input).ToList();
    int current = tokens.Count - 1;

    // Assumption: color tags don't contribute to the *visible* length
    var totalLength = tokens.Count(t => t.Length == 1);

    // Infinite-loop detection: remember the state of the previous iteration
    int lastCurrent = -1;
    int lastTotalLength = -1;

    while (totalLength > requiredLength && current >= 0)
    {
        if (lastCurrent == current && lastTotalLength == totalLength)
            throw new InvalidOperationException("Infinite loop detected");
        lastCurrent = current;
        lastTotalLength = totalLength;

        if (tokens[current].Length > 1)
        {
            if (current == 0)
                return "";

            if (tokens[current].StartsWith("</") && tokens[current - 1].StartsWith("<c"))
            {
                // Remove a <color></color> pair with no text between
                tokens.RemoveAt(current);
                tokens.RemoveAt(current - 1);
                current -= 2;
                // Color tags don't contribute to the length, so don't adjust totalLength
                continue;
            }

            // Remove one character from inside the color tags
            tokens.RemoveAt(current - 1);
            current--;
            totalLength--;
        }
        else
        {
            // Remove the last character from the string
            tokens.RemoveAt(current);
            current--;
            totalLength--;
        }
    }

    // If we're now at the right length but the last two tokens are <color></color>, remove them
    if (tokens.Count >= 2 && tokens.Last().StartsWith("</") && tokens[tokens.Count - 2].StartsWith("<c"))
    {
        tokens.RemoveAt(tokens.Count - 1);
        tokens.RemoveAt(tokens.Count - 1);
    }

    return string.Join("", tokens);
}

// Splits the input into tags ("<...>") and single visible characters
public static IEnumerable<string> Tokenize(string input)
{
    int index = 0;
    while (index < input.Length)
    {
        if (input[index] == '<')
        {
            int endIndex = index;
            while (endIndex < input.Length && input[endIndex] != '>')
                endIndex++;
            if (endIndex < input.Length)
                endIndex++;
            yield return input.Substring(index, endIndex - index);
            index = endIndex;
        }
        else
        {
            yield return input.Substring(index, 1);
            index++;
        }
    }
}
Example code:
var myString = "My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>";
for (int length = 1; length < 100; length++)
    Console.WriteLine($"{length}: {Shorten(myString, length)}");
Output:
1: M
2: My
3: My
4: My n
5: My na
6: My nam
7: My name
8: My name
9: My name i
10: My name is
11: My name is
12: My name is <color=#ff00ee>A</color>
13: My name is <color=#ff00ee>AB</color>
14: My name is <color=#ff00ee>ABC</color>
15: My name is <color=#ff00ee>ABCD</color>
16: My name is <color=#ff00ee>ABCDE</color>
17: My name is <color=#ff00ee>ABCDE</color>
18: My name is <color=#ff00ee>ABCDE</color> a
19: My name is <color=#ff00ee>ABCDE</color> an
20: My name is <color=#ff00ee>ABCDE</color> and
21: My name is <color=#ff00ee>ABCDE</color> and
22: My name is <color=#ff00ee>ABCDE</color> and I
23: My name is <color=#ff00ee>ABCDE</color> and I
24: My name is <color=#ff00ee>ABCDE</color> and I l
25: My name is <color=#ff00ee>ABCDE</color> and I lo
26: My name is <color=#ff00ee>ABCDE</color> and I lov
27: My name is <color=#ff00ee>ABCDE</color> and I love
28: My name is <color=#ff00ee>ABCDE</color> and I love
29: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>m</color>
30: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>mu</color>
31: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>mus</color>
32: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>musi</color>
33: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
34: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
35: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
36: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
37: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
38: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
39: My name is <color=#ff00ee>ABCDE</color> and I love <color=#eeddff>music</color>
... and so on
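Not part of the original answer: the character loop in `Tokenize` can also be written as one regular expression whose matches are either a complete tag or exactly one other character, which is the token shape `Shorten` relies on (one-character tokens are visible text, longer tokens are tags). The sketch passes `RegexOptions.Singleline` so that `.` also matches newlines; without it, line breaks would silently disappear from the token stream. `TokenizeWithRegex` is a hypothetical name. One behavioral difference: for an unterminated `<`, the loop above returns everything from `<` to the end as a single token, whereas this version treats the `<` as an ordinary character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

Console.WriteLine(string.Join(" | ", TokenizeWithRegex("a<color=#ff00ee>bc</color>")));
// prints a | <color=#ff00ee> | b | c | </color>

// Hypothetical regex-based alternative to Tokenize (sketch): each match is either
// a whole tag ("<...>") or exactly one other character. The tag alternative is
// listed first, so it wins whenever a '<' starts a terminated tag.
static IEnumerable<string> TokenizeWithRegex(string input) =>
    Regex.Matches(input, "<[^>]*>|.", RegexOptions.Singleline)
         .Cast<Match>()
         .Select(m => m.Value);
```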
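【Answer 2】: I build two lists:
- one containing the indexes of the real (visible) text,
- one containing the start and end indexes of the tags.
Then I extract the text up to the maximum length using the first list. Finally, I check whether a tag is still open and, if so, close it.
Note: my code doesn't handle nested tags. You would have to change the closing-tag part.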
// Requires: using System.Collections.Generic;
public static string Truncate(string text, int maxLength)
{
    if (text.Length <= maxLength) return text;
    if (maxLength <= 0) return string.Empty; // avoids indexing realTextIndexes[-1] below

    var tagIndexes = new List<int>();      // index of every '<' and '>' (start/end of each tag)
    var realTextIndexes = new List<int>(); // index of every visible character
    bool isInTag = false;
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] == '<')
        {
            isInTag = true;
            tagIndexes.Add(i);
        }
        if (!isInTag)
        {
            realTextIndexes.Add(i);
        }
        if (text[i] == '>')
        {
            isInTag = false;
            tagIndexes.Add(i);
        }
    }

    if (realTextIndexes.Count <= maxLength) return text;

    string truncatedText = text.Substring(0, realTextIndexes[maxLength - 1] + 1);

    // Should we close a tag?
    for (int i = 0; i < tagIndexes.Count; i++)
    {
        if (tagIndexes[i] > realTextIndexes[maxLength - 1])
        {
            // Each tag adds two entries and tags alternate opening/closing,
            // so i % 4 == 2 means the next tag is a closing tag
            if ((i % 4) == 2)
                truncatedText += text.Substring(tagIndexes[i], tagIndexes[i + 1] - tagIndexes[i] + 1);
            break;
        }
    }
    return truncatedText;
}
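The note above says the closing-tag step does not handle nested tags. One way to generalize it, sketched here and not taken from the answer, is to keep a stack of tag names while copying the prefix and, at the cut, emit a closing tag for every name still on the stack, innermost first. Unlike the answer, it generates the closing tags from the stored names instead of copying the next tag from the input, which is what makes nesting work. It assumes tags look like `<name...>` / `</name>` (the name ends at `=`, a space or `>`), that none are self-closing, and that `<` never appears as literal text; `TruncateNested` and `TagName` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

Console.WriteLine(TruncateNested("<b>bold <color=red>red</color></b> tail", 7));
// prints <b>bold <color=red>re</color></b>

// Hypothetical helper (sketch): keeps at most maxLength visible characters and
// closes every tag still open at the cut, innermost first. Opening tags are only
// copied while budget remains, so no empty span is started at the cut.
static string TruncateNested(string text, int maxLength)
{
    var sb = new StringBuilder();
    var open = new Stack<string>(); // names of copied tags that are not closed yet
    int visible = 0;
    int i = 0;

    while (i < text.Length && visible < maxLength)
    {
        if (text[i] == '<')
        {
            int end = text.IndexOf('>', i);
            if (end < 0)
                break; // unterminated tag: stop copying
            string tag = text.Substring(i, end - i + 1);
            if (tag.StartsWith("</", StringComparison.Ordinal))
            {
                if (open.Count > 0)
                    open.Pop();
            }
            else
            {
                open.Push(TagName(tag));
            }
            sb.Append(tag);
            i = end + 1;
        }
        else
        {
            sb.Append(text[i]);
            visible++;
            i++;
        }
    }

    // Close whatever the cut left open, innermost first
    while (open.Count > 0)
        sb.Append("</").Append(open.Pop()).Append('>');
    return sb.ToString();
}

// Hypothetical helper: "<color=#FF00EE>" -> "color", "<b>" -> "b"
static string TagName(string tag)
{
    int stop = tag.IndexOfAny(new[] { '=', ' ', '>' }, 1);
    return tag.Substring(1, stop - 1);
}
```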