How to split a string while keeping whole words?
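I need to split a long sentence into parts without breaking any words. Each part may contain at most a given number of characters (including spaces, periods, etc.). For example: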
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
Output:
Part 1: "Silver badges are awarded for"
Part 2: "longer term goals. Silver badges are"
Part 3: "uncommon."
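To compare the answers below against this requirement, it helps to have a check for what a correct result looks like. The following is a minimal sketch; the WrapCheck.IsValidSplit helper is hypothetical (not from any answer), and it assumes words are separated by spaces and that no single word is longer than the limit. By this check the example output above does not quite meet the limit: its second part, "longer term goals. Silver badges are", is 36 characters long.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not from any answer: checks that `parts` is a valid
// word-preserving split of `sentence` with at most `max` characters per part.
public static class WrapCheck
{
    public static bool IsValidSplit(IList<string> parts, string sentence, int max)
    {
        // Every part must be non-empty, fit the limit, and have no leading/trailing space.
        foreach (var part in parts)
        {
            if (part.Length == 0 || part.Length > max) return false;
            if (part != part.Trim()) return false;
        }

        // Re-splitting the parts must give back exactly the original words:
        // nothing cut in half, dropped, or reordered.
        var original = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var rejoined = parts.SelectMany(p => p.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        return original.SequenceEqual(rejoined);
    }
}
```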
Try this:
using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        int partLength = 35;
        string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
        string[] words = sentence.Split(' ');
        var parts = new Dictionary<int, string>();
        string part = string.Empty;
        int partCounter = 0;
        foreach (var word in words)
        {
            // "part + ' ' + word" fits when part.Length + 1 + word.Length <= partLength,
            // i.e. part.Length + word.Length < partLength.
            // An empty part always takes the word, so no empty part is ever stored.
            if (part.Length == 0 || part.Length + word.Length < partLength)
            {
                part += string.IsNullOrEmpty(part) ? word : " " + word;
            }
            else
            {
                parts.Add(partCounter, part);
                part = word;
                partCounter++;
            }
        }
        parts.Add(partCounter, part);
        foreach (var item in parts)
        {
            Console.WriteLine("Part {0} (length = {2}): {1}", item.Key, item.Value, item.Value.Length);
        }
        Console.ReadLine();
    }
}
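The same greedy idea can be packaged as a reusable method that returns the parts as a List<string> instead of a Dictionary<int, string>. This is a sketch under stated assumptions: the GreedyWrap.Wrap name is hypothetical, runs of spaces are collapsed, and a word longer than max gets a line of its own rather than being cut.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class GreedyWrap
{
    // Hypothetical helper: greedy word wrap that never splits a word.
    public static List<string> Wrap(string text, int max)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Length the line would have after appending the word (plus a separator if needed).
            int needed = current.Length == 0 ? word.Length : current.Length + 1 + word.Length;
            if (current.Length > 0 && needed > max)
            {
                // The word does not fit: close the current line.
                lines.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0) current.Append(' ');
            current.Append(word);
        }
        // Whatever is left becomes the last line.
        if (current.Length > 0) lines.Add(current.ToString());
        return lines;
    }
}
```

With max = 35, the question's sentence comes out as "Silver badges are awarded for", "longer term goals. Silver badges" and "are uncommon.".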
I knew there had to be a nice LINQ-y way to do this, so here's one, just for the fun of it:
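using System;
using System.Linq;

var input = "The quick brown fox jumps over the lazy dog.";
var charCount = 0;
var maxLineLength = 11;
// new[] { ' ' } works on every .NET version; Split(' ', options) needs .NET Core 2.0 or later.
var lines = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    // Each word advances the running count by its length plus one space;
    // integer division by the line length turns that count into a line number.
    .GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
    .Select(g => string.Join(" ", g));
// That's all :)
foreach (var line in lines) {
    Console.WriteLine(line);
}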
Obviously, this code only works as long as the query is not run in parallel, because it relies on charCount being incremented "in word order".
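Besides parallelism, the shared counter has a second catch: lines is a deferred query, so enumerating it twice keeps incrementing charCount from where the first pass stopped. A minimal sketch of one way to avoid that, assuming the hypothetical LinqWrap.Wrap name: keep the counter local to a method call and materialize the query once with ToList(), so each call makes exactly one sequential pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqWrap
{
    // The GroupBy trick wrapped in a method: charCount is a fresh local on every
    // call, and ToList() runs the query exactly once, in word order.
    public static List<string> Wrap(string input, int maxLineLength)
    {
        var charCount = 0;
        return input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(w => (charCount += w.Length + 1) / maxLineLength)
            .Select(g => string.Join(" ", g))
            .ToList();
    }
}
```

Calling LinqWrap.Wrap twice on the fox sentence with a limit of 11 gives the same five lines both times: "The quick", "brown fox", "jumps over", "the lazy", "dog.".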
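I've been testing Jon's and Lessan's answers, but they don't work properly if your maximum length needs to be absolute rather than approximate. As their counter increments, it doesn't account for the empty space left at the end of a line.
Running their code on the OP's example, you get: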
Part 1: "Silver badges are awarded for " - 29 characters
Part 2: "longer term goals. Silver badges are" - 36 characters
Part 3: "uncommon. " - 13 characters
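The "are" on the second line should be on the third line. This happens because the counter doesn't include the 6 unused characters at the end of the first line.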
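The drift is easy to see by replaying the counter that the GroupBy answers use (each word adds Length + 1, and the line number is the counter divided by the limit). The sketch below uses a hypothetical CounterTrace.Trace helper that records the counter and line number for every word. For the question's sentence with a limit of 35, "for" ends at 30 (line 0) and "longer" at 37 (line 1); the second "are" is at 67, and 67 / 35 is still 1, so it stays on the second line even though that line is then 36 characters long.

```csharp
using System;
using System.Collections.Generic;

public static class CounterTrace
{
    // Hypothetical helper: replays the running counter of the GroupBy answers
    // and returns (word, counter, line) for every word.
    public static List<(string Word, int Counter, int Line)> Trace(string text, int max)
    {
        var rows = new List<(string Word, int Counter, int Line)>();
        int charCount = 0;
        foreach (var w in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            charCount += w.Length + 1;          // word plus one separator
            rows.Add((w, charCount, charCount / max));
        }
        return rows;
    }
}
```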
I came up with the following modification of Lessan's answer:
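using System;
using System.Linq;

public static class ExtensionMethods
{
    public static string[] Wrap(this string text, int max)
    {
        var charCount = 0;
        var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // If the next word (plus its space) would reach or pass the end of the current
        // line, first pad charCount up to the next multiple of max so the unused tail
        // of the line is counted, then add the word.
        return lines.GroupBy(w => (charCount += (((charCount % max) + w.Length + 1 >= max)
                ? max - (charCount % max) : 0) + w.Length + 1) / max)
            .Select(g => string.Join(" ", g.ToArray()))
            .ToArray();
    }
}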
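As a quick usage check, here is the modified Wrap copied into a hypothetical PaddedWrap class so the sketch is self-contained. Traced by hand on the question's sentence with max = 35, "longer" pads the counter from 30 up to 35 before adding its 7, so it starts line 1 cleanly, and the second "are" pads from 68 to 70, so it moves to line 2. The parts come out as "Silver badges are awarded for", "longer term goals. Silver badges" and "are uncommon.".

```csharp
using System;
using System.Linq;

public static class PaddedWrap
{
    // Same logic as the Wrap extension above (hypothetical class name).
    public static string[] Wrap(string text, int max)
    {
        var charCount = 0;
        var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // Pad to the next multiple of max when the word would not fit, then add it.
        return words.GroupBy(w => (charCount += (((charCount % max) + w.Length + 1 >= max)
                ? max - (charCount % max) : 0) + w.Length + 1) / max)
            .Select(g => string.Join(" ", g.ToArray()))
            .ToArray();
    }
}
```

One thing to be aware of: because the test is >= max and every word carries its trailing space, a line is closed slightly early; for example "A B C" with max = 5 (exactly 5 characters) comes back as two lines.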
Split the string on spaces and build new strings from the resulting array, stopping before the limit is reached for each new segment.
Untested pseudocode:
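using System;
using System.Collections.Generic;

string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
int myLimit = 35;
string[] words = sentence.Split(new char[] { ' ' });
IList<string> sentenceParts = new List<string>();
sentenceParts.Add(string.Empty);
int partCounter = 0;
foreach (var word in words)
{
    // Each part keeps a trailing space, so Length + word.Length is the part's
    // length (without that trailing space) once the word is appended.
    if (sentenceParts[partCounter].Length > 0 && sentenceParts[partCounter].Length + word.Length > myLimit)
    {
        partCounter++;
        sentenceParts.Add(string.Empty);
    }
    sentenceParts[partCounter] += word + " ";
}
// Remove the trailing space from every part.
for (int i = 0; i < sentenceParts.Count; i++)
    sentenceParts[i] = sentenceParts[i].TrimEnd();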
At first I thought this might be a job for a regex, but here's my attempt:
using System.Collections.Generic;
using System.Text;

List<string> parts = new List<string>();
int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
string[] pieces = sentence.Split(' ');
StringBuilder tempString = new StringBuilder("");
foreach (var piece in pieces)
{
    if (piece.Length + tempString.Length + 1 > partLength)
    {
        parts.Add(tempString.ToString());
        tempString.Clear();
    }
    // Note: this prepends a space to every piece, so each part starts with " ",
    // and the text still in tempString after the loop is never added to parts.
    tempString.Append(" " + piece);
}
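Since regex came up: one way the regex idea could look is to match up to max characters that are followed by whitespace or the end of the string; because .{1,max} is greedy, the engine backtracks to the last space that still fits. This is a sketch, not from any of the answers; RegexWrap.Wrap is a hypothetical name, and it assumes no single word is longer than max (such a word would have its first characters dropped, because the match simply starts later inside the word).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexWrap
{
    // Hypothetical helper: each match is 1..max characters followed by whitespace or end of input.
    public static List<string> Wrap(string text, int max)
    {
        var pattern = @"(.{1," + max + @"})(?:\s+|$)";
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(text.Trim(), pattern))
        {
            // TrimEnd guards against captured spaces when the input has runs of spaces.
            lines.Add(m.Groups[1].Value.TrimEnd());
        }
        return lines;
    }
}
```

On the question's sentence with max = 35 this yields the same three parts as the greedy loop, and on the fox sentence with max = 11 it yields "The quick", "brown fox", "jumps over", "the lazy", "dog.".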
Expanding on Jon's answer: I needed to replace g with g.ToArray(), and change max to (max + 2) to get wrapping at exactly the maximum number of characters.
using System;
using System.Linq;

public static class ExtensionMethods
{
    public static string[] Wrap(this string text, int max)
    {
        var charCount = 0;
        var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return lines.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
            .Select(g => string.Join(" ", g.ToArray()))
            .ToArray();
    }
}
Here is example usage in an NUnit test:
[Test]
public void TestWrap()
{
    Assert.AreEqual(2, "A B C".Wrap(4).Length);
    Assert.AreEqual(1, "A B C".Wrap(5).Length);
    Assert.AreEqual(2, "AA BB CC".Wrap(7).Length);
    Assert.AreEqual(1, "AA BB CC".Wrap(8).Length);
    Assert.AreEqual(2, "TEST TEST TEST TEST".Wrap(10).Length);
    Assert.AreEqual(2, " TEST TEST TEST TEST ".Wrap(10).Length);
    Assert.AreEqual("TEST TEST", " TEST TEST TEST TEST ".Wrap(10)[0]);
}
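The tests above only use short inputs. One more case worth running is the question's own sentence, using the same (max + 2) variant copied into a hypothetical WrapPlusTwo class. Traced by hand, the counter for the second "are" reaches 67, and 67 / 37 = 1, so with max = 35 the second line comes out as "longer term goals. Silver badges are", which is 36 characters. The (max + 2) divisor passes the short tests but, on this input, does not keep every line within max.

```csharp
using System;
using System.Linq;

public static class WrapPlusTwo
{
    // Same logic as the (max + 2) Wrap above (hypothetical class name).
    public static string[] Wrap(string text, int max)
    {
        var charCount = 0;
        var lines = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // Running count of word + separator, divided by (max + 2) to get the line number.
        return lines.GroupBy(w => (charCount += w.Length + 1) / (max + 2))
            .Select(g => string.Join(" ", g.ToArray()))
            .ToArray();
    }
}
```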
Joel, there's a small bug in your code, which I've corrected here:
using System.Collections.Generic;
using System.Text;

public static string[] StringSplitWrap(string sentence, int MaxLength)
{
    List<string> parts = new List<string>();
    string[] pieces = sentence.Split(' ');
    StringBuilder tempString = new StringBuilder("");
    foreach (var piece in pieces)
    {
        // Start a new part only if the current one is non-empty and " piece" would overflow it.
        if (tempString.Length > 0 && piece.Length + tempString.Length + 1 > MaxLength)
        {
            parts.Add(tempString.ToString());
            tempString.Clear();
        }
        // Put a separator only between pieces, never at the start of a part.
        tempString.Append((tempString.Length == 0 ? "" : " ") + piece);
    }
    // Add whatever is left over as the last part.
    if (tempString.Length > 0)
        parts.Add(tempString.ToString());
    return parts.ToArray();
}
This works:
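using System.Collections.Generic;
using System.Linq;

int partLength = 35;
string sentence = "Silver badges are awarded for longer term goals. Silver badges are uncommon.";
List<string> lines =
    sentence
        .Split(' ')
        .Aggregate(new [] { "" }.ToList(), (a, x) =>
        {
            // a holds the lines built so far; x is the next word.
            var last = a[a.Count - 1];
            if ((last + " " + x).Length > partLength)
            {
                // The word doesn't fit: start a new line with it.
                a.Add(x);
            }
            else
            {
                // The word fits: append it (Trim drops the leading space on the first word).
                a[a.Count - 1] = (last + " " + x).Trim();
            }
            return a;
        });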
It gives me:
Silver badges are awarded for
longer term goals. Silver badges
are uncommon.
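A variant of the same fold, seeded with an empty list instead of new [] { "" }.ToList(), avoids the Trim() call and cannot leave an empty first line when the first word is at least partLength characters long. This is a sketch; the AggregateWrap.Wrap name is hypothetical, and splitting with RemoveEmptyEntries (an assumption, not in the answer) collapses repeated spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateWrap
{
    // Hypothetical helper: the same Aggregate-based wrap, seeded with an empty list.
    public static List<string> Wrap(string sentence, int partLength)
    {
        return sentence
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Aggregate(new List<string>(), (lines, word) =>
            {
                // Start a new line when there is none yet or when " word" would overflow.
                if (lines.Count == 0 || lines[lines.Count - 1].Length + 1 + word.Length > partLength)
                    lines.Add(word);
                else
                    lines[lines.Count - 1] += " " + word;
                return lines;
            });
    }
}
```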
Although CsConsoleFormat† is primarily designed for formatting text in the console, it also supports generating plain text.
var doc = new Document().AddChildren(
    new Div("Silver badges are awarded for longer term goals. Silver badges are uncommon.") {
        TextWrap = TextWrapping.WordWrap
    }
);
var bounds = new Rect(0, 0, 35, Size.Infinity);
string text = ConsoleRenderer.RenderDocumentToText(doc, new TextRenderTarget(), bounds);
And if you really need the strings trimmed, as in your question:
List<string> lines = text.Trim()
    .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
    .Select(s => s.Trim())
    .ToList();
Besides word wrapping on spaces, you also get proper handling of hyphens, zero-width spaces, non-breaking spaces, and so on.
†I am the developer of CsConsoleFormat.