Grab random sequence of words from a string?

【Posted】: 2015-12-30 03:20:43

【Question】:

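I'm looking for the most efficient way to grab a certain number of words (in order) from a string.

Let's say I have a paragraph of text:

“Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.”

I want to be able to grab a variable number of words at a random position in the paragraph. So if 5 words were wanted, some example outputs could be:

“release of Letraset sheets containing”
“Lorem Ipsum is simply dummy”
“only five centuries, but also”

What would be the best way of doing that?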

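【Comments】:

【Answer 1】:

Split the data on spaces to get a list of words, then pick a random place to start selecting words (at least 5 words back from the end), then join those words back together.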

using System;
using System.Linq;

public static class Program
{
    private static readonly Random random = new Random();

    public static void Main(string[] args)
    {
        var data =
            "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
        Console.WriteLine(GetRandomWords(data, 5));
        Console.ReadLine();
    }

    private static string GetRandomWords(string data, int x)
    {
        // Split data into words.
        var words = data.Split(' ');
        // Find a random place to start, at least x words back from the end.
        // Next's upper bound is exclusive, so start is at most words.Length - x - 1.
        var start = random.Next(0, words.Length - x);
        // Select the words.
        var selectedWords = words.Skip(start).Take(x);
        return string.Join(" ", selectedWords);
    }
}

Sample output:

the 1960s with the release
PageMaker including versions of Lorem
since the 1500s, when an
leap into electronic typesetting, remaining
typesetting, remaining essentially unchanged. It
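
As a minimal sketch (not part of the original answer), a more defensive variant of the same idea could look like the block below. The class name `WordWindow` and the parameter names are made up for illustration. It assumes that any whitespace separates words, that a short input should simply return all of its words, and that the final run of words should also be selectable, which is why the upper bound passed to `Next` has `+ 1`.

```csharp
using System;

public static class WordWindow
{
    private static readonly Random Rng = new Random();

    // Returns `count` consecutive words starting at a random position.
    // If the text has fewer than `count` words, all of them are returned.
    public static string GetRandomWords(string text, int count)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (count <= 0) return string.Empty;

        // A null separator array splits on whitespace; empty entries
        // produced by repeated spaces are dropped.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        count = Math.Min(count, words.Length);

        // Next's upper bound is exclusive, so "+ 1" lets the last window be chosen.
        int start = Rng.Next(0, words.Length - count + 1);
        return string.Join(" ", words, start, count);
    }
}
```

Calling `WordWindow.GetRandomWords(data, 5)` is used the same way as `GetRandomWords(data, 5)` above. The difference is that it does not throw when the text has fewer than 5 words, and repeated spaces do not produce empty "words".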

【Comments】:

This is exactly the solution I had in mind. Nicely done.

【Answer 2】:

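For a sequential variation, I would do this:

    1. Put the words in an Array using split(' ')
    2. Use Random to generate a random value from 0 to the length of the Array minus 5
    3. Put them in a sentence, separated by spaces.

VB version + test result

(this may be the one you are more interested in)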

Imports System
Imports System.Text

Public Module Module1
    Public Sub Main()
        Dim str As String = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
        Console.WriteLine(GrabRandSequence(str))
        Console.WriteLine(GrabRandSequence(str))
        Console.WriteLine(GrabRandSequence(str))
        Console.ReadKey()
    End Sub

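    Public Function GrabRandSequence(inputstr As String) As String
        Try
            ' Split the input into words on single spaces.
            Dim words As String() = inputstr.Split(New Char() {" "c})
            Dim index As Integer
            ' Rnd() returns a value in [0, 1), so index falls in 0 .. words.Length - 6.
            index = CInt(Math.Floor((words.Length - 5) * Rnd()))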
            Return [String].Join(" ", words, index, 5)

        Catch e As Exception
            Return e.ToString()
        End Try
    End Function    
End Module

Result

C# version

string[] words = input.Split(' '); // See step 1.
int val = (new Random()).Next(0, words.Length - 5); // See step 2.
string result = string.Join(" ", words, val, 5); // See step 3; improved following Enigmativity's suggestion.

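Extra attempt

For a random variation, I would do this:

    1. Clean out all unnecessary characters (. etc.)
    2. Put them into a List using LINQ after split(' ')
    3. Select Distinct among them using LINQ (optional, to avoid results like Lorem Lorem Lorem Lorem Lorem)
    4. Generate 5 distinct random values from 0 to the size of the List using Random (pick again when a value is not distinct)
    5. Pick the words from the List according to the random values
    6. Put them in a sentence, separated by spaces.

Warning: the resulting sentence may not make any sense at all!!


C# version (only)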

// Requires: using System; using System.Collections.Generic; using System.Linq;
string input = "the input sentence, blabla";
input = input.Replace(",", "").Replace(".", ""); // See step 1; add as many Replace calls as needed.
List<string> words = input.Split(' ').Distinct().ToList(); // See steps 2 and 3.
Random rand = new Random();
List<int> vals = new List<int>();

do // See step 4. Note: this never terminates if there are fewer than 5 distinct words.
{
    int val = rand.Next(0, words.Count);
    if (!vals.Contains(val))
        vals.Add(val);
} while (vals.Count < 5);

string result = "";
for (int i = 0; i < 5; ++i)
    result += words[vals[i]] + (i == 4 ? "" : " "); // See steps 5 and 6.

Your result is in result.
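
One possible alternative to the retry loop in step 4 is a partial Fisher-Yates shuffle, sketched below. This is a swapped-in technique, not what the answer itself uses. The names `RandomWordPicker` and `PickDistinctWords` are hypothetical, and the sketch assumes that returning fewer words is acceptable when the text has fewer distinct words than requested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomWordPicker
{
    // Picks up to `count` distinct words in random order
    // using a partial Fisher-Yates shuffle.
    public static string PickDistinctWords(string text, int count, Random rng)
    {
        // Strip the same punctuation as the answer does, then de-duplicate.
        List<string> words = text.Replace(",", "").Replace(".", "")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .ToList();
        count = Math.Min(count, words.Count);

        for (int i = 0; i < count; i++)
        {
            // Swap a random element from the not-yet-chosen tail into position i.
            int j = rng.Next(i, words.Count);
            string tmp = words[i];
            words[i] = words[j];
            words[j] = tmp;
        }

        // The first `count` positions now hold the chosen words.
        return string.Join(" ", words.Take(count));
    }
}
```

Each word is chosen at most once without any retries, so the loop always runs exactly `count` times, even when the input is short.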

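【Comments】:

There is no need to remove the extra characters. The task is to return consecutive words from the source text, not 5 distinct random words. Considering that result = String.Join(" ", words); would suffice, concatenating the strings with + (i == 4 ? "" : " ") is odd.

【Answer 3】:
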
// Requires: using System;
string sentence = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
string[] wordCollections = sentence.Split(' ');
Random rnd = new Random();
// Keep the start at least 5 words before the end so that Join stays in range.
int randomPos = rnd.Next(0, wordCollections.Length - 5);
string grabAttempt1 = String.Join(" ", wordCollections, randomPos, 5);
// Gives you a random string of 5 words
randomPos = rnd.Next(0, wordCollections.Length - 5);
string grabAttempt2 = String.Join(" ", wordCollections, randomPos, 5);
// Gives you another random string of 5 words

【Comments】:

【Answer 4】:

        string input = "Your long sentence here";
        int noOfWords = 5;

        string[] arr = input.Split(' ');

        Random rnd = new Random();
        int start = rnd.Next(0, arr.Length - noOfWords);

        string output = "";
        for(int i = start; i < start + noOfWords; i++)
            output += arr[i] + " ";

        Console.WriteLine(output);

【Comments】:

Don't use - 1 in rnd.Next(...), because the upper bound is exclusive.

【Answer 5】:

This may work for you

    // Requires: using System; using System.Linq;
    private void pickRandom()
    {
        string somestr = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
        string[] newinp = somestr.Split(' ');
        Random rnd = new Random();
        int strtindex = rnd.Next(0, newinp.Length - 5);
        string fivewordString = String.Join(" ", newinp.Skip(strtindex).Take(5).ToArray());
    }

【Comments】:

It's not - 6, it should be - 5, because the max parameter of rnd.Next is exclusive.

Oh yes @Enigmativity, thank you
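
To make the exclusive upper bound discussed in these comments concrete, here is a small sketch that simply enumerates every possible run of consecutive words. `WindowDemo` and `AllWindows` are hypothetical names chosen for illustration. For an array of n words and runs of k words, the valid start indexes are 0 through n - k, which is n - k + 1 positions.

```csharp
using System;
using System.Collections.Generic;

public static class WindowDemo
{
    // Lists every run of `count` consecutive words, in order of start index.
    public static List<string> AllWindows(string[] words, int count)
    {
        var windows = new List<string>();
        // The last valid start is words.Length - count.
        for (int start = 0; start + count <= words.Length; start++)
        {
            windows.Add(string.Join(" ", words, start, count));
        }
        return windows;
    }
}
```

With the seven words "one two three four five six seven" and a run length of 5, `AllWindows` returns three runs, starting at indexes 0, 1 and 2. `rnd.Next(0, words.Length - 5)` is `rnd.Next(0, 2)` here and can only return 0 or 1, so the final run is never chosen. `rnd.Next(0, words.Length - 5 + 1)` covers all three start positions.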
