Regex for period, word, and then colon

Posted: 2021-05-07 09:36:28

Question:

Is there a regex, Python, or JavaScript way to search for a period, then a word and its definition, and add each pair to a dictionary or other object?

For example:

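. Reversion: A reversion is turning back again to a previous state or condition. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest.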

This would become:

{"Reversion": "A reversion is turning back again to a previous state or condition", "Rhetoric": "Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest"}

Comments:

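/\. (.*): (.*)\./ This should work.

I wouldn't use a regex for this; it's a fairly complex expression. Remember: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

Answer 1: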

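I inserted a few extra sentences into the original text to show that this approach finds the pattern anywhere in the text.

I use reduce to turn the result of matchAll into an object.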

const x = `asoidn aiosjdn yaosui noapids poasm daioansoid apaoms d. Reversion: A reversion is turning back again to a previous state or condition. aasd aeoinad oianfds inauireb docn isenm opisd a. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest.`;

// capture the word before the colon and the text up to and including the next period
const matches = [...x.matchAll(/(\w+)\s*:\s*([^.:]+\.)/g)];

// fold each [fullMatch, name, text] array into a { name: text } object
const obj = matches.reduce((dict, [_, name, text]) => (dict[name] = text, dict), {});

console.log(obj);
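Since the question also asks about Python, here is a rough Python port of this answer, offered as a sketch rather than the answerer's code: `re.finditer` takes the place of `matchAll`, and a dict comprehension replaces `reduce`. The pattern is the same except that its final `.` is written as `\.`, so it can only match a literal period.

```python
import re

x = ("asoidn aiosjdn yaosui noapids poasm daioansoid apaoms d. "
     "Reversion: A reversion is turning back again to a previous state or condition. "
     "aasd aeoinad oianfds inauireb docn isenm opisd a. "
     "Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, "
     "especially language that sounds impressive but may not be sincere or honest.")

# word before the colon, then everything up to and including the next period
pattern = re.compile(r"(\w+)\s*:\s*([^.:]+\.)")

# finditer plays the role of matchAll; the comprehension replaces reduce
obj = {m.group(1): m.group(2) for m in pattern.finditer(x)}
print(obj)
```

As in the JavaScript version, each definition keeps its trailing period, and the filler sentences are skipped because no colon follows their words.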

Answer 2:

Take a look at this code:

my_final_result = {}
input_str = ". Reversion: A reversion is turning back again to a previous state or condition. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest."
# assumes periods never occur inside a definition and only separate one definition from the next
input_list = input_str.split(".")
for definition in input_list:
    definition_list = definition.split(":")
    if len(definition_list) == 2:  # skip chunks that are not exactly "term: definition"
        # store the key-value pair; strip() removes surrounding whitespace
        my_final_result[definition_list[0].strip()] = definition_list[1].strip()
print(my_final_result)
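The comment in this code names its main limitation: an extra period or colon inside a definition breaks it. An abbreviation such as "e.g." cuts a definition short, and a second colon makes `len(definition_list) == 2` fail, so that entry is silently dropped. A possible refinement, sketched below under the assumption that every term is a single word directly followed by a colon, splits only at a period that is followed by `Word:` and uses `str.partition` so that only the first colon separates term from definition. `parse_definitions` is a hypothetical helper name.

```python
import re

def parse_definitions(text):
    """Hypothetical helper: map each 'Term:' to the text that follows it."""
    result = {}
    # Split only at a period followed by "Word:", so other periods
    # (e.g. abbreviations) stay inside the definition
    for chunk in re.split(r"\.\s*(?=\w+\s*:)", text):
        # partition() splits at the first colon only; later colons are kept
        term, sep, definition = chunk.partition(":")
        if sep:  # skip chunks with no colon at all (e.g. the empty leading one)
            result[term.strip()] = definition.strip().rstrip(".")
    return result

input_str = (". Reversion: A reversion is turning back again to a previous state or condition. "
             "Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, "
             "especially language that sounds impressive but may not be sincere or honest.")
print(parse_definitions(input_str))

# Inputs that the plain split(".") / split(":") version would mangle
print(parse_definitions(". Bond: A bond is issued by e.g. a government. "
                        "Ratio: A ratio such as 3:4 compares two numbers."))
```

The trade-off is that a period followed by something like `word:` inside a definition would still be treated as a boundary.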

Answer 3:

A regex that extracts the word and its definition is:

\.\s*([^:]*)\s*:\s*([^.]*)

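Demo: https://regex101.com/r/utgrCb/1/

\.\s* matches the leading period and any whitespace after it.
([^:]*) captures the "word": everything before the colon.
\s*:\s* matches the colon, with optional whitespace on either side.
([^.]*) captures the "definition": everything up to the next period.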
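To try this pattern from Python, `re.findall` returns one `(word, definition)` tuple per match. The sketch below also runs the greedy `/\. (.*): (.*)\./` suggested in the comments on the same text, to show why the negated classes `[^:]` and `[^.]` matter, and shows that the pattern needs a period before every term. The `.strip()` calls are an addition: because `[^:]*` is greedy, it would keep any space before the colon, leaving the `\s*` after it nothing to match.

```python
import re

PATTERN = r"\.\s*([^:]*)\s*:\s*([^.]*)"  # this answer's pattern

text = (". Reversion: A reversion is turning back again to a previous state or condition. "
        "Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, "
        "especially language that sounds impressive but may not be sincere or honest.")

# [^:]* and [^.]* cannot cross a colon or a period,
# so each match stays inside one "Word: definition" entry
pairs = re.findall(PATTERN, text)
definitions = {word.strip(): definition.strip() for word, definition in pairs}
print(definitions)

# The greedy pattern from the comments: .* runs on to the LAST ": " and
# the LAST ".", so both entries collapse into a single match
greedy = re.findall(r"\. (.*): (.*)\.", text)
print(len(greedy))  # 1

# Without the leading ". " the first term has no period before it and is skipped
without_leading_period = re.findall(PATTERN, text[2:])
print(without_leading_period)
```

If the input may start directly with a term, one option is to prepend ". " before matching.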

Answer 4:

I wanted to use reduce, but ended up with this:

const str = `Reversion: A reversion is turning back again to a previous state or condition. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest.`

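const dict = {};
// the word before ": " is in a capturing group, so split() keeps it in the result;
// slice(1) drops the empty string before the first term
const arr = str.split(/[$\.]?(\w+): /g).slice(1);
// arr alternates term, definition, term, definition, ...
for (let i = 0; i < arr.length - 1; i += 2) {
  dict[arr[i]] = arr[i + 1].trim();
}

console.log("dict", dict);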
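The same split-and-pair idea carries over to Python, because `re.split` also returns the text of capturing groups. In this sketch the `[$\.]?` prefix is written as `\.?`, since a `$` inside a character class only matches a literal dollar sign, and `zip` over the even and odd slices replaces the index loop (it also drops a trailing term that has no definition, like the `i < arr.length - 1` guard).

```python
import re

s = ("Reversion: A reversion is turning back again to a previous state or condition. "
     "Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, "
     "especially language that sounds impressive but may not be sincere or honest.")

# Gives ['', 'Reversion', '<definition 1> ', 'Rhetoric', '<definition 2>'];
# [1:] drops the empty string before the first term
parts = re.split(r"\.?(\w+): ", s)[1:]

# Pair each term (even index) with the text after it (odd index)
d = {term: text.strip() for term, text in zip(parts[0::2], parts[1::2])}
print(d)
```

As in the JavaScript version, each definition keeps its closing period.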
