Regex for period, word, and then colon
【Title】: Regex for period, word, and then colon
【Posted】: 2021-05-07 09:36:28
【Question】: Is there a regex, Python, or JavaScript way to search for a period, a word, and a definition, and then append them to a dictionary or other object?
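For example:
. Reversion: A reversion is turning back again to a previous state or condition. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest.
This would become "Reversion" : "A reversion is turning back again to a previous state or condition", "Rhetoric" : "Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest"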
【Comments on the question】:
/\. (.*): (.*)\./
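This should work.
I wouldn't use a regex for this; it is a fairly complex expression. Remember: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."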
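A note on the pattern suggested in the first comment: its two `.*` groups are greedy, so on the sample input it seems to fold both entries into one match. The Python sketch below contrasts it with a lazy variant that uses a lookahead, so the closing period is left unconsumed and can open the next entry. The names `SAMPLE`, `GREEDY`, `LAZY`, and `to_dict`, and the lookahead variant itself, are illustrative assumptions rather than part of the original comment.

```python
import re

# Sample input from the question.
SAMPLE = (
    ". Reversion: A reversion is turning back again to a previous state or condition. "
    "Rhetoric: Rhetoric is the skill or art of using language to persuade or influence "
    "people, especially language that sounds impressive but may not be sincere or honest."
)

# Pattern from the comment: both groups are greedy.
GREEDY = re.compile(r"\. (.*): (.*)\.")

# Hypothetical variant: lazy groups plus a lookahead, so the period that ends
# one definition is not consumed and can start the next match.
LAZY = re.compile(r"\. (.*?): (.*?)(?=\.)")


def to_dict(pattern, text):
    """Collect (word, definition) pairs found by a two-group pattern."""
    return {word: definition for word, definition in pattern.findall(text)}


greedy_result = to_dict(GREEDY, SAMPLE)
lazy_result = to_dict(LAZY, SAMPLE)

print(len(greedy_result), "entry with the greedy pattern")
print(len(lazy_result), "entries with the lazy pattern:", list(lazy_result))
```

Both variants still assume that a definition never contains a period of its own.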
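【Answer 1】:
I inserted some extra text into the original sentence to show that this solution picks up the pattern anywhere in the text.
I used reduce to convert the return value of matchAll into an object.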
const x = `asoidn aiosjdn yaosui noapids poasm daioansoid apaoms d. Reversion: A reversion is turning back again to a previous state or condition. aasd aeoinad oianfds inauireb docn isenm opisd a. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest.`;
// Each match is [fullMatch, name, text]; the final "." in the pattern picks up the closing period.
const matches = [...x.matchAll(/(\w+)\s*:\s*([^.:]+.)/g)];
// Fold the matches into a plain object, starting from an empty one.
const obj = matches.reduce((dict, [_, name, text]) => (dict[name] = text, dict), {});
console.log(obj);
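For readers working in Python, a rough counterpart of the matchAll-plus-reduce approach is `re.finditer` feeding a dict comprehension. This is only a sketch: the function name `extract_definitions` and the shortened noisy input are made up for illustration, and the pattern drops the trailing period instead of capturing it.

```python
import re

# A word, a colon, then everything up to the next period or colon.
ENTRY = re.compile(r"(\w+)\s*:\s*([^.:]+)")


def extract_definitions(text):
    """Map each 'word: definition' pair found anywhere in text to a dict entry."""
    return {m.group(1): m.group(2).strip() for m in ENTRY.finditer(text)}


# Hypothetical input with unrelated words around the two entries.
noisy = (
    "noise words d. Reversion: A reversion is turning back. "
    "more noise a. Rhetoric: The art of persuasion."
)
definitions = extract_definitions(noisy)
print(definitions)
```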
【Answer 2】: Take a look at this code:
my_final_result = {}
input_str = ". Reversion: A reversion is turning back again to a previous state or condition. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest."
# assuming that periods don't occur in definitions and only separate one definition from another
input_list = input_str.split(".")
for definition in input_list:
    definition_list = definition.split(":")
    if len(definition_list) == 2:  # check that the entry has exactly one word and one definition
        # save our key-value pair to the dictionary. strip() removes any spaces around the words
        my_final_result[definition_list[0].strip()] = definition_list[1].strip()
print(my_final_result)
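One consequence of the `len(definition_list) == 2` check is that an entry whose definition contains a colon of its own is silently skipped. A possible variation, assuming that only the first colon separates the word from its definition, uses `str.partition`. The function name `parse_definitions` and the sample text are hypothetical.

```python
def parse_definitions(text):
    """Split text on periods, then split each chunk at its first colon only."""
    result = {}
    for chunk in text.split("."):
        word, sep, definition = chunk.partition(":")
        # sep is empty when the chunk has no colon; skip those and blank halves
        if sep and word.strip() and definition.strip():
            result[word.strip()] = definition.strip()
    return result


# Hypothetical input: the first definition contains a colon of its own.
parsed = parse_definitions(". Ratio: written as a:b. Rhetoric: the art of persuasion.")
print(parsed)
```

The assumption that periods never occur inside a definition still applies here.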
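【Answer 3】: A regex that picks out the word and the definition is:
\.\s*([^:]*)\s*:\s*([^.]*)
Demo: https://regex101.com/r/utgrCb/1/
\.\s*
The starting period and optional whitespace
([^:]*)
The "word" is everything before the colon
\s*:\s*
The colon, surrounded by optional whitespace
([^.]*)
The "definition" is everything up to the next period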
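As a sketch of how this pattern could be applied, the Python snippet below runs it over the question's sample with `re.findall` and builds a dictionary. The names `PATTERN`, `build_dictionary`, and `SAMPLE` are illustrative assumptions. Because `[^:]*` also matches periods and spaces, the pattern seems best suited to input where every sentence is an entry; with unrelated sentences in between, the "word" group would absorb them.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"\.\s*([^:]*)\s*:\s*([^.]*)")

# Sample input from the question.
SAMPLE = (
    ". Reversion: A reversion is turning back again to a previous state or condition. "
    "Rhetoric: Rhetoric is the skill or art of using language to persuade or influence "
    "people, especially language that sounds impressive but may not be sincere or honest."
)


def build_dictionary(text):
    """Return {word: definition} for every pair the pattern finds."""
    return {
        word.strip(): definition.strip()
        for word, definition in PATTERN.findall(text)
    }


dictionary = build_dictionary(SAMPLE)
for word, definition in dictionary.items():
    print(f"{word!r}: {definition!r}")
```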
【Answer 4】: I wanted to use reduce, but ended up with this:
const str = `Reversion: A reversion is turning back again to a previous state or condition. Rhetoric: Rhetoric is the skill or art of using language to persuade or influence people, especially language that sounds impressive but may not be sincere or honest.`
const dict = {}
// split() keeps the captured word, so after slice(1) the array alternates word, definition, word, definition
const arr = str.split(/[$\.]?(\w+): /g).slice(1)
for (let i = 0; i < arr.length - 1; i += 2) {
  dict[arr[i]] = arr[i + 1].trim()
}
console.log("dict", dict)
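The same split-on-a-capturing-group idea carries over to Python, since `re.split` also keeps captured text in its result. The sketch below pairs the alternating pieces with `zip`. The function name `split_definitions` and the choice to drop the trailing period are assumptions made for illustration.

```python
import re

# Sample input from the answer above.
SAMPLE = (
    "Reversion: A reversion is turning back again to a previous state or condition. "
    "Rhetoric: Rhetoric is the skill or art of using language to persuade or influence "
    "people, especially language that sounds impressive but may not be sincere or honest."
)


def split_definitions(text):
    """Split on 'word: ' while keeping each word, then pair words with definitions."""
    # parts looks like ['', word1, definition1, word2, definition2, ...]
    parts = re.split(r"(\w+):\s*", text)
    words = parts[1::2]
    definitions = (piece.strip().rstrip(".") for piece in parts[2::2])
    return dict(zip(words, definitions))


result = split_definitions(SAMPLE)
print(result)
```

As with the JavaScript version, any "word: " sequence inside a definition would be treated as the start of a new entry.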