How to split this text using JavaScript regular expression?
【Posted】: 2021-06-02 12:08:10
【Question】: I want to split this text. I am trying to do it with a JavaScript regular expression.
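(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.
I want to parse it into groups of fragments. I am looking for one of these results.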
[
[1, "Really not."],
[2, "Uh huh."],
[3, "Behold Prince"],
]
[
{ id: 1, text: "Really not." },
{ id: 2, text: "Uh huh." },
{ id: 3, text: "Behold Prince" },
]
I am using this pattern.
/\(([0-9])\){1,3}(.+?)\(/g
Can you help me? What pattern should I use to split the text correctly?
Thank you in advance!
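【Comments on the question】:
You can use array.map
Why should the result contain only the first three "elements"? What about the rest of the input string?
... as a side note: all purely regex and matchAll based approaches that rely on an opening parenthesis to detect the end of a text fragment fail as soon as an opening parenthesis occurs as part of the text content (where it is an allowed character). A simpler split / reduce based approach covers this edge case more reliably.
If one of the answers solved your problem, you should accept it.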
【Answer 1】:
You can do what you want in JavaScript with a regex and the string.matchAll function.
const str = `(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.`;
let array = [...str.matchAll(/\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g)].map(a=>[+a[1],a[2]])
console.log(array)
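I updated my answer to use The fourth bird's regex, because it is much cleaner than the one I wrote.
【Comments】:
Awesome! Thanks a lot!
【Answer 2】:
Instead of matching ( you can assert it, or the end of the string.
The part \){1,3} means: repeat the closing parenthesis 1-3 times.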
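To see why the original pattern skips entries, here is a minimal sketch comparing it with a lookahead-based pattern. The shortened sample string and the helper name idsOf are illustrative assumptions, not part of the question. Because the original pattern ends in \(, each match consumes the opening parenthesis of the next item, so that item can no longer match.

```javascript
// Shortened sample input (an assumption for illustration).
const sample = "(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key.";

// Pattern from the question: the trailing \( is consumed by every match.
const consuming = /\(([0-9])\){1,3}(.+?)\(/g;

// Lookahead variant: ( is only asserted, never consumed.
const asserting = /\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g;

// Hypothetical helper: collect the captured id of every match.
const idsOf = (re) => Array.from(sample.matchAll(re), (m) => m[1]);

console.log(idsOf(consuming)); // [ '1', '3' ]
console.log(idsOf(asserting)); // [ '1', '2', '3', '4' ]
```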
If you want to match the digits instead (one or more here; use [0-9]{1,3} to limit it to 1-3 digits):
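\(([0-9]+)\)\s*(.*?)\s*(?=$|\()
\(  matches (
([0-9]+)  captures 1+ digits in group 1 (denoted by m[1] in the code)
\)  matches )
\s*  matches optional whitespace characters
(.*?)  captures as few characters as possible in group 2 (denoted by m[2] in the code)
\s*  matches optional whitespace characters
(?=$|\()  asserts the end of the string or ( to the right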
const regex = /\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g;
const str = `(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.`;
console.log(Array.from(str.matchAll(regex), m => [m[1], m[2]]));
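The snippet above returns the ids as strings and accepts any number of digits. A minimal sketch, assuming the ids should be limited to 1-3 digits and returned as numbers in the object form the question asks for; the names bounded, input and rows, and the shortened input, are illustrative.

```javascript
// Same pattern, but with the digit count bounded to 1-3.
const bounded = /\(([0-9]{1,3})\)\s*(.*?)\s*(?=$|\()/g;

// Shortened sample input (an assumption for illustration).
const input = "(1) Really not. (2) Uh huh. (3) Behold Prince";

// Convert group 1 to a number and build { id, text } objects.
const rows = Array.from(input.matchAll(bounded), (m) => ({
  id: Number(m[1]),
  text: m[2],
}));

console.log(rows);
// [
//   { id: 1, text: 'Really not.' },
//   { id: 2, text: 'Uh huh.' },
//   { id: 3, text: 'Behold Prince' }
// ]
```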
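【Comments】:
Thank you so much! This was a valuable lesson for me. I really appreciate your help.
@TodorIliev You're welcome :-) Good luck!
【Answer 3】:
... an approach based on matchAll as well as a RegExp that uses named capture groups and a positive lookahead ... /\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g ...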
// see ... [https://regex101.com/r/r39BoJ/1]
const regX = (/\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g);
const text = "(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance."
console.log([
...text.matchAll(regX)
].map(
({ groups: { id, text } }) => ({ id: Number(id), text })
)
);
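Note
The approaches above do not cover the case of an opening parenthesis ( occurring within a text fragment, where it is an allowed character. Thus, to always be on the safe side, the OP should consider a split / reduce based approach ...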
const text = " (1) Really not. (2) Uh (huh). (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, (for instance). "
console.log(
text
.split(/\s*\((\d+)\)\s*/)
.slice(1)
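.reduce((list, item, idx) => {
  if (idx % 2 === 0) {
    // even index: a captured id starts a new entry
    list.push({ id: Number(item) });
  } else {
    // odd index: the text fragment that belongs to the last id
    // list.at(-1).text = item;
    list[list.length - 1].text = item.trim();
  }
  return list;
}, [])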
);
// test / check ...
console.log(
'text.split(/\s*\((\d+)\)\s*/) ...',
text.split(/\s*\((\d+)\)\s*/)
);
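The edge case can also be handled while staying with matchAll. The following is a sketch of an alternative, not part of the answer above: the lookahead asserts a full numbered marker, (digits), instead of any opening parenthesis. The names tricky, anyParen, fullMarker and pairsOf are illustrative assumptions. A fragment that itself contains something like (12) would still be cut short, so split / reduce remains the more predictable choice when that can occur.

```javascript
// Shortened input with parentheses inside a fragment (an assumption for illustration).
const tricky = "(1) Really not. (2) Uh (huh). (3) Behold Prince";

// Lookahead stops at any opening parenthesis.
const anyParen = /\((\d+)\)\s*(.*?)\s*(?=$|\()/g;

// Lookahead stops only at a numbered marker such as (3), or at the end.
const fullMarker = /\((\d+)\)\s*(.*?)\s*(?=\(\d+\)|$)/g;

// Hypothetical helper: collect [id, text] pairs for a given pattern.
const pairsOf = (re) =>
  Array.from(tricky.matchAll(re), (m) => [Number(m[1]), m[2]]);

console.log(pairsOf(anyParen));
// [ [ 1, 'Really not.' ], [ 2, 'Uh' ], [ 3, 'Behold Prince' ] ]

console.log(pairsOf(fullMarker));
// [ [ 1, 'Really not.' ], [ 2, 'Uh (huh).' ], [ 3, 'Behold Prince' ] ]
```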
【Comments】:
Awesome! Thanks a lot!