Regex Group Capture [duplicate]

Posted 2023-02-22


【Title】: Regex Group Capture [duplicate] 【Posted】: 2019-12-19 00:21:31 【Question】:

I have a standard email from which I want to extract certain details.

The email contains lines like this:

<strong>Name:</strong> John Smith

To simulate this, I have the following JavaScript:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;
var match = re.exec(str);
while (match != null) {
    console.log(match[0]);
    match = re.exec(str);
}

This produces only one result:

<strong>Name:</strong> John Smith

I want to get the capture group ([^\<]*), which in this case is John Smith.

What am I missing here?

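【Comments】:

Obligatory link.

I had already found the "duplicate" answer; that is where I took my test script from.

You need to read further into that answer, where he says (buried in a comment!): "capture group n: match[n]". If I hadn't already answered this before realizing there had to be a dupe target, I'd add a comment there for clarity; IMHO it is too well hidden. Happy coding!

【Answer 1】: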

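In a regex match result, the first element (index 0) is always the entire matched string. Capture groups start at index 1, so to solve your problem just replace match[0] with match[1].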
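As a minimal sketch of that array layout, the snippet below runs the question's pattern once and reads both indexes. It drops the /g flag and the unnecessary backslashes before `<` on the assumption that only a single match is being inspected; neither change alters what the pattern matches.

```javascript
// Same sample input as in the question.
const str = "<br><strong>Name:</strong> John Smith<br>";
// Same pattern as in the question, without /g and without escaping "<".
const re = /<strong>Name\s*:<\/strong>\s*([^<]*)/;

const match = re.exec(str);
if (match !== null) {
    console.log(match[0]);     // whole match: "<strong>Name:</strong> John Smith"
    console.log(match[1]);     // capture group 1: "John Smith"
    console.log(match.length); // 2: the whole match plus one capture group
}
```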

That said, since you are using JavaScript, it would be better to work with the DOM itself and extract the text from it rather than processing HTML with a regex.
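One way the DOM approach might look is sketched below. The helper name `extractLabeledValue` is hypothetical, and the sketch assumes the value is the text node that directly follows the `<strong>` label, as in the sample line from the question. `DOMParser` is a browser API, so the usage part is guarded and is skipped in plain Node.js.

```javascript
// Hypothetical helper: find the <strong> element whose text equals `label`
// and return the trimmed text of the node that directly follows it.
function extractLabeledValue(root, label) {
    for (const strong of root.querySelectorAll("strong")) {
        if (strong.textContent.trim() === label) {
            const next = strong.nextSibling;
            return next ? next.textContent.trim() : null;
        }
    }
    return null; // no matching label found
}

// Browser-only usage: parse the sample HTML and look up the "Name:" label.
if (typeof DOMParser !== "undefined") {
    const html = "<br><strong>Name:</strong> John Smith<br>";
    const doc = new DOMParser().parseFromString(html, "text/html");
    console.log(extractLabeledValue(doc.body, "Name:"));
}
```

Compared with the regex, this keeps working if attributes or extra whitespace appear inside the `<strong>` tag, at the cost of needing an HTML parser.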

【Comments】:

【Answer 2】:

Capture groups are available in the match array starting at index 1:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*([^\<]*)/g;
var match = re.exec(str);
while (match != null) {
    console.log(match[1]); // <====
    match = re.exec(str);
}

Index 0 contains the whole match.

On modern JavaScript engines, you can also use named capture groups ((?<theName>...)), which you access via match.groups.theName:

var str = "<br><strong>Name:</strong> John Smith<br>";
var re = /\<strong>Name\s*:\<\/strong>\s*(?<name>[^\<]*)/g;
// ---------------------------------------^^^^^^^
var match = re.exec(str);
while (match != null) {
    console.log(match.groups.name); // <====
    match = re.exec(str);
}
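On engines that support `String.prototype.matchAll` (ES2020), the exec loop can also be written as a `for...of` loop. The following is only a sketch of that alternative, not part of the original answer; the two-line input string is a made-up example chosen to show more than one match.

```javascript
// Hypothetical input with two "Name" lines, to show more than one match.
const str = "<strong>Name:</strong> John Smith<br><strong>Name:</strong> Jane Doe<br>";
const re = /<strong>Name\s*:<\/strong>\s*(?<name>[^<]*)/g;

// matchAll requires the /g flag and yields one match object per occurrence.
const names = [];
for (const match of str.matchAll(re)) {
    names.push(match.groups.name);
}
console.log(names); // names now holds "John Smith" and "Jane Doe"
```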

【Comments】:
