C++ 11 regex expression does not return groups as expected
Posted: 2017-11-22 12:23:07

Question:
I want to write a regular expression that parses the syntax of a special format string. It should help me detect format errors and split the format string into its individual parts for further processing.
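However, try as I might, I cannot get the splitting to work as expected.
From what I read in the documentation, the '(?: )' syntax should define a non-capturing group, while a plain parenthesized expression '( )' should define a sub-match that is returned separately. But that is not what happens.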
Here is my code:
#include <cstdlib>
#include <iostream>
#include <regex>
#include <string>

std::string parseCode( std::regex_constants::error_type etype);

int main()
{
   const std::string regex_str( "(?:([^\\[]+)(\\[[^\\]]*\\])( +|\\n))");
   std::regex atr;

   std::cout << "regex string = '" << regex_str << "'" << std::endl;

   try
   {
      atr.assign( regex_str);
   }
   catch (const std::regex_error& e)
   {
      std::cerr << "Error: " << e.what() << "; code: " << parseCode(e.code()) << std::endl;
      exit( EXIT_FAILURE);
   } // end try

   const std::string title( "First Title[] Second Title[] -Third Title[]");

   // print every complete match of atr in title
   auto regex_begin = std::sregex_iterator( title.begin(), title.end(), atr);
   for (std::sregex_iterator i = regex_begin; i != std::sregex_iterator(); ++i)
   {
      std::smatch match = *i;
      std::cout << "got: '" << match.str() << "'" << std::endl;
   } // end for

   // print the tokens produced by sregex_token_iterator with submatch -1
   auto subregex_begin = std::sregex_token_iterator( title.begin(),
                                                   title.end(), atr, -1);
   for (std::sregex_token_iterator i = subregex_begin; i != std::sregex_token_iterator(); ++i)
   {
      std::cout << "got sub: '" << *i << "'" << std::endl;
   } // end for

} // end scope

// Translates a regex error code into a readable message.
std::string parseCode( std::regex_constants::error_type etype)
{
   switch (etype)
   {
   case std::regex_constants::error_collate:
      return "error_collate: invalid collating element request";
   case std::regex_constants::error_ctype:
      return "error_ctype: invalid character class";
   case std::regex_constants::error_escape:
      return "error_escape: invalid escape character or trailing escape";
   case std::regex_constants::error_backref:
      return "error_backref: invalid back reference";
   case std::regex_constants::error_brack:
      return "error_brack: mismatched bracket([ or ])";
   case std::regex_constants::error_paren:
      return "error_paren: mismatched parentheses(( or ))";
   case std::regex_constants::error_brace:
      return "error_brace: mismatched brace({ or })";
   case std::regex_constants::error_badbrace:
      return "error_badbrace: invalid range inside a { }";
   case std::regex_constants::error_range:
      return "error_range: invalid character range(e.g., [z-a])";
   case std::regex_constants::error_space:
      return "error_space: insufficient memory to handle this regular expression";
   case std::regex_constants::error_badrepeat:
      return "error_badrepeat: a repetition character (*, ?, +, or {) was not preceded by a valid regular expression";
   case std::regex_constants::error_complexity:
      return "error_complexity: the requested match is too complex";
   case std::regex_constants::error_stack:
      return "error_stack: insufficient memory to evaluate a match";
   default:
      return "";
   } // end switch
}
Here is the output:
regex string = '(?:([^\[]+)(\[[^\]]*\])( +))'
got: 'First Title[] '
got: 'Second Title[] '
got sub: ''
got sub: ''
got sub: '-Third Title[]'
This is what I want/expect:
regex string = '(?:([^\[]+)(\[[^\]]*\])( +))'
got: 'First Title[] '
got: 'Second Title[] '
got: '-Third Title[]'
got sub: 'First Title'
got sub: '[]'
got sub: ' '
got sub: 'Second Title'
got sub: '[]'
got sub: ' '
got sub: '-Third Title'
got sub: '[]'
I am using g++ 5.3.1 on RHEL 7.2. I got the same result with g++ 6.3 on IdeOne.com: https://ideone.com/dj4Mqf
What am I doing wrong?
Answer 1:
1) Your regex does not match the last part; change it to:
const std::string regex_str("([^\\[]+)(\\[[^\\]]*\\])(\\s+|\\n|$)");
2) match.str() returns the whole matched string. To extract the individual groups, use operator[]:
std::smatch match = *i;
std::cout << "got: 1='" << match[1] << "' 2='" << match[2] << "' 3='" << match[3] << "'" << std::endl;
Output:
regex string = '([^\[]+)(\[[^\]]*\])(\s+|\n|$)'
got: 1='First Title' 2='[]' 3=' '
got: 1='Second Title' 2='[]' 3=' '
got: 1='-Third Title' 2='[]' 3=''