Regex to validate the syntax of fields in any order, with acceptable values

Posted 2021-05-06

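Consider the following scenario: we want to use a regex to validate the syntax of a command with three fields, one mandatory and two optional. The three fields can appear in any order, separated by any number of spaces, and each has a limited dictionary of acceptable values: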

Mandatory Field:  "-foo"
Optional Field 1:  Can be either of "-handle" "-bar" or "-mustache"
Optional Field 2:  Can be either of "-meow" "-mix" or "-want"

Examples of valid input:

-foo
-foo           -bar
-foo-want
-foo -meow-bar
-foo-mix-mustache
-handle      -foo-meow
-mustache-foo
-mustache -mix -foo
-want-foo
-want-meow-foo
-want-foo-meow

Examples of invalid input:

woof
-handle-meow
-ha-foondle
meow
-foobar
stackoverflow
- handle -foo -mix
-handle -mix
-foo -handle -bar
-foo -handle -mix -sodium

I guess you could say there are three capture groups, the first mandatory and the last two optional:

(-foo){1}
(-handle|-bar|-mustache)?
(-meow|-mix|-want)?

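But I'm not sure how to write it so that the groups can appear in any order, possibly separated by any number of spaces, and with nothing else in the input.

What I have so far is three lookahead capture groups (the % signs mark the part still to be written):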

^(?=.*?(foo))(?=.*?(-handle|-bar|-mustache))(?=.*?(-meow|-mix|-want))%Verify that group 1 is present once, optional groups 2 and 3 zero or one times, in any order,  with any spaces%$

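Adding a new capture group is simple enough, and so is extending the acceptable inputs of an existing group, but I'm definitely stumped on the backreferencing, and I'm not sure how extending the checks to support a fourth group would affect the backreferences.

Or would it make more sense to use something like boost::split or boost::tokenize on the "-" character, then iterate over the tokens, count those that fit groups 1, 2, 3 and "none of the above," and verify the counts?

It seems like it should be a simple extension or application of a Boost library.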

Answer

You mentioned Boost. Have you looked at program_options? http://www.boost.org/doc/libs/1_55_0/doc/html/program_options/tutorial.html

Another answer

Actually, a context-free grammar would work well here. Let's parse your command into a structure like this:

struct Command {
    std::string one, two, three;
};

Now, once we adapt it as a Fusion sequence, we can write a Spirit Qi grammar for it and enjoy automatic attribute propagation:

CommandParser() : CommandParser::base_type(start) {
    using namespace qi;

    command = field(Ref(&f1)) ^ field(Ref(&f2)) ^ field(Ref(&f3));
    field   = '-' >> raw[lazy(*_r1)];

    f1 += "foo";
    f2 += "handle", "bar", "mustache";
    f3 += "meow", "mix", "want";

    start   = skip(blank) [ command >> eoi ] >> eps(is_valid(_val));
}

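Here, everything is straightforward: the permutation parser (operator^) allows all three fields in any order.

f1, f2, f3 are the accepted symbols (Options, below) for the respective fields.

Finally, the start rule adds the skipping of blanks and checks at the end (have we reached eoi? is the mandatory field present?).

Live Demo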

Live On Coliru

#include <iostream>
#include <string>
#include <boost/optional.hpp>
#include <boost/fusion/adapted/struct.hpp>
struct Command {
    std::string one, two, three;
};

BOOST_FUSION_ADAPT_STRUCT(Command, one, two, three)

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;

template <typename It> 
struct CommandParser : qi::grammar<It, Command()> {
    CommandParser() : CommandParser::base_type(start) {
        using namespace qi;

        command = field(Ref(&f1)) ^ field(Ref(&f2)) ^ field(Ref(&f3));
        field   = '-' >> raw[lazy(*_r1)];

        f1 += "foo";
        f2 += "handle", "bar", "mustache";
        f3 += "meow", "mix", "want";

        start   = skip(blank) [ command >> eoi ] >> eps(is_valid(_val));
    }
  private:
    // mandatory field check
    struct is_valid_f {
        bool operator()(Command const& cmd) const { return !cmd.one.empty(); }
    };
    boost::phoenix::function<is_valid_f> is_valid;

    // rules and skippers
    using Options = qi::symbols<char>;
    using Ref     = Options const*;
    using Skipper = qi::blank_type;

    qi::rule<It, Command()> start;
    qi::rule<It, Command(), Skipper> command;
    qi::rule<It, std::string(Ref)> field;

    // option values
    Options f1, f2, f3;
};

boost::optional<Command> parse(std::string const& input) {
    using It = std::string::const_iterator;

    Command cmd;
    bool ok = parse(input.begin(), input.end(), CommandParser<It>{}, cmd);

    return boost::make_optional(ok, cmd);
}

#include <iomanip>
void run_test(std::string const& input, bool expect_valid) {
    auto result = parse(input);

    std::cout << (expect_valid == !!result?"PASS":"FAIL") << "\t" << std::quoted(input) << "\n";
    if (result) {
        using boost::fusion::operator<<;
        std::cout << " --> Parsed: " << *result << "\n";
    }
}

int main() {
    char const* valid[] = { 
        "-foo",
        "-foo           -bar",
        "-foo-want",
        "-foo -meow-bar",
        "-foo-mix-mustache",
        "-handle      -foo-meow",
        "-mustache-foo",
        "-mustache -mix -foo",
        "-want-foo",
        "-want-meow-foo",
        "-want-foo-meow",
    };
    char const* invalid[] = {
        "woof",
        "-handle-meow",
        "-ha-foondle",
        "meow",
        "-foobar",
        "stackoverflow",
        "- handle -foo -mix",
        "-handle -mix",
        "-foo -handle -bar",
        "-foo -handle -mix -sodium",
    };

    std::cout << " === Positive test cases:\n";
    for (auto test : valid)   run_test(test, true);
    std::cout << " === Negative test cases:\n";
    for (auto test : invalid) run_test(test, false);
}

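This prints the following. The two FAIL lines are inputs that the question lists as valid but that contain both -want and -meow, two values of Optional Field 2, so the parser rejects them: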

 === Positive test cases:
PASS    "-foo"
 --> Parsed: (foo  )
PASS    "-foo           -bar"
 --> Parsed: (foo bar )
PASS    "-foo-want"
 --> Parsed: (foo  want)
PASS    "-foo -meow-bar"
 --> Parsed: (foo bar meow)
PASS    "-foo-mix-mustache"
 --> Parsed: (foo mustache mix)
PASS    "-handle      -foo-meow"
 --> Parsed: (foo handle meow)
PASS    "-mustache-foo"
 --> Parsed: (foo mustache )
PASS    "-mustache -mix -foo"
 --> Parsed: (foo mustache mix)
PASS    "-want-foo"
 --> Parsed: (foo  want)
FAIL    "-want-meow-foo"
FAIL    "-want-foo-meow"
 === Negative test cases:
PASS    "woof"
PASS    "-handle-meow"
PASS    "-ha-foondle"
PASS    "meow"
PASS    "-foobar"
PASS    "stackoverflow"
PASS    "- handle -foo -mix"
PASS    "-handle -mix"
PASS    "-foo -handle -bar"
PASS    "-foo -handle -mix -sodium"

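Another answer

Here is a brute-force solution that should work for fairly simple cases.

The idea is to build a regex out of all the permutations of the order in which these capture groups can appear.

In the test data there are only 6 permutations (3! orderings of the three groups). Obviously, this approach can get unwieldy quickly as groups are added.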

// Requires <algorithm>, <iostream>, <iterator>, <regex>, <string> and <vector>.
// Build all the permutations into a regex.
std::regex const e{[]{

    std::string e;

    char const* grps[] =
    {
        "\\s*(-foo)",
        "\\s*(-handle|-bar|-mustache)?",
        "\\s*(-meow|-mix|-want)?",
    };

    // initial permutation: sort first so that next_permutation visits every ordering
    std::sort(std::begin(grps), std::end(grps));

    auto sep = "";

    do
    {
        e = e + sep + "(?:";
        for(auto const* g: grps)
            e += g;
        e += ")";
        sep = "|"; // separate each permutation with |
    }
    while(std::next_permutation(std::begin(grps), std::end(grps)));

    return e;

}(), std::regex_constants::optimize};

// Do some tests

std::vector<std::string> const tests =
{
    "-foo",
    "-foo           -bar",
    "-foo-want",
    "-foo -meow-bar",
    "-foo-mix-mustache",
    "-handle      -foo-meow",
    "-mustache-foo",
    "-mustache -mix -foo",
    "-want-foo",
    "-want-meow-foo",
    "-want-foo-meow",
    "woof",
    "-handle-meow",
    "-ha-foondle",
    "meow",
    "-foobar",
    "stackoverflow",
    "- handle -foo -mix",
    "-handle -mix",
    "-foo -handle -bar",
    "-foo -handle -mix -sodium",
};

std::smatch m;
for(auto const& test: tests)
{
    if(!std::regex_match(test, m, e))
    {
        std::cout << "Invalid: " << test << '\n';
        continue;
    }
    std::cout << "Valid: " << test << '\n';
}
