Parsing attributes with regex in Perl

Posted 2023-03-24


【Title】: Parsing attributes with regex in Perl 【Posted】: 2010-09-05 20:22:06 【Question】:

Here's a problem I ran into recently. I have attribute strings of the form

"x=1 and y=abc and z=c4g and ..."

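Some attributes have numeric values, some have alphabetic values, some are mixed, some hold dates, and so on.

Every string is supposed to start with "x=someval and y=anotherval", but some don't. I have three things I need to do.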

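1. Validate the strings to be certain that they have x and y.
2. Actually parse the values for x and y.
3. Get the rest of the string.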

Given the example at the top, this would result in the following variables:

$x = 1;
$y = "abc";
$remainder = "z=c4g and ..."

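My question is: is there a (reasonably) simple way to parse these and validate them with a single regular expression? That is:

if ($str =~ /someexpression/)
{
    $x = $1;
    $y = $2;
    $remainder = $3;
}

Note that the string may consist of only the x and y attributes. That is a valid string.

I'll post my solution as an answer, but it doesn't meet my single-regex preference.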


【Answer 1】:

Assuming you also want to do something with the other name=value pairs, this is how I would do it (using Perl version 5.10):

use 5.10.0;
use strict;
use warnings;

my $string = "x=1 and y=abc and z=c4g";  # sample input

my %hash;
while(
    $string =~ m{
       (?: ^ | \G )    # start of string or previous match
       \s*

       (?<key>   \w+ ) # word characters
       =
       (?<value> \S+ ) # non spaces

       \s*             # get to the start of the next match
       (?: and )?
    }xgi
){
    $hash{$+{key}} = $+{value};
}

# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};

On older Perls (at least Perl 5.6):

use strict;
use warnings;

my $string = "x=1 and y=abc and z=c4g";  # sample input

my %hash;
while(
    $string =~ m{
       (?: ^ | \G )   # start of string or previous match
       \s*

       ( \w+ ) = ( \S+ )

       \s*            # get to the start of the next match
       (?: and )?
    }xgi
){
    $hash{$1} = $2;
}

# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};

These have the added benefit of continuing to work if you need to handle more attributes.
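As a rough sketch of how such a loop could also validate the whole string, the variation below requires " and " between pairs and uses the /c modifier so that pos() survives the final failed match. The helper name parse_pairs is hypothetical, and treating any unparsed leftover text as an error is an assumption about the desired behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "k=v and k=v ..." into a hash reference.
# Returns undef (in scalar context) if part of the string is not a
# well-formed key=value pair separated by " and ".
sub parse_pairs {
    my ($str) = @_;
    my %pairs;
    pos($str) = 0;
    while ($str =~ m{ \G (?: ^ | \s+ and \s+ ) (\w+) = (\S+) }xgc) {
        $pairs{$1} = $2;
    }
    # With /c, pos() is kept after the last failed match, so it tells
    # us how much of the string the loop actually consumed.
    return unless pos($str) == length $str;
    return \%pairs;
}

my $attrs = parse_pairs("x=1 and y=abc and z=c4g");
die "x and y are required\n"
    unless $attrs and exists $attrs->{x} and exists $attrs->{y};
print "$_ => $attrs->{$_}\n" for sort keys %$attrs;
```

Because every pair must be introduced by either the start of the string or " and ", input such as "x=1 y=2" is rejected rather than silently accepted.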

【Comments】:

\G already matches at the start of the string, so you can replace (?:^|\G) with \G. Better still, put \G first and factor the "and" to the front: \G (?: ^ | \s+ and \s+) (\w+) = (\S+)

【Answer 2】:

I'm not the best at regular expressions, but this seems pretty close to what you're looking for:

/x=(.+) and y=([^ ]+)( and (.*))?/

Except that you use $1, $2, and $4. In use:

my @strs = ("x=1 and y=abc and z=c4g and w=v4l",
            "x=yes and y=no",
            "z=nox and w=noy");

foreach (@strs) {
    if ($_ =~ /x=(.+) and y=([^ ]+)( and (.*))?/) {
        $x = $1;
        $y = $2;
        $remainder = $4;
        print "x: $x; y: $y; remainder: $remainder\n";
    } else {
        print "Failed.\n";
    }
}

Output:

x: 1; y: abc; remainder: z=c4g and w=v4l
x: yes; y: no; remainder: 
Failed.

This of course leaves out plenty of error checking, and I don't know everything about your inputs, but it seems to work.


【Answer 3】:

As a fairly simple modification of Rudd's version,

/^x=(.+) and y=([^ ]+)(?: and (.*))?/

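will allow you to use $1, $2 and $3 (the ?: makes it a non-capturing group), and will ensure that the string starts with "x=" rather than allowing "not_x=" to match.

If you know more about what the x and y values can be, use that knowledge to tighten the regex further: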

my @strs = ("x=1 and y=abc and z=c4g and w=v4l",
        "x=yes and y=no",
        "z=nox and w=noy",
        "not-x=nox and y=present",
        "x=yes and w='there is no and y=something arg here'");

foreach (@strs) {
    if ($_ =~ /^x=(.+) and y=([^ ]+)(?: and (.*))?/) {
        $x = $1;
        $y = $2;
        $remainder = $3;
        print "x: $x; y: $y; remainder: $remainder\n";
    } else {
        print "$_ Failed.\n";
    }
}

Output:

x: 1; y: abc; remainder: z=c4g and w=v4l
x: yes; y: no; remainder: 
z=nox and w=noy Failed.
not-x=nox and y=present Failed.
x: yes and w='there is no; y: something; remainder:

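Note that the missing part of the last test is due to the current version of the y test not allowing spaces; if the x test had the same restriction, that string would have failed.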
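To illustrate that last point, here is a small sketch in which the x value is limited to non-space characters as well. The names $strict_x and matches_strict are made up for this example; the pattern is otherwise the one used above.

```perl
use strict;
use warnings;

# Same pattern as above, except that x may no longer contain spaces.
my $strict_x = qr/^x=([^ ]+) and y=([^ ]+)(?: and (.*))?/;

# Hypothetical helper: true if the string passes the tightened pattern.
sub matches_strict {
    my ($s) = @_;
    return $s =~ $strict_x ? 1 : 0;
}

for my $s ("x=1 and y=abc and z=c4g and w=v4l",
           "x=yes and w='there is no and y=something arg here'") {
    print matches_strict($s) ? "ok:     $s\n" : "Failed: $s\n";
}
```

With x restricted this way, the first string still matches, while the quoted-value string is rejected instead of yielding a misleading x value.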


【Answer 4】:

Rudd and Cebjyre have gotten you most of the way there, but both versions have some problems:

Rudd suggested:

/x=(.+) and y=([^ ]+)( and (.*))?/

Cebjyre modified it to:

/^x=(.+) and y=([^ ]+)(?: and (.*))?/

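The second version is better because it will not confuse "not_x=foo" with "x=foo", but it will accept things such as "x=foo z=bar and y=baz" and set $1 = "foo z=bar", which is undesirable.

This is probably what you are looking for: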

/^x=(\w+) and y=(\w+)(?: and (.*))?/

This disallows anything between the x= and y= options, and allows an optional " and ..." tail, which will be in $3.
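A minimal sketch of how this pattern might be wrapped for use follows. The helper name parse_xy is hypothetical, and the trailing \z is an addition that is not part of the pattern above: it assumes that text after y's value which is not an " and ..." tail should be rejected. Note also that \w+ will not match values containing characters such as "-", so date values may need a wider character class.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($x, $y, $remainder), or an empty list
# when the string does not start with "x=... and y=...".
sub parse_xy {
    my ($str) = @_;
    # \z is an addition: it rejects trailing text that is not " and ...".
    if ($str =~ /^x=(\w+) and y=(\w+)(?: and (.*))?\z/) {
        my ($x, $y, $rest) = ($1, $2, $3);
        # $3 is undefined when there is no " and ..." tail.
        return ($x, $y, defined $rest ? $rest : '');
    }
    return;
}

my ($x, $y, $remainder) = parse_xy("x=1 and y=abc and z=c4g and ...");
print "x: $x; y: $y; remainder: $remainder\n";
```

Returning an empty list on failure lets the caller validate and parse in one step, for example: my ($x, $y, $rest) = parse_xy($str) or die "bad attribute string";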


【Answer 5】:

This is basically what I did to solve the problem:

($x_str, $y_str, $remainder) = split(/ and /, $str, 3);

if ($x_str !~ /x=(.*)/)
{
    # error
}

$x = $1;

if ($y_str !~ /y=(.*)/)
{
    # error
}

$y = $1;

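I've omitted some additional validation and error handling. This technique works, but it isn't as concise or pretty as I would have liked. I'm hoping someone will have a better suggestion.

【Comments】: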

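In my opinion, this is simpler and more maintainable than any "one regex to rule them all" solution. I would probably add a ^ at the start of the regexes that match x= and y=, to avoid matching not_x=... or similar. Why do you want a single regex anyway?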
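For illustration, the split-based approach with the suggested ^ anchors might look like the sketch below. The helper name split_attrs and the choice to signal errors by returning an empty list are assumptions, not part of the original answer.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: returns ($x, $y, $remainder), or an empty list
# if the first two " and "-separated fields are not x=... and y=...
sub split_attrs {
    my ($str) = @_;
    # LIMIT 3 keeps everything after the second " and " in one piece.
    my ($x_str, $y_str, $remainder) = split / and /, $str, 3;
    return unless defined $x_str && defined $y_str;

    # ^ keeps "not_x=..." from being accepted as "x=...".
    my ($x) = $x_str =~ /^x=(.*)/ or return;
    my ($y) = $y_str =~ /^y=(.*)/ or return;

    return ($x, $y, $remainder // '');
}

my ($x, $y, $remainder) = split_attrs("x=1 and y=abc and z=c4g and w=v4l");
print "x: $x; y: $y; remainder: $remainder\n";
```

Each step stays small enough to check on its own, which is the maintainability argument made in the comment above.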
