Split line with perl

Posted 2023-02-22


【Title】: Split line with perl 【Posted】: 2011-09-12 10:07:17 【Question】:

   title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

How can I split this with Perl into:

   title: Football
   team: Real Madrid
   stadium: Santiago Bernabeu
   players: Zinédine Zidane Ronaldo Luís Figo Roberto Carlos Raúl
   personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))

【Question discussion】:

【Answer 1】:

Use a lookahead assertion:

use feature 'say';
say for split /(?=\b\w+:)/, $real_madrid_string;

Output

title: Football
team: Real Madrid
stadium: Santiago Bernabeu
players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl
personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))
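
Note that the split itself leaves the commas and the trailing space in each field. A minimal sketch of one way to tidy the fields so they match the layout asked for in the question (the variable names are illustrative, and the \b in the pattern is there so that the lookahead cannot match in the middle of a key):

```perl
use strict;
use warnings;

my $real_madrid_string = "title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))";

# Split in front of every "word:" without consuming any characters.
my @fields = split /(?=\b\w+:)/, $real_madrid_string;

for my $field (@fields) {
    $field =~ s/\s+\z//;    # drop the space left before the next key
    $field =~ tr/,//d;      # the requested output has no commas
}

print "$_\n" for @fields;
```

Removing the commas is only done here because the requested output omits them; drop the tr line to keep the values as written.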

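【Discussion】:

If "players" is translated into another language, e.g. "players" = "jucător", the zero-width lookahead treats the character "ă" (a word character that is being interpreted as a non-word character) as the boundary instead of the colon ":" and splits there. Thanks.

Then your Perl version isn't new enough to support this directly. You could try splitting on \p{Letter}, but I think you would also need to adjust Perl's options to put it into UTF-8 mode, perhaps with perl -CSD. Maybe one of the two is enough.

【Answer 2】:

This should do it. line.txt contains "title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))"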

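#!/usr/bin/perl
use strict;
use warnings;

my $fn = "./line.txt";

open(my $in, '<', $fn) or die "Cannot open $fn: $!";
my @lines = <$in>;
close($in);

my %hash;
my @order;      # keys in the order they first appear
my $hashKey;

foreach my $line (@lines) {
    chomp $line;
    foreach my $word (split ' ', $line) {
        if ($word =~ m/^(\w+):$/) {
            # a word ending in a colon starts a new field
            $hashKey = $1;
            if (!exists $hash{$hashKey}) {
                $hash{$hashKey} = '';
                push @order, $hashKey;
            }
        }
        elsif (defined $hashKey) {
            # append the word to the current field
            $hash{$hashKey} .= length($hash{$hashKey}) ? " $word" : $word;
        }
    }
}

foreach my $key (@order) {
    print "$key: $hash{$key}\n";
}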

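【Discussion】:

【Answer 3】:

Contrary to what many have said in their answers, you don't need a lookahead (other than the regex engine's own); you only need to capture part of the delimiter, like this: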

my @hash_fields = grep { length; } split /\s*(\w+):\s*/;
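
As a rough sketch of how that capture-based split can feed a hash directly, assuming every key is followed by a non-empty value (otherwise grep would drop the empty value and the pairs would misalign); $line and %fields are illustrative names:

```perl
use strict;
use warnings;

my $line = "title: Football team: Real Madrid stadium: Santiago Bernabeu";

# Because the key is captured, split returns it alongside the values:
# ("", "title", "Football", "team", "Real Madrid", ...).
# grep removes the empty leading field produced by the match at offset 0.
my @pairs  = grep { length } split /\s*(\w+):\s*/, $line;
my %fields = @pairs;

print "$_ => $fields{$_}\n" for sort keys %fields;
```

The delimiter consumes the surrounding whitespace, so the values arrive already trimmed.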

My full solution follows:

my %handlers
    = ( players   => sub { return [ grep { length; } split /\s*,\s*/, shift ]; }
      , personnel => sub {
            my $value = shift;
            my %personnel;
            # Using recursive regex for nested parens
            while ( $value =~ m/([^(]*)([(](?:[^()]+|(?2))*[)])/g ) {
                my ( $name, $role ) = ( $1, $2 );
                $role =~ s/^\s*[(]\s*//;
                $role =~ s/\s*[)]\s*$//;
                $name =~ s/^\s+//;
                $name =~ s/\s+$//;
                $personnel{ $role } = $name;
            }
            return \%personnel;
        }
      );
my %hash = grep { length; } split /(?:^|\s+)(\w+):\s+/, <DATA>;
foreach my $field ( keys %handlers ) {
    $hash{ $field } = $handlers{ $field }->( $hash{ $field } );
}

The dump looks like this:

%hash: {
     personnel => {
                    'assistant coach (es)' => 'Aitor Karanka',
                    'head coach' => 'José Mourinho'
                  },
     players => [
                  'Zinédine Zidane',
                  'Ronaldo',
                  'Luís Figo',
                  'Roberto Carlos',
                  'Raúl'
                ],
     stadium => 'Santiago Bernabeu',
     team => 'Real Madrid',
     title => 'Football'
   }
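
The (?2) in the personnel handler is a recursive subpattern, which Perl supports from 5.10 onward; it re-runs capture group 2 so that nested parentheses such as "(assistant coach (es))" are matched as one unit. A small standalone sketch of just that piece, with illustrative variable names:

```perl
use strict;
use warnings;

my $personnel = "José Mourinho (head coach) Aitor Karanka (assistant coach (es))";

my @pairs;
# Group 1: everything up to the next "(".
# Group 2: a balanced parenthesised chunk; (?2) recurses into group 2
#          whenever a nested "(" is met.
while ( $personnel =~ m/([^(]*)([(](?:[^()]+|(?2))*[)])/g ) {
    my ( $name, $role ) = ( $1, $2 );
    s/^\s+|\s+$//g for $name, $role;    # trim both captured strings
    push @pairs, [ $name, $role ];
}

print "$_->[0] | $_->[1]\n" for @pairs;
```

On a Perl older than 5.10 the (?2) construct is not available and the pattern fails to compile.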

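【Discussion】:

$value =~ m/([^(]*)([(](?:[^()]+|(?2))*[)])/g fails with: Sequence (? ...) not defined.

@user935420, I'm not sure what problem you're running into. It runs without trouble on my Strawberry Perl 5.12 and ActivePerl 5.14.

【Answer 4】:

The best way is to use the split function with a zero-width lookahead: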

$string = "title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))";

@split_string = split /(?=\b\w+:)/, $string;
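
Whether \w in this pattern accepts accented letters depends on whether the string holds decoded characters or raw UTF-8 bytes. The following sketch, which assumes the source file is saved as UTF-8 and uses a made-up two-field sample, contrasts the two cases:

```perl
use strict;
use warnings;
use utf8;                   # the string literal below is UTF-8 source text
use Encode qw(encode);

my $chars = "title: Fotbal jucător: Zinédine Zidane";    # decoded characters
my $bytes = encode('UTF-8', $chars);                      # raw UTF-8 bytes

# Decoded text: "ă" counts as a word character, so the key stays whole.
my @from_chars = split /(?=\b\w+:)/, $chars;

# Raw bytes: the two bytes of "ă" are not word characters, so the
# lookahead also matches in front of "tor:" and the key is cut in two.
my @from_bytes = split /(?=\b\w+:)/, $bytes;

my ($key_chars) = $from_chars[1] =~ /^(\w+):/;
my ($key_bytes) = $from_bytes[1] =~ /^(\w+):/;

binmode STDOUT, ':encoding(UTF-8)';
print "key from decoded text: $key_chars\n";
print "key from raw bytes:    $key_bytes\n";
```

For input read from a file or STDIN, the same effect comes from decoding on input, for example with an :encoding(UTF-8) layer or the -CSD switch mentioned in the comments.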

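【Discussion】:

Sorry... I escaped the ) out of habit from using Vim.

If "players" is translated into another language, e.g. "players" = "jucător", the zero-width lookahead finds the character "ă" and splits there. Thanks.

@user: You must make sure your locale is set correctly. \w is explicitly designed to be used in a locale-independent way and should handle locale differences behind the scenes.

@user: See here for how to handle locales: perldoc.perl.org/perllocale.html

【Answer 5】: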

$string = "title: Football team: Real Madrid stadium: Santiago Bernabeu players: Zinédine Zidane, Ronaldo, Luís Figo, Roberto Carlos, Raúl personnel: José Mourinho (head coach) Aitor Karanka (assistant coach (es))";
@words = split(' ', $string);

@lines = ();
@line = (shift(@words));    # the first word opens the first line
foreach $word (@words)
{
    if ($word =~ /:$/)
    {
        # a word ending in a colon closes the current line and opens the next
        push(@lines, join(' ', @line));
        @line = ($word);
    }
    else
    {
        push(@line, $word);
    }
}
push(@lines, join(' ', @line));    # flush the last line

print join("\n", @lines), "\n";

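【Discussion】:

This won't work, because Perl has no concept of an array of arrays. The first push will simply append the contents of @line to the end of @lines. To make it work, @lines would have to be an array of references to the arrays built in @line.

@lines is an array of strings; I only push strings onto it.

It is usually a good idea to run code before posting it. This won't run at all. For starters, I can see missing semicolons. push takes the array as its first argument, and you probably meant to join there. But even then it raises the question: why take the long way round?

Ah, I always get confused about the order of push's arguments. As for why: I'm new to Perl and hadn't thought of a lookahead.

@Zaid: I wouldn't blame Bwmat for not thinking of a lookahead. After all, there's more than one way to do it.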
