使用 Perl 分组:找到更快的递归解决方案
Posted
技术标签:
【中文标题】使用 Perl 分组:找到更快的递归解决方案【英文标题】:Grouping with Perl: finding a faster solution to recursion 【发布时间】:2015-10-06 17:16:51 【问题描述】:下面的 Perl 代码可以工作,但即使有相当多的计算机资源,它也不能很好地扩展。我希望有人可以帮助我找到更有效的代码,例如用迭代替换递归,如果这是问题的话。
我的数据结构如下所示: 我的 %REV_ALIGN; $REV_ALIGN$dna$rna = ();
任何 dna 密钥都可能有多个 rna 子密钥。同一个 rna 子键可能与多个不同的 dna 键一起出现。目的是根据共享的 dna 序列元素对 rna(转录本)进行分组。例如,如果 dnaA 有 RNA1、RNA8、RNA9 和 RNA4,而 dnaB 有 RNA11、RNA4 和 RNA99,那么我们将所有这些转录本组合在一起(RNA1、RNA9、RNA4、RNA11、RNA99)并继续尝试通过选择其他 dna 添加到组中。我对这个问题的递归解决方案有效,但在使用来自全基因组的数据进行转录组比对时不能很好地扩展。
所以我的问题是:有什么更有效的解决方案来解决这个问题?非常感谢你
my @groups;
while ( my $x =()= keys %REV_ALIGN )
my @DNA = keys %REV_ALIGN;
my $dna = shift @DNA;
# the corresponding list of rna
my @RNA = keys %$REV_ALIGN$dna;
delete $REV_ALIGN$dna;
if ( $x == 1 )
push @groups, \@RNA;
last;
my $ref = group_transcripts ( \@RNA, \%REV_ALIGN );
push @groups, $ref;
sub group_transcripts
my $tran_ref = shift;
my $align_ref = shift;
my @RNA_A = @$tran_ref;
my %RNA;
# create a null hash with seed list of transcripts
@RNA@RNA_A = ();
# get a list of all remaining dna sequences in the alignment
my @DNA = keys %$align_ref;
my %count;
# select a different list of transcripts
for my $dna ( @DNA )
next unless exists $align_ref->$dna;
my @RNA_B = keys %$align_ref->$dna;
# check to see two list share and transcripts
for my $element ( @RNA_A, @RNA_B )
$count$element++;
for my $rna_a ( keys %count )
# if they do, add any new transcripts to the current group
if ( $count$rna_a == 2 )
for my $rna_b ( @RNA_B )
push @RNA_A, $rna_b if $count$rna_b == 1;
delete $align_ref->$dna;
delete $count$_ foreach keys %count;
# recurse to try and continue adding to list
@_ = ( \@RNA_A, $align_ref );
goto &group_transcripts;
delete $count$_ foreach keys %count;
# if no more transcripts can be added, return a reference to the group
return \@RNA_A;
【问题讨论】:
您可能想要分析您的代码***.com/questions/4371714/… Btw,delete $count$_ foreach keys %count;
可以替换为%count = ();
Btw2,如果没有必要,请不要复制数组,即。 my @RNA_A = @$tran_ref;
my $x =()= keys %REV_ALIGN
=> my $x = keys %REV_ALIGN
感谢大家的一些建议。
什么是正确的?列表上下文返回一个列表,但我认为标量上下文不会单独工作
【参考方案1】:
你有一个嵌套四层的循环。这是一个非常安全的赌注,这就是您的代码扩展性差的原因。
如果我正确理解您要完成的工作,请输入
my %REV_ALIGN = (
"DNA1" => map $_ => undef "RNA1", "RNA2" , # \ Linked by RNA1 \
"DNA2" => map $_ => undef "RNA1", "RNA3" , # / \ Linked by RNA3 > Group
"DNA3" => map $_ => undef "RNA3", "RNA4" , # / /
"DNA4" => map $_ => undef "RNA5", "RNA6" , # \ Linked by RNA5 \ Group
"DNA5" => map $_ => undef "RNA5", "RNA7" , # / /
"DNA6" => map $_ => undef "RNA8" , # > Group
);
应该会导致
my @groups = (
[
dna => [ "DNA1", "DNA2", "DNA3" ],
rna => [ "RNA1", "RNA2", "RNA3", "RNA4" ],
],
[
dna => [ "DNA4", "DNA5" ],
rna => [ "RNA5", "RNA6", "RNA7" ],
],
[
dna => [ "DNA6" ],
rna => [ "RNA8" ],
],
);
如果是这样,您可以使用以下内容:
use strict;
use warnings;
use Graph::Undirected qw( );
my %REV_ALIGN = (
"DNA1" => map $_ => undef "RNA1", "RNA2" ,
"DNA2" => map $_ => undef "RNA1", "RNA3" ,
"DNA3" => map $_ => undef "RNA3", "RNA4" ,
"DNA4" => map $_ => undef "RNA5", "RNA6" ,
"DNA5" => map $_ => undef "RNA5", "RNA7" ,
"DNA6" => map $_ => undef "RNA8" ,
);
my $g = Graph::Undirected->new();
for my $dna (keys(%REV_ALIGN))
for my $rna (keys(% $REV_ALIGN$dna ))
$g->add_edge("dna:$dna", "rna:$rna");
my @groups;
for my $raw_group ($g->connected_components())
my %group = ( dna => [], rna => [] );
for (@$raw_group)
my ($type, $val) = split(/:/, $_, 2);
push @ $group$type , $val;
push @groups, \%group;
use Data::Dumper qw( Dumper );
print(Dumper(\@groups));
如果您只需要 RNA,最后部分简化为以下内容:
my @groups;
for my $raw_group ($g->connected_components())
my @group;
for (@$raw_group)
my ($type, $val) = split(/:/, $_, 2);
push @group, $val if $type eq 'rna';
push @groups, \@group;
【讨论】:
感谢您抽出宝贵时间对此作出回应。我不熟悉图表。不需要也有 dna,但我会看看它是如何工作的,然后回复你 在答案底部添加了纯 RNA 版本。以上是关于使用 Perl 分组:找到更快的递归解决方案的主要内容,如果未能解决你的问题,请参考以下文章