Perl in Practice: Computing GC Content for a Multi-Sequence FASTA File
Posted Genetic evolution
How to compute the GC content of each sequence in a FASTA file
>ATCG00510 pacid=19637948 polypeptide=ATCG00510.1 locus=ATCG00510 ID=ATCG00510.1.TAIR10 annot-version=TAIR10
ATGACAACTTTCAATAACTTACCCTCTATTTTTGTGCCTTTAGTAGGCCTAGTCTTTCCGGCAATTGCAATGGCTTCTTT
ATTTCTTCATATTCAAAAAAATAAGATTTTTTAG
...
The code is as follows:
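#!/usr/bin/perl
# 2020.6.8 Schnappi
use strict;
use warnings;

open(my $in, '<', "Athaliana.cds.subset.fa") or die "Cannot open Athaliana.cds.subset.fa: $!";
open(my $out, '>', "Ath.GCcontent.txt") or die "Cannot write to Ath.GCcontent.txt: $!";

# Read the FASTA file: header lines start with '>', every other line is sequence.
my (%seq, $name);
while (<$in>) {
    chomp;
    if (/^>(\S+)/) {
        $name = $1;                  # gene ID is the first word after '>'
    }
    elsif (defined $name) {
        $seq{$name} .= $_;           # join sequence lines wrapped over several rows
    }
}

# For each gene, write ID, sequence length, GC count and GC fraction.
foreach my $id (keys %seq) {
    my $sequence = $seq{$id};
    my $gc  = ($sequence =~ tr/GCgc//);   # empty replacement list: tr only counts
    my $len = length($sequence);
    next unless $len;                     # skip empty records to avoid dividing by zero
    my $content = $gc / $len;
    print $out "$id\t$len\t$gc\t$content\n";
}
close $in;
close $out;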
# Output columns, in order: gene ID, sequence length, GC count, GC content (as a fraction of the length)
At2g02540 933 389 0.416934619506967
At2g20690 816 382 0.468137254901961
At2g05940 1389 610 0.439164866810655
At2g06050 1176 562 0.477891156462585
At2g20618 246 107 0.434959349593496
At2g41860 1593 653 0.40991839296924
At2g17170 987 429 0.434650455927052
At2g07714 591 260 0.439932318104907
At2g24590 591 308 0.521150592216582
At2g21910 1533 654 0.426614481409002
At2g01830 3243 1363 0.420289855072464
At2g12462 750 346 0.461333333333333
At2g46455 561 224 0.399286987522282
At2g35240 699 342 0.489270386266094
At2g27460 2238 984 0.439678284182306
At2g43400 1902 830 0.436382754994742
At2g31980 444 207 0.466216216216216
At2g30933 684 304 0.444444444444444
AtCg00710 222 88 0.396396396396396
At2g41060 1356 643 0.474188790560472
At2g33006 189 69 0.365079365079365
At2g30130 582 258 0.443298969072165
At2g23640 621 266 0.428341384863124
At2g19580 813 381 0.468634686346863
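The last column above is a fraction of the sequence length rather than a percentage, and the counting itself is easy to spot-check on a short string. Below is a minimal sketch, assuming a hypothetical helper named `gc_stats` (it is not part of the original script) that counts G and C case-insensitively and guards against empty sequences. The toy sequence is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# gc_stats is a hypothetical helper, not part of the original script.
# It returns (length, GC count, GC fraction) for one sequence string.
sub gc_stats {
    my ($sequence) = @_;
    my $len = length $sequence;
    my $gc  = ($sequence =~ tr/GCgc//);    # empty replacement list: count only
    my $fraction = $len ? $gc / $len : 0;  # guard against empty sequences
    return ($len, $gc, $fraction);
}

# Toy sequence (made up for illustration): 6 bases, 4 of them G or C.
my ($len, $gc, $fraction) = gc_stats("ATGCgc");
printf "%d\t%d\t%.2f%%\n", $len, $gc, 100 * $fraction;   # prints 6, 4, 66.67%

# Spot-check the first output row above: 389 GC bases out of 933.
printf "%.2f%%\n", 100 * 389 / 933;                       # prints 41.69%
```

Multiplying by 100 only at print time keeps the stored value as a plain fraction, which matches the numbers in the table above.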
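"Perl is a language in decline": that statement is entirely correct. I used to naively defend it, but this time I have turned against it for good. --------- renowned scholar Schnappi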