perl 语言一些常见生信处理脚本

Posted 01说

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了perl 语言一些常见生信处理脚本相关的知识,希望对你有一定的参考价值。

反向互补序列:

open IN,"c:/Users/11852/Desktop/seqs.fa";my$seq="";my$key;open OUT,">c:/Users/11852/Desktop/seqs_.fa";while (my $line =<IN>){ chomp $line; if ($line=~ ">"){ if (length($seq)!= 0){ $seq =join("",reverse (split(//,$seq))); $seq=~ tr/[ATGC]/[TACG]/; print OUT $key."\n".$seq."\n";  $seq=""; } $key=$line; }else{$seq.=$line;}}$seq =join("",reverse (split(//,$seq)));$seq=~ tr/[ATGC]/[TACG]/;print OUT $key."\n".$seq."\n";


把cDNA序列转录为RNA,再翻译为蛋白质:

use strict;open IN,"c:/Users/11852/Desktop/human_TP53_cDNA.fasta";my $cdna;
while (my $line=<IN>){ chomp $line; if ($line=~ ">"){ next; } $cdna.=$line; }print "the length of cdna\t".length($cdna)."\n".$cdna."\n";
$cdna=~ tr/T/U/;close IN;
open IN,"c:/Users/11852/Desktop/codon_table.txt";my %codon_dic;while (my$line =<IN>){ chomp $line; my @li=split(/\t/,$line); my$codon=$li[0]; $codon=~tr/T/U/; $codon_dic{$codon}=$li[1];}my $protein;for (my$i=0;$i<length($cdna);$i+=3){ my$j=substr($cdna,$i,3); my$a= $codon_dic{$j}; if ($a eq "Stop"){ next; } $protein .= $a;}print $protein;

计算GC含量:

open IN,"c:/Users/11852/Desktop/Ecoli.fa";my $sequence_length;my $gc_count;while (my $line=<IN>){ chomp $line; if ($line =~ />/){ next } $gc_count+=$line=~ tr/[GC]/#/; $sequence_length+=length($line);}my$percent=$gc_count/$sequence_length;print "GC content is $percent";

根据位置提取序列

my $header;my $seq = "";
my %fasta;open IN, "c:/Users/11852/Desktop/grape38.fa" or die "$!";
while(my $line=<IN>){ chomp($line); if ($line=~/^>/){ if (length $seq > 0){ $fasta{$header} = $seq; } $header = substr($line,1,1); $seq = ""; }else{ $seq.=$line; }}$fasta{$header} = $seq;close IN;

open IN, "c:/Users/11852/Desktop/位置.txt" or die "$!";open OUT, ">c:/Users/11852/Desktop/result_file" or die "$!";
while (my $line = <IN>){ chomp ($line); my ($chr,$start,$end)=split (/\t/,$line); my $seq = $fasta{$chr}; my $part = substr($seq,$start, $end-$start); print OUT ">$line\n$part\n";}close IN;close OUT;

利用hash计数

open(IN,"c:/Users/11852/Desktop/exp.txt") || die "$!";open(OUT,">c:/Users/11852/Desktop/test.txt") ||die "$!";

my%stat=( "1-3"=>0, "3-5"=>0);while (my$line=<IN>){ chomp $line; my@tmp=split(/\t/,$line); if ($tmp[1]>=1 and $tmp[1]<=3){ $stat{"1-3"}++; } if ($tmp[1]>3 and $tmp[1]<=5){ $stat{"3-5"}++; } }close(IN);
for my$k(keys %stat){ print OUT "$k\t$stat{$k}\n";}close(OUT);

根据id提取表达量

open(IN,"c:/Users/11852/Desktop/exp.txt") || die "$!";open(INID,"c:/Users/11852/Desktop/ID.txt") || die "$!";open(OUT,">c:/Users/11852/Desktop/test.txt") ||die "$!";my %keep_id=();while (my$line=<INID>){ chomp $line; $keep_id{$line}=1;}close INID;while (my$line=<IN>){ chomp $line; my @tmp=split(/\t/,$line); if (exists $keep_id{$tmp[0]}){ print OUT $line."\n" } }close(OUT);close(IN);



以上是关于perl 语言一些常见生信处理脚本的主要内容,如果未能解决你的问题,请参考以下文章

生信分析——perl入门

生信学习周如何系统入门Perl

生信菜鸟经如何系统入门Perl

perl编程:正则匹配

生信攻略R,perl,python等软件常见更新设置

你必须要会的生信基础套餐-perl与Linux