How can I use Linux or Perl to extract, from a protein sequence file, the FASTA-format sequences for 10 protein IDs stored in another file?
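Reference answer:
#!/usr/bin/perl
use strict;
use warnings;

# Build an ID -> sequence lookup table. This assumes two-line FASTA records:
# one header line followed by exactly one sequence line.
my %hash;
while (my $header = <DATA>) {
    next unless $header =~ /^>(\S+)/;   # skip blank lines and anything that is not a header
    my $key = $1;                       # the ID is the first token after '>'
    my $value = <DATA>;                 # the sequence is on the next line
    last unless defined $value;
    chomp $value;
    $hash{$key} = $value;
}
# Look up a single ID and print its sequence.
print $hash{'Gh_A04G0739'}, "\n";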
__DATA__
>Gh_A12G2559
ATGGCTACTTTCTTTGGCTATTTTAC
>Gh_A12G2554
ATGGCGGGAACTATCCAATCCCTAAT
>Gh_A04G0755
ATGGCCTCCGATCAGACCCTTTTTCA
>Gh_A04G0739
ATGGAGAAAGCAAAACCTGAAGCACC
>Gh_A04G0738
ATGGCGCAGGTTTTAGACGACGCTGA
>Gh_A07G1346
ATGGTTCATTGTCTGCCAAAGGTTTC
>Gh_A07G1331
ATGGCAGACAAGGATTCTTCAAGGCC
>Gh_A07G1334
ATGACGACGCCAACTCGAGATGCAAT
>Gh_A07G1335
ATGGCCGCAACTAGATTCCTCTCTCA
>Gh_A08G1510
ATGGCTACTGCACCGATAAAGTCTCA
>Gh_A08G1433
ATGGGTAAAACACCTACTGGCAAGGA
Another answer: It would be best to post sample input and sample output. Without them it is not possible to give a proper answer.
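The reference answer above looks up one hard-coded ID, so the step the question actually asks about (reading the IDs from a second file) is still missing. The following is a minimal Python sketch of that step, not a definitive implementation. It assumes the ID file holds one ID per line and that a record's ID is the first whitespace-delimited token of its header. The function names `read_fasta` and `extract_by_ids` and the file names `proteins.fasta` and `ids.txt` are hypothetical. The demo uses in-memory stand-ins for the two files, with records copied from the data above.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_by_ids(fasta_handle, id_handle):
    """Return FASTA text for every record whose ID appears in the ID list."""
    # One ID per line; a leading '>' in the ID file is tolerated.
    wanted = {line.strip().lstrip(">") for line in id_handle if line.strip()}
    out = []
    for header, seq in read_fasta(fasta_handle):
        fields = header.split()
        if fields and fields[0] in wanted:
            out.append(f">{header}\n{seq}\n")
    return "".join(out)


# In-memory stand-ins for proteins.fasta and ids.txt (hypothetical names).
fasta = io.StringIO(
    ">Gh_A12G2559\nATGGCTACTTTCTTTGGCTATTTTAC\n"
    ">Gh_A04G0739\nATGGAGAAAGCAAAACCTGAAGCACC\n"
    ">Gh_A08G1433\nATGGGTAAAACACCTACTGGCAAGGA\n"
)
ids = io.StringIO("Gh_A04G0739\nGh_A08G1433\n")
result = extract_by_ids(fasta, ids)
print(result, end="")
```

With real files, the two `io.StringIO` objects would be replaced by `open("proteins.fasta")` and `open("ids.txt")`. Records come out in the order they appear in the FASTA file, not in the order of the ID list, and IDs that are not found are silently skipped.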
Translating a sequence with Perl and Python
Task: transcribe the human P53 cDNA sequence into RNA, then translate the RNA into protein.
perl
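use strict;
use warnings;

# Read the cDNA, skipping FASTA header lines.
my $fasta = "c:/Users/11852/Desktop/human_TP53_cDNA.fasta";
open my $in, '<', $fasta or die "Cannot open $fasta: $!";
my $cdna = '';
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^>/;
    $cdna .= $line;
}
close $in;
print "the length of cdna\t" . length($cdna) . "\n" . $cdna . "\n";

# Transcribe: replace every T with U.
$cdna =~ tr/T/U/;

# Load the codon table (tab-separated: codon, amino acid).
my $table = "c:/Users/11852/Desktop/codon_table.txt";
open my $tab, '<', $table or die "Cannot open $table: $!";
my %codon_dic;
while (my $line = <$tab>) {
    chomp $line;
    my ($codon, $aa) = split /\t/, $line;
    next unless defined $aa;            # skip blank or malformed lines
    $codon =~ tr/T/U/;
    $codon_dic{$codon} = $aa;
}
close $tab;

# Translate codon by codon and stop at the first stop codon.
my $protein = '';
for (my $i = 0; $i + 3 <= length($cdna); $i += 3) {
    my $codon = substr($cdna, $i, 3);
    my $aa = $codon_dic{$codon};
    defined $aa or die "Unknown codon '$codon' at position $i\n";
    last if $aa eq "Stop";
    $protein .= $aa;
}
print "$protein\n";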
python
# Read the cDNA, skipping FASTA header lines.
tp53_cdna = ""
with open("data/human_TP53_cDNA.fasta") as fin:
    for line in fin:
        if line.startswith(">"):
            continue
        tp53_cdna += line.strip()
print("TP53 cDNA sequence (Length: ", len(tp53_cdna), "): ",
      tp53_cdna, sep="")

# Transcribe: replace every T with U.
tp53_mrna = tp53_cdna.replace("T", "U")
print("TP53 mRNA sequence (Length: ", len(tp53_mrna), "): ",
      tp53_mrna, sep="")

# Load the codon table (tab-separated: codon, amino acid).
codon_dic = {}
with open("data/codon_table.txt", "rt") as f_codon:
    for line in f_codon:
        sp = line.strip().split("\t")
        if len(sp) < 2:  # skip blank or malformed lines
            continue
        codon = sp[0].replace("T", "U")
        aa = sp[1]
        codon_dic[codon] = aa

# Translate codon by codon and stop at the first stop codon.
tp53_protein = ""
for i in range(0, len(tp53_mrna) - 2, 3):
    aa = codon_dic[tp53_mrna[i:i + 3]]
    if aa == "Stop":
        break
    tp53_protein += aa
print("TP53 protein sequence (Length: ", len(
    tp53_protein), "): ", tp53_protein, sep="")
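Both scripts above depend on two external files, so they cannot be tried without them. As a rough, self-contained sketch of the same transcribe-then-translate loop, the example below uses a deliberately partial codon table and a short made-up cDNA fragment. Neither is taken from the original files. The helper names `transcribe` and `translate` are hypothetical, and a real run needs a table with all 64 codons.

```python
# Partial codon table, for illustration only (standard genetic code entries).
CODON_TABLE = {
    "AUG": "M", "GAG": "E", "CCG": "P", "CAG": "Q",
    "UCA": "S", "GAU": "D", "UGA": "Stop",
}


def transcribe(cdna):
    """Turn a coding-strand cDNA string into mRNA by replacing T with U."""
    return cdna.upper().replace("T", "U")


def translate(mrna, table):
    """Translate from position 0, three bases at a time, up to the first stop codon."""
    protein = []
    # len(mrna) - 2 ignores a trailing fragment shorter than one codon.
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        aa = table.get(codon)
        if aa is None:
            raise KeyError(f"codon {codon!r} at position {i} is not in the table")
        if aa == "Stop":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up toy fragment (not the real TP53 file): seven codons, a stop, one extra codon.
cdna = "ATGGAGGAGCCGCAGTCAGATTGACCG"
mrna = transcribe(cdna)
protein = translate(mrna, CODON_TABLE)
print(mrna)
print(protein)
```

Raising on an unknown codon, rather than skipping it, is a design choice for this sketch: it makes a missing table entry or an unexpected character in the sequence visible instead of silently shortening the protein.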