BLAST results
Preface: this article was compiled by the editors of 小常识网 (cha138.com) and mainly covers BLAST results. We hope it serves as a useful reference.
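Got it working, haha O(∩_∩)O~ Feeling a little happy (*^▽^*). The BLAST installed through conda would not work, and I still don't know why; the BLAST I reinstalled myself in my home directory runs without problems.
Database building: I ran into some problems with grep, sed, awk and for loops, which I'll write up in a separate post, plus some FASTA (.fa) formatting issues. Also, it turns out BLAST can align many sequences against many in one go, so batch jobs can be done in a single run~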
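The FASTA formatting problems mentioned above usually come down to a few recurring mistakes. As a minimal sketch (my own checks, not the author's method; the function name check_fasta is hypothetical), the Python below scans a nucleotide FASTA string and reports sequence data before the first '>' header, headers without an ID, duplicate IDs, empty records, and characters outside the IUPAC nucleotide alphabet. A query file holding several clean records like this is also what lets a single blastn run align many queries at once.

```python
import re

# One sequence line: IUPAC nucleotide codes (A, C, G, T, U and the
# ambiguity codes R, Y, K, M, S, W, B, D, H, V, N) plus '-' for gaps.
VALID_NT = re.compile(r"^[ACGTURYKMSWBDHVN-]*$", re.IGNORECASE)


def check_fasta(text: str) -> list[str]:
    """Return human-readable problems found in a nucleotide FASTA string."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    current_id = None   # ID of the record being read, None before the first '>'
    current_len = 0     # number of sequence letters seen for that record

    def close_record() -> None:
        # Flag a record whose header was not followed by any sequence.
        if current_id is not None and current_len == 0:
            problems.append(f"{current_id}: empty sequence")

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are simply skipped here
        if line.startswith(">"):
            close_record()
            # The ID is the first whitespace-separated word after '>'.
            words = line[1:].split()
            current_id = words[0] if words else ""
            current_len = 0
            if not current_id:
                problems.append(f"line {lineno}: header without an ID")
            elif current_id in seen_ids:
                problems.append(f"line {lineno}: duplicate ID {current_id}")
            seen_ids.add(current_id)
        elif current_id is None:
            problems.append(f"line {lineno}: sequence before the first '>' header")
        elif not VALID_NT.match(line):
            problems.append(f"line {lineno}: unexpected characters in sequence")
        else:
            current_len += len(line)
    close_record()
    return problems
```

Running it on a file before makeblastdb (for example check_fasta(open("refdata.fasta").read())) gives an empty list when none of these problems are present.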
# Build the database
/home/hmguang/biosoft/blast/blast/ncbi-blast-2.9.0+/bin/makeblastdb -in refdata.fasta -dbtype nucl
# Run the alignment
/home/hmguang/biosoft/blast_project/blast/ncbi-blast-2.9.0+/bin/blastn -query testseq -out result.txt -db refdata.fasta -evalue 1e-5
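When the same two steps are repeated for new data, they can be driven from a script. This is a minimal Python sketch using only the options in the two commands above; blast_commands and run_all are hypothetical helper names, and it assumes both programs live in one bin directory (the two paths above differ, so pass whichever applies) or are on PATH.

```python
import shutil
import subprocess


def blast_commands(ref_fasta: str, query: str, out: str,
                   evalue: str = "1e-5", bin_dir: str = "") -> list[list[str]]:
    """Return the makeblastdb and blastn command lines used above, in order."""
    prefix = bin_dir.rstrip("/") + "/" if bin_dir else ""
    # Build a nucleotide database from the reference FASTA.
    makeblastdb = [prefix + "makeblastdb", "-in", ref_fasta, "-dbtype", "nucl"]
    # With no -out, makeblastdb names the database after the -in file,
    # so the same name is given to blastn as -db.
    blastn = [prefix + "blastn", "-query", query, "-out", out,
              "-db", ref_fasta, "-evalue", evalue]
    return [makeblastdb, blastn]


def run_all(commands: list[list[str]]) -> None:
    """Run the commands one after another; stop at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"executable not found: {cmd[0]}")
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

For the run above this would be run_all(blast_commands("refdata.fasta", "testseq", "result.txt", bin_dir=...)) with bin_dir pointing at the ncbi-blast-2.9.0+/bin directory. Passing arguments as a list avoids shell quoting problems with file names.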
Results:
BLASTN 2.9.0+
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.
Database: refdata.fasta
16 sequences; 411,130 total letters
Query= YW607_F06
Length=524
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)   Value

NC_000017.11:4932277-4935023Homosapienschromosome17_GP1BA,GRCh38....  350     8e-98
NC_000017.11:4932277-4935023Homosapienschromosome17_GP1BA_core,GR...  350     8e-98
>NC_000017.11:4932277-4935023Homosapienschromosome17_GP1BA,GRCh38.p13PrimaryAssembly
Length=2747
Score = 350 bits (189), Expect = 8e-98
Identities = 194/197 (98%), Gaps = 0/197 (0%)
Strand=Plus/Minus

Query  1    TACAGCGAGTTCTCTTGGAGGAGAAGGGTGTCGAGATTCTCCAGCCCATTCAGGAGCCCA  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  930  TACAGCGAGTTCTCTTGGAGGAGAAGGGTGTCGAGATTCTCCAGCCCATTCAGGAGCCCA  871

Query  61   GCGGGGAGCTCAGTCAAGTTGTTGTTAGCCAGACTGAGCTTCTCCAGCTTGGGTGTGGGC  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  870  GCGGGGAGCTCAGTCAAGTTGTTGTTAGCCAGACTGAGCTTCTCCAGCTTGGGTGTGGGC  811

Query  121  GTCAGGAGCCCTGGGGGCAGGGTCTTCAGCTCATTGCCTTTCAGGTAGAGCTCTTGGAGT  180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  810  GTCAGGAGCCCTGGGGGCAGGGTCTTCAGCTCATTGCCTTTCAGGTAGAGCTCTTGGAGT  751

Query  181  TCGCMAGTACCACGCAG  197
            |||| |  |||||||||
Sbjct  750  TCGCCAAGACCACGCAG  734

>NC_000017.11:4932277-4935023Homosapienschromosome17_GP1BA_core,GRCh38.p13PrimaryAssembly
Length=357
Score = 350 bits (189), Expect = 8e-98
Identities = 194/197 (98%), Gaps = 0/197 (0%)
Strand=Plus/Minus

Query  1    TACAGCGAGTTCTCTTGGAGGAGAAGGGTGTCGAGATTCTCCAGCCCATTCAGGAGCCCA  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  295  TACAGCGAGTTCTCTTGGAGGAGAAGGGTGTCGAGATTCTCCAGCCCATTCAGGAGCCCA  236

Query  61   GCGGGGAGCTCAGTCAAGTTGTTGTTAGCCAGACTGAGCTTCTCCAGCTTGGGTGTGGGC  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  235  GCGGGGAGCTCAGTCAAGTTGTTGTTAGCCAGACTGAGCTTCTCCAGCTTGGGTGTGGGC  176

Query  121  GTCAGGAGCCCTGGGGGCAGGGTCTTCAGCTCATTGCCTTTCAGGTAGAGCTCTTGGAGT  180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  175  GTCAGGAGCCCTGGGGGCAGGGTCTTCAGCTCATTGCCTTTCAGGTAGAGCTCTTGGAGT  116

Query  181  TCGCMAGTACCACGCAG  197
            |||| |  |||||||||
Sbjct  115  TCGCCAAGACCACGCAG  99

Lambda K H
1.37 0.632 1.16
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 207467130
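In the YW607_F06 hits above, Strand=Plus/Minus means the query matched the reverse complement of the subject, which is why the Sbjct coordinates count down (930 to 734 for the GP1BA entry, 295 to 99 for GP1BA_core). A small Python sketch, with hypothetical helper names, that normalizes such coordinates and reverse-complements a sequence, mapping IUPAC ambiguity codes to their complements:

```python
# Complements for A/C/G/T and the IUPAC ambiguity codes (R<->Y, K<->M,
# B<->V, D<->H; S, W and N are their own complements), in both cases.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                           "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def subject_interval(sbjct_start: int, sbjct_end: int) -> tuple[int, int, str]:
    """Turn printed Sbjct coordinates into (low, high, strand).

    On a Plus/Minus hit the subject coordinates run downwards
    (e.g. 930 -> 734), so start > end means the subject's minus strand.
    """
    if sbjct_start <= sbjct_end:
        return sbjct_start, sbjct_end, "+"
    return sbjct_end, sbjct_start, "-"
```

Applied to the GP1BA hit, subject_interval(930, 734) gives (734, 930, "-"), a 197-base stretch matching the 197-base query span; the forward-strand sequence of that stretch is the reverse complement of the Sbjct rows as printed.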
Query= YW614_E07
Length=284
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)   Value

NC_000005.10:52989326-53094779Homosapienschromosome5_ITGA2,GRCh38...  291     3e-80
NC_000005.10:52989326-53094779Homosapienschromosome5_ITGA2_4core,...  291     3e-80
>NC_000005.10:52989326-53094779Homosapienschromosome5_ITGA2,GRCh38.p13PrimaryAssembly
Length=105454
Score = 291 bits (157), Expect = 3e-80
Identities = 161/164 (98%), Gaps = 0/164 (0%)
Strand=Plus/Plus

Query  1      TTGTCAGCAACCAAAACAAAARGTTAACATTTTCAGTAACGCTGAAAAATAAAAGGGAAA  60
              ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
Sbjct  83807  TTGTCAGCAACCAAAACAAAAGGTTAACATTTTCAGTAACGCTGAAAAATAAAAGGGAAA  83866

Query  61     GTGCATACAACACTGGAATTGTTGTTGATTTTTCAGAAAACTTGTTTTTTGCATCATTCT  120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  83867  GTGCATACAACACTGGAATTGTTGTTGATTTTTCAGAAAACTTGTTTTTTGCATCATTCT  83926

Query  121    CCCTGCCGGTATGTGATGAGACCCTGTACTTAYGTCCACCATGC  164
              ||||||||||||||||||||||||||||||||  ||||||||||
Sbjct  83927  CCCTGCCGGTATGTGATGAGACCCTGTACTTACTTCCACCATGC  83970

>NC_000005.10:52989326-53094779Homosapienschromosome5_ITGA2_4core,GRCh38.p13PrimaryAssembly
Length=1671
Score = 291 bits (157), Expect = 3e-80
Identities = 161/164 (98%), Gaps = 0/164 (0%)
Strand=Plus/Plus

Query  1     TTGTCAGCAACCAAAACAAAARGTTAACATTTTCAGTAACGCTGAAAAATAAAAGGGAAA  60
             ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
Sbjct  1022  TTGTCAGCAACCAAAACAAAAGGTTAACATTTTCAGTAACGCTGAAAAATAAAAGGGAAA  1081

Query  61    GTGCATACAACACTGGAATTGTTGTTGATTTTTCAGAAAACTTGTTTTTTGCATCATTCT  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1082  GTGCATACAACACTGGAATTGTTGTTGATTTTTCAGAAAACTTGTTTTTTGCATCATTCT  1141

Query  121   CCCTGCCGGTATGTGATGAGACCCTGTACTTAYGTCCACCATGC  164
             ||||||||||||||||||||||||||||||||  ||||||||||
Sbjct  1142  CCCTGCCGGTATGTGATGAGACCCTGTACTTACTTCCACCATGC  1185

Lambda K H
1.42 0.646 1.21
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 109283972
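The Lambda, K and H lines are the Karlin-Altschul parameters of the scoring system, first ungapped and then gapped. Assuming the standard formulas S' = (lambda * S - ln K) / ln 2 for the bit score and E = K * m * n * exp(-lambda * S), with m * n taken as the "Effective search space used", the gapped values reproduce the YW614_E07 hits: raw score 157 gives about 291 bits and an E-value of about 2.7e-80, which the report shows as 291 bits and 3e-80. A Python sketch of that arithmetic (function names are my own):

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Bit score from a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def expect_value(raw: float, lam: float, k: float, search_space: float) -> float:
    """E = K * m * n * exp(-lambda * S), where m * n is the effective search space."""
    return k * search_space * math.exp(-lam * raw)


# YW614_E07 vs ITGA2: raw score 157, gapped Lambda = 1.28, K = 0.460,
# effective search space 109283972 (all copied from the report above).
bits = bit_score(157, 1.28, 0.460)
evalue = expect_value(157, 1.28, 0.460, 109283972)
print(f"{bits:.1f} bits, E = {evalue:.1e}")
```

The two formulas are equivalent, since E can also be written as m * n * 2^(-S'). The same calculation for YW607_F06 (raw score 189, search space 207467130) gives about 350 bits and 8e-98, as reported.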
Query= YW621_D08
Length=266
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)   Value

NC_000017.11:c44389649-44372181Homosapienschromosome17_ITGA2B_7co...  329     5e-92
NC_000017.11:c44389649-44372181Homosapienschromosome17_ITGA2B_7co...  329     5e-92
>NC_000017.11:c44389649-44372181Homosapienschromosome17_ITGA2B_7core,GRCh38.p13PrimaryAssembly
Length=17469
Score = 329 bits (178), Expect = 5e-92
Identities = 179/180 (99%), Gaps = 0/180 (0%)
Strand=Plus/Minus

Query  1     GCCTTTCTKAGGTCCCAGATCCTTTAAGGCCCATGCCCTCTGCCTCCTCACCAGCTCACG  60
             |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  9315  GCCTTTCTGAGGTCCCAGATCCTTTAAGGCCCATGCCCTCTGCCTCCTCACCAGCTCACG  9256

Query  61    GGTGTCTTGGTCTGAGGTAGGACACAGCTCTTCACAGCAGGATTCAGTGAATCTTGCACC  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  9255  GGTGTCTTGGTCTGAGGTAGGACACAGCTCTTCACAGCAGGATTCAGTGAATCTTGCACC  9196

Query  121   AGTAGCTGGACAGAGGCCTTCACCACTGGCTGAGCTCTGATGGGATAGGGTGATGGGGTA  180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  9195  AGTAGCTGGACAGAGGCCTTCACCACTGGCTGAGCTCTGATGGGATAGGGTGATGGGGTA  9136

>NC_000017.11:c44389649-44372181Homosapienschromosome17_ITGA2B_7core,GRCh38.p13PrimaryAssembly
Length=1773
Score = 329 bits (178), Expect = 5e-92
Identities = 179/180 (99%), Gaps = 0/180 (0%)
Strand=Plus/Minus

Query  1    GCCTTTCTKAGGTCCCAGATCCTTTAAGGCCCATGCCCTCTGCCTCCTCACCAGCTCACG  60
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  632  GCCTTTCTGAGGTCCCAGATCCTTTAAGGCCCATGCCCTCTGCCTCCTCACCAGCTCACG  573

Query  61   GGTGTCTTGGTCTGAGGTAGGACACAGCTCTTCACAGCAGGATTCAGTGAATCTTGCACC  120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  572  GGTGTCTTGGTCTGAGGTAGGACACAGCTCTTCACAGCAGGATTCAGTGAATCTTGCACC  513

Query  121  AGTAGCTGGACAGAGGCCTTCACCACTGGCTGAGCTCTGATGGGATAGGGTGATGGGGTA  180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  512  AGTAGCTGGACAGAGGCCTTCACCACTGGCTGAGCTCTGATGGGATAGGGTGATGGGGTA  453

Lambda K H
1.36 0.630 1.15
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 101888816
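The query reads contain IUPAC ambiguity codes (R, Y, K, M); in the alignments above each one shows up as a blank in the match line and is not counted under Identities (the single K in YW621_D08 is why it reads 179/180). A Python sketch, with hypothetical function names, that recounts a pair of aligned rows the same way and also notes when the subject base is one of the bases the ambiguity code allows. The whole-number percentage is rounded down, which matches every percentage printed in this report (for example 228/229 for YW665_C09 is shown as 99%).

```python
# Bases each IUPAC nucleotide code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compare_rows(query: str, sbjct: str) -> dict[str, int]:
    """Count identities, mismatches and gaps between two aligned rows.

    Only exact letter matches count as identities, so an ambiguity code
    in the query is a mismatch. 'compatible' counts the mismatches where
    the subject base is one of the bases the query's code allows.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    counts = {"identities": 0, "mismatches": 0, "gaps": 0, "compatible": 0}
    for q, s in zip(query.upper(), sbjct.upper()):
        if q == "-" or s == "-":
            counts["gaps"] += 1
        elif q == s:
            counts["identities"] += 1
        else:
            counts["mismatches"] += 1
            if s in IUPAC.get(q, ""):
                counts["compatible"] += 1
    return counts


def percent_identity(identities: int, length: int) -> int:
    """Whole-number percentage, rounded down."""
    return identities * 100 // length
```

For the first row of YW621_D08 (K against G) this gives 59 identities, 1 mismatch and 1 compatible position, since K stands for G or T.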
Query= YW665_C09
Length=307
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)   Value

NC_000007.14:80602207-80679277Homosapienschromosome7_CD36,GRCh38....  416     5e-118
NC_000007.14:80602207-80679277Homosapienschromosome7_CD36_core,GR...  416     5e-118
>NC_000007.14:80602207-80679277Homosapienschromosome7_CD36,GRCh38.p13PrimaryAssembly
Length=77071
Score = 416 bits (225), Expect = 5e-118
Identities = 228/229 (99%), Gaps = 1/229 (0%)
Strand=Plus/Plus

Query  1      TAGGTCAATCTATGCTGTATTTGAATCCGACGTTAATCTGAAAGGAATCCCTGTGTATAG  60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  68768  TAGGTCAATCTATGCTGTATTTGAATCCGACGTTAATCTGAAAGGAATCCCTGTGTATAG  68827

Query  61     ATTTGTTCTTCCATCCAAGGCCTTTGCCTCTCCAGTTGAAAACCCAGACAACTATTGTTT  120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  68828  ATTTGTTCTTCCATCCAAGGCCTTTGCCTCTCCAGTTGAAAACCCAGACAACTATTGTTT  68887

Query  121    CTGCACAGAAAAAATTATCTCAAAAAATTGTACATCATATGGTGTGCTAGACATCAGCAA  180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  68888  CTGCACAGAAAAAATTATCTCAAAAAATTGTACATCATATGGTGTGCTAGACATCAGCAA  68947

Query  181    ATGCAAAGAAGGTGAGTAAATAACCTCAGTAGCACAG-CCATACCATAA  228
              ||||||||||||||||||||||||||||||||||||| |||||||||||
Sbjct  68948  ATGCAAAGAAGGTGAGTAAATAACCTCAGTAGCACAGTCCATACCATAA  68996

>NC_000007.14:80602207-80679277Homosapienschromosome7_CD36_core,GRCh38.p13PrimaryAssembly
Length=1580
Score = 416 bits (225), Expect = 5e-118
Identities = 228/229 (99%), Gaps = 1/229 (0%)
Strand=Plus/Plus

Query  1     TAGGTCAATCTATGCTGTATTTGAATCCGACGTTAATCTGAAAGGAATCCCTGTGTATAG  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  865   TAGGTCAATCTATGCTGTATTTGAATCCGACGTTAATCTGAAAGGAATCCCTGTGTATAG  924

Query  61    ATTTGTTCTTCCATCCAAGGCCTTTGCCTCTCCAGTTGAAAACCCAGACAACTATTGTTT  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  925   ATTTGTTCTTCCATCCAAGGCCTTTGCCTCTCCAGTTGAAAACCCAGACAACTATTGTTT  984

Query  121   CTGCACAGAAAAAATTATCTCAAAAAATTGTACATCATATGGTGTGCTAGACATCAGCAA  180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  985   CTGCACAGAAAAAATTATCTCAAAAAATTGTACATCATATGGTGTGCTAGACATCAGCAA  1044

Query  181   ATGCAAAGAAGGTGAGTAAATAACCTCAGTAGCACAG-CCATACCATAA  228
             ||||||||||||||||||||||||||||||||||||| |||||||||||
Sbjct  1045  ATGCAAAGAAGGTGAGTAAATAACCTCAGTAGCACAGTCCATACCATAA  1093

Lambda K H
1.35 0.626 1.14
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 118733338
Database: refdata.fasta
Posted date: Nov 12, 2019 3:50 PM
Number of letters in database: 411,130
Number of sequences in database: 16
Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5
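makeblastdb parameters in BLAST+ explained
From now on I plan to do all the BLAST work I need in my job with BLAST+.
As with the old BLAST, we start by formatting the database and then move on to the alignment.
Usually we have a FASTA file to format into a database. The old command for this was formatdb; now it is makeblastdb.
The form generally used is: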
makeblastdb -in input_file -dbtype molecule_type -title database_title -parse_seqids -out database_name -logfile File_Name
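-in followed by the input file, i.e. the FASTA sequences you want to format
-dbtype followed by the sequence type: nucl for nucleotide, prot for protein
-title a name for the database, just for looks~~ (it cannot be used as the -db argument in later searches)
-parse_seqids recommended; I haven't yet figured out exactly why. (It makes makeblastdb parse the sequence IDs in the FASTA definition lines, which, for example, lets you retrieve sequences by ID with blastdbcmd later.)
-out followed by the database name; pick a meaningful one, since this is what you pass to -db in later BLAST+ searches
-logfile the log file; if omitted, the log goes to the screen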
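The option list above can be captured in a small helper that assembles the command and also returns the name a later search must pass to -db: the -out value, or the -in file name when -out is omitted (the -out default in the help output), never the -title. A Python sketch; makeblastdb_argv is a hypothetical name and it covers only the six options listed above.

```python
from typing import Optional


def makeblastdb_argv(in_file: str, dbtype: str, out: Optional[str] = None,
                     title: Optional[str] = None, parse_seqids: bool = False,
                     logfile: Optional[str] = None) -> tuple[list[str], str]:
    """Return (argv, db_name) for a makeblastdb run with the options above."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    argv = ["makeblastdb", "-in", in_file, "-dbtype", dbtype]
    if title is not None:
        argv += ["-title", title]        # cosmetic only, not usable as -db
    if parse_seqids:
        argv.append("-parse_seqids")
    if out is not None:
        argv += ["-out", out]
    if logfile is not None:
        argv += ["-logfile", logfile]
    # The name to give -db later: -out if set, otherwise the -in file name.
    db_name = out if out is not None else in_file
    return argv, db_name
```

For the run at the top of this page, makeblastdb_argv("refdata.fasta", "nucl") returns the same command plus "refdata.fasta" as the database name, which is exactly what blastn was given as -db.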
Quite different from the old formatdb, heh.
Running makeblastdb with -help prints the following:
makeblastdb -help
USAGE
  makeblastdb [-h] [-help] [-in input_file] [-dbtype molecule_type]
    [-title database_title] [-parse_seqids] [-hash_index]
    [-mask_data mask_data_files] [-out database_name]
    [-max_file_sz number_of_bytes] [-taxid TaxID] [-taxid_map TaxIDMapFile]
    [-logfile File_Name] [-version]

DESCRIPTION
   Application to create BLAST databases, version 2.2.23+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION; ignore other arguments
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
 -version
   Print version number; ignore other arguments

 *** Input options
 -in <File_In>
   Input file/database name; the data type is automatically detected, it may
   be any of the following:
   FASTA file(s) and/or
   BLAST database(s)
   Default = `-'
 -dbtype <String, `nucl', `prot'>
   Molecule type of input
   Default = `prot'

 *** Configuration options
 -title <String>
   Title for BLAST database
   Default = input file name provided to -in argument
 -parse_seqids
   Parse Seq-ids in FASTA input
 -hash_index
   Create index of sequence hash values.

 *** Sequence masking options
 -mask_data <String>
   Comma-separated list of input files containing masking data as produced by
   NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)

 *** Output options
 -out <String>
   Name of BLAST database to be created
   Default = input file name provided to -in argumentRequired if multiple
   file(s)/database(s) are provided as input
 -max_file_sz <String>
   Maximum file size for BLAST database files
   Default = `1GB'

 *** Taxonomy options
 -taxid <Integer, >=0>
   Taxonomy ID to assign to all sequences
    * Incompatible with: taxid_map
 -taxid_map <File_In>
   Text file mapping sequence IDs to taxonomy IDs.
   Format:<SequenceId> <TaxonomyId><newline>
    * Incompatible with: taxid
 -logfile <File_Out>
   File to which the program log should be redirected
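The -taxid_map option in the help output expects a plain text file with one "<SequenceId> <TaxonomyId>" pair per line. A Python sketch that writes such a file from a dictionary; the function name and the example IDs are hypothetical, and the non-negative check borrows the ">=0" constraint the help text states for -taxid.

```python
def taxid_map_text(mapping: dict[str, int]) -> str:
    """Render {sequence ID: taxonomy ID} as '<SequenceId> <TaxonomyId>' lines."""
    lines = []
    for seq_id, taxid in mapping.items():
        # The ID must be one word, since a space separates the two fields.
        if not seq_id or any(ch.isspace() for ch in seq_id):
            raise ValueError(f"sequence ID must be one non-empty word: {seq_id!r}")
        if taxid < 0:
            raise ValueError(f"taxonomy ID must be >= 0: {taxid}")
        lines.append(f"{seq_id} {taxid}\n")
    return "".join(lines)
```

Writing the result to a file (say taxid_map.txt) and passing it as -taxid_map taxid_map.txt assigns per-sequence taxonomy IDs; 9606 is the NCBI Taxonomy ID for Homo sapiens. Per the help text, -taxid_map cannot be combined with -taxid.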