How do I download local BLAST for Windows from the NCBI FTP site?


Answer A: This document describes the "BLAST" databases available on the NCBI
FTP site under the /blast/db directory. The direct URL is:
ftp://ftp.ncbi.nih.gov/blast/db (download location for the local BLAST databases)
1. General Introduction
NCBI BLAST home pages (http://www.ncbi.nih.gov/BLAST/) use a standard
set of BLAST databases for Nucleotide, Protein, and Translated BLAST
searches. These databases are made available in the /blast/db directory as
pre-formatted, compressed archives (ftp://ftp.ncbi.nih.gov/blast/db/).
(Note: these databases have already been processed with makeblastdb, so they
can be used directly after download.)
The FASTA databases reside under the /blast/db/FASTA directory.
The pre-formatted databases offer the following advantages:
* The pre-formatted databases are smaller in size and therefore are
faster to download;
* Sequences in FASTA format can be generated from the pre-formatted
databases by the fastacmd utility (blastdbcmd in BLAST+);
* A convenient script (update_blastdb.pl) is available to download
and update the pre-formatted databases from the NCBI ftp site;
* Pre-formatting removes the need to run formatdb (makeblastdb in BLAST+);
* Taxonomy ids are available for each database entry.
Pre-formatted databases must be downloaded using the update_blastdb.pl
script or via FTP in binary mode (for example with an FTP download tool).
Documentation for the update_blastdb.pl script can be obtained by running
the script without any arguments (perl is required).
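As a minimal sketch of the scripted route, the wrapper below checks that update_blastdb.pl is on the PATH before calling it. The function name fetch_preformatted_db is hypothetical, and the --decompress option is assumed to be available, as it is in the update_blastdb.pl shipped with BLAST+.

```shell
# Hypothetical helper: download one pre-formatted database into the current directory.
fetch_preformatted_db() {
    local db="$1"
    if ! command -v update_blastdb.pl >/dev/null 2>&1; then
        echo "update_blastdb.pl not found on PATH" >&2
        return 1
    fi
    # --decompress unpacks each archive after download
    # (assumption: the BLAST+ version of the script is installed)
    update_blastdb.pl --decompress "$db"
}

# Usage (needs network access): fetch_preformatted_db est
```

On Windows the same script can be run with a Perl interpreter, or the archives can be fetched with any FTP client set to binary mode.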
The downloaded compressed files must be inflated with gzip or another
decompression tool. The BLAST database files can then be extracted from the
resulting tar file using the tar program on Unix/Linux, or WinZip and StuffIt
Expander on Windows and Macintosh platforms, respectively.
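The unpacking step can be sketched as follows for a Unix-like shell (on Windows this would need something like Cygwin or WSL). The function name unpack_blast_archives and the ./blastdb directory are assumptions made for illustration.

```shell
# Hypothetical helper: unpack every *.tar.gz archive found in a directory, in place.
unpack_blast_archives() {
    local dir="$1" archive
    for archive in "$dir"/*.tar.gz; do
        # Skip the unexpanded pattern when the directory holds no archives
        [ -e "$archive" ] || continue
        # -x extract, -z gunzip on the fly, -f archive file, -C target directory
        tar -xzf "$archive" -C "$dir" || return 1
    done
}

# Usage: unpack_blast_archives ./blastdb
```

Combining gunzip and tar in one call avoids keeping an intermediate .tar file on disk, which matters for the larger databases.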
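Large databases are formatted in multiple 1-gigabyte volumes, which
are named using the database.##.tar.gz convention. All relevant volumes
are required. An alias file is provided so that the database can be called
using the alias name without the extension (.nal or .pal). For example,
to call the est database, simply use the "-d est" option on the command line
(without the quotes; the BLAST+ equivalent is "-db est").
(Note: large databases are usually split into several archives; nr, for
example, came as 11 archives when this answer was written. Every archive must
be downloaded and unpacked. Unpacking produces the database files together
with an nr.pal alias file, so nr can be searched by passing -d nr.)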
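Because every volume is required, it can help to check a download directory before searching. The sketch below is only an illustration: the function names are hypothetical, the number of volumes has to be supplied by the user, and the numbering is assumed to start at 00 (as in nr.00.tar.gz).

```shell
# Print the archive names expected for a database split into COUNT volumes.
expected_volumes() {
    local db="$1" count="$2" i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s.%02d.tar.gz\n' "$db" "$i"
        i=$((i + 1))
    done
}

# Print the expected archives that are absent from a directory.
missing_volumes() {
    local dir="$1" db="$2" count="$3" name
    expected_volumes "$db" "$count" | while read -r name; do
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Usage: missing_volumes ./blastdb nr 11
```

An empty result means all archives are present; any names printed still need to be downloaded.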
Certain databases are subsets of a larger parental database. For those
databases, alias and mask files, rather than actual databases, are provided.
The mask file needs the parent database to function properly. The parent
databases should be generated on the same day as the mask file. For
example, to use the pre-formatted swissprot database, swissprot.tar.gz, one
will need to get the nr.tar.gz with the same date stamp.
Additional BLAST databases that are not provided in pre-formatted
form are available in the FASTA subdirectory. For genomic BLAST
databases, please check the genomes ftp directory at:
ftp://ftp.ncbi.nih.gov/genomes/

2. Contents of the /blast/db/ directory
The pre-formatted BLAST databases are archived in this directory. The
names of these databases and their contents are listed below.
+----------------------+-----------------------------------------------+
|File Name             | Content Description                           |
+----------------------+-----------------------------------------------+
|/FASTA                | Subdirectory for FASTA formatted sequences    |
|README                | README for this subdirectory (this file)      |
|env_nr.*tar.gz        | Environmental protein sequences               |
|env_nt.*tar.gz        | Environmental nucleotide sequences            |
|est.*tar.gz           | Volumes of the formatted est database from    |
|                      | the EST division of GenBank, EMBL, and DDBJ   |
+----------------------+-----------------------------------------------+

sh: Getting COGs from the NCBI FTP site

# First we will download the necessary files
wget ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data/cog2003-2014.csv
wget ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data/prot2003-2014.fa.gz

# prot2003-2014.ids is used below but never created in the original snippet;
# it is assumed here to hold the FASTA header lines of prot2003-2014.fa.gz
zcat prot2003-2014.fa.gz | grep '^>' > prot2003-2014.ids

# Then we will extract Patrick's COG sequences using filterbyname.sh from the BBMAP package
for COG in COG0085 COG0468 COG0187 COG0226 COG0248 COG0855 COG2326 COG1702; do
    # Header lines of the proteins assigned to this COG, without the leading '>'
    LC_ALL=C grep -F -w -f <(grep "${COG}" cog2003-2014.csv | cut -f1 -d ',') prot2003-2014.ids \
        | tr -d '>' > "${COG}.ids"
    ./bbmap/filterbyname.sh in=prot2003-2014.fa.gz out="${COG}.fasta.gz" include=true names="${COG}.ids"
done

# Create LAMBDA indexes (only for the FASTA files, not the .ids files)
for i in COG*.fasta.gz; do ~/opt/lambda-v0.9.3/bin/lambda_indexer -p blastx -d "${i}"; done

# Run for each OSD sample on SGE
qsub OSD1_2014-06-21_0m_NPL022_lambda_cogs.sh

# Concatenate all results, prefixing each hit with the COG and the sample name
for N in COG0085 COG0468 COG0187 COG0226 COG0248 COG0855 COG2326 COG1702; do
    find results/ -name "*${N}*" | while read -r LINE; do
        NAM=$(basename "${LINE}" "_${N}.blastx.m8.gz")
        zcat "${LINE}" | awk -v L="${N}\t${NAM}" '{print L"\t"$0}'
    done
done | gzip -c > COG_patrick_results.txt.gz

# Get best hit: sort by query (column 3), then by e-value (column 13),
# and keep the first line seen for each query
sort -S50% --parallel=8 -k3,3V -k13,13g <(zcat COG_patrick_results.txt.gz) | awk '!a[$3]++' > COG_patrick_results.bh.txt
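
The best-hit step relies on a common idiom: sort so that the best line for each key comes first, then let awk print only the first line it sees per key. A toy sketch with a simplified three-column layout (query, subject, e-value) rather than the 14-column file above; the function name best_hits and the sample rows are made up for illustration, and sort -g is a GNU/BSD extension.

```shell
# Keep the lowest-e-value line per query from tab-separated (query, subject, evalue) rows.
best_hits() {
    local tab
    tab=$(printf '\t')
    # Sort by query, then by e-value as a general number (-g understands 1e-30);
    # awk then prints only the first line seen for each query.
    LC_ALL=C sort -t "$tab" -k1,1 -k3,3g | awk -F '\t' '!seen[$1]++'
}

# Toy input: readA has two hits, readB has one
printf 'readA\tCOG0085_x\t1e-10\nreadA\tCOG0085_y\t1e-30\nreadB\tCOG0085_x\t1e-6\n' | best_hits
```

For readA only the 1e-30 hit survives, and readB keeps its single hit.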
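
The job script submitted with qsub above (OSD1_2014-06-21_0m_NPL022_lambda_cogs.sh, judging by its NAM variable):

#!/bin/bash
set -x
set -o errexit
set -o pipefail
set -o errtrace
set -o nounset

# Run LAMBDA in blastx mode against each COG database and merge the hits per sample

declare -r NAM="OSD1_2014-06-21_0m_NPL022"
# NSLOTS is set by SGE; with nounset the script aborts here if it is missing
declare -r NSLOTS="${NSLOTS}"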
declare -r PATHB="/home/mpi45770/opt/lambda-v0.9.3/bin/lambda"
declare INPUT="/home/mpi45770/COG_patrick/input"
declare RESULTS="/home/mpi45770/COG_patrick/results"

declare ME="${INPUT}"/"${NAM}"_ME_shotgun_workable_merged.fastq.gz
declare SE="${INPUT}"/"${NAM}"_SE_shotgun_workable_merged.fastq.gz

declare -a COGS=(COG0085 COG0468 COG0187 COG0226 COG0248 COG0855 COG2326 COG1702)

for COG in "${COGS[@]}"; do
    declare DB=/home/mpi45770/biodb/lambda/"${COG}".fasta.gz

    MERES="${RESULTS}"/"${NAM}"_"${COG}"_ME.blastx.m8
    SERES="${RESULTS}"/"${NAM}"_"${COG}"_SE.blastx.m8

    "${PATHB}" -e 1e-5 -so 5 -p blastx -nm 10  -q "${ME}" -d "${DB}" -o "${MERES}" -t "${NSLOTS}"
    "${PATHB}" -e 1e-5 -so 5 -p blastx -nm 10  -q "${SE}" -d "${DB}" -o "${SERES}" -t "${NSLOTS}"

    cat "${MERES}" "${SERES}" | gzip -1c > "${RESULTS}"/"${NAM}"_"${COG}".blastx.m8.gz
    rm "${MERES}" "${SERES}"
done
