sRNAnalyzer

Posted Gene2You


sRNAnalyzer: A pipeline for small RNA sequencing data analysis

Getting Started with sRNAnalyzer

Introduction

sRNAnalyzer is a flexible, modular pipeline for the analysis of small RNA sequencing data. Its features include:

  1. Additional adapter trimming process to generate cleaner data

  2. Comprehensive microRNA profiling strategies to better handle isomiR issues

  3. Summarization for each nucleotide to detect potential SNPs on miRNAs

  4. Multiple assignment to simulate miRNA array and qRT-PCR platforms

  5. A local probabilistic model to map reads to the most-likely entry IDs

  6. Comprehensive ribosomal RNA filtering to get more accurate mapping results

  7. More specific species assignment on exogenous RNAs in circulation

  8. Taxonomy annotation/summarization at major ranks for exogenous species

Downloads

Download the sRNAnalyzer pipeline and alignment databases as .tar.gz files.

  • sRNAnalyzer Pipeline: sRNAnalyzer

  • small RNA Databases: sRNA_DBs

  • Human and Exogenous Databases: MainDBs

  • NCBI Non-Human Databases: NCBI_NonHuman

  • E. coli Samples for Pipeline Evaluation: E_coli_evaluation_samples

Web Application

To demonstrate how to use sRNAnalyzer, we built a web interface on a local server. It can be accessed from the Web Application page, where you can upload data and run the sRNAnalyzer pipeline online. For testing purposes, the username ‘isb’ and the password ‘password’ can be used. For academic use, free accounts are provided on request by email; please contact Dr. Kai Wang.

Getting Started

To get started using sRNAnalyzer, download sRNAnalyzer as a .zip or a .tar.gz file, along with the sRNA databases required. Then head to the Getting Started page for instructions on setting up and running sRNAnalyzer.

Documentation

For more information on options and configuration, head to the Documentation. For documentation on the format of the output and summary files, go to the Output file documentation.

1. Download and Install Dependencies

  • Make sure you have Python 2.6 or later and Perl 5 or later installed.

  • Download and install bowtie. Make sure the bowtie command is in your system path. Note: bowtie 2 is not supported in sRNAnalyzer.

  • Download and install the fastx_toolkit, following the instructions on the website. Use version 0.0.14. Make sure the fastx_collapser command is in your system path.

  • Download and install cutadapt. This requires Python 2.6 or later and a C compiler. The easiest way to install cutadapt is to use pip, following the instructions on the cutadapt website. Make sure the cutadapt command is in your system path.
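The PATH requirements listed above can be checked before running anything. The following is a minimal sketch in Python 3 using only the standard library. The function name missing_tools and the REQUIRED_TOOLS list are illustrative and not part of sRNAnalyzer, and the check only confirms that a command is found on PATH, not which version is installed.

```python
import shutil

# Commands that the steps above say must be installed or on the system PATH.
REQUIRED_TOOLS = ["perl", "bowtie", "fastx_collapser", "cutadapt"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result means every listed command resolved; it does not verify that bowtie is version 1 or that fastx_toolkit is 0.0.14.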



2. Download and Setup sRNAnalyzer

  • Download sRNAnalyzer and unzip the downloaded archive. You may want to add the sRNAnalyzer directory to your system PATH so you can use the sRNAnalyzer commands directly.

Next, download the databases for alignment. There are three options: a small RNA database; a database with human DNA and RNA, as well as some bacterial sequences; and the NCBI non-human database. The latter two databases are quite large (> 70GB uncompressed), so it is recommended to begin with the sRNA database. The installation procedure is the same for all three databases. First, download one of the databases and unzip the archive. Then open the DB_config file and change the line

base: Insert the path to this folder here

by inserting the full path to the folder. For example,

base: /databases/bowtie/indexes/sRNA_DBs
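If you install several databases, the same edit can be scripted. The sketch below is a hypothetical helper, not part of sRNAnalyzer; it assumes only what is stated above, namely that DB_config contains a line beginning with "base:", and it leaves every other line untouched.

```python
def set_base_path(config_text, folder):
    """Return config_text with its first 'base:' line pointing at folder."""
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith("base:"):
            # Keep any leading indentation the line already had.
            indent = line[:len(line) - len(stripped)]
            lines[i] = indent + "base: " + folder
            break
    else:
        raise ValueError("no 'base:' line found in the configuration text")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "base: Insert the path to this folder here\n"
    print(set_base_path(original, "/databases/bowtie/indexes/sRNA_DBs"), end="")
```

To apply it to a real file, read DB_config into a string, pass it through set_base_path, and write the result back; keeping a copy of the original file first is a sensible precaution.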

Looking at the DB_config file, you should see a list of database names with paths. These are the databases that you can now use in your pipeline. It is also possible to add new databases to the pipeline by downloading or building bowtie indexes and specifying their location in the database configuration file. For more information, see the Configuration File Documentation. Now you're ready to begin using the pipeline.


3. Using the Pipeline

To use the pipeline, create a pipeline configuration file, which specifies preprocessing settings, such as adapter sequences, and alignment settings, such as database order and maximum mismatch allowances. Go to the Config Docs to learn how to create a configuration file with the settings required for your project.

A typical pipeline configuration file is shown below:

preprocess:
    kit: NEB
    gzip: true
    stop-oligo: false

alignment:
    type: single
    human_miRNA: 2
    human_miRNA_sub: 2
    human_piRNA: 2
    human_snoRNA: 2
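To make the structure of this file concrete, the sketch below reads the two-level layout and lists the alignment databases in file order. It is a hypothetical helper in Python 3 (3.7 or later, so that dictionaries keep insertion order), not the parser sRNAnalyzer itself uses, and it is not a general YAML parser. It assumes that each integer-valued entry under alignment is a database name paired with its maximum mismatch allowance, which is an interpretation of the description above rather than something the example states outright.

```python
EXAMPLE = """\
preprocess:
    kit: NEB
    gzip: true
    stop-oligo: false

alignment:
    type: single
    human_miRNA: 2
    human_miRNA_sub: 2
    human_piRNA: 2
    human_snoRNA: 2
"""


def parse_sections(text):
    """Parse 'section:' headers followed by 'key: value' lines.

    Indentation is ignored, so only this simple two-level layout works.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if ":" not in line:
            raise ValueError("expected 'key: value' but got: " + line)
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "":
            # A key with no value starts a new section.
            current = sections.setdefault(key, {})
        elif current is None:
            raise ValueError("entry before any section header: " + line)
        else:
            current[key] = value
    return sections


def database_order(sections):
    """List (database, max_mismatches) pairs in the order they appear.

    Assumption: integer-valued entries under 'alignment' are databases;
    non-numeric entries such as 'type' are skipped.
    """
    alignment = sections.get("alignment", {})
    return [(name, int(value)) for name, value in alignment.items()
            if value.isdigit()]


if __name__ == "__main__":
    for name, mismatches in database_order(parse_sections(EXAMPLE)):
        print(name, mismatches)
```

For real projects, the Config Docs remain the reference for which keys and values are accepted.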

3.1 Preprocessing

Using a terminal, change the directory so that the fastq or fastq.gz files you wish to process are in the current working directory. To run preprocessing, use the command

/Downloads/sRNAnalyzer/preprocess.pl --config pipeline_config.yaml

where pipeline_config.yaml is your pipeline configuration file, and /Downloads/ is replaced with wherever your sRNAnalyzer folder is located. If you have added the sRNAnalyzer directory to the system PATH, simply use

preprocess.pl --config pipeline_config.yaml

Preprocessing generates sample_Processed.fa files in which adapters have been trimmed, low-quality reads have been filtered out, and identical reads have been collapsed. Additional report files are also generated with information about adapter trimming and read quality.
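Because preprocess.pl works on the files in the current working directory, it can help to confirm that the directory actually contains input files before launching it. The sketch below is a hypothetical Python 3 wrapper, not part of sRNAnalyzer. It assumes inputs are named with the .fastq or .fastq.gz extension and that preprocess.pl is on the system PATH; the only option it passes is --config, as shown in the command above.

```python
import glob
import os
import subprocess


def find_inputs(directory="."):
    """Return the sorted names of .fastq and .fastq.gz files in directory."""
    paths = []
    for pattern in ("*.fastq", "*.fastq.gz"):
        paths.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(os.path.basename(path) for path in paths)


def run_preprocess(data_dir, config, script="preprocess.pl"):
    """Run the preprocessing script with data_dir as the working directory.

    `config` should be an absolute path, or a path relative to data_dir,
    because the script is started from inside data_dir.
    """
    if not find_inputs(data_dir):
        raise FileNotFoundError("no .fastq or .fastq.gz files in " + data_dir)
    subprocess.run([script, "--config", config], cwd=data_dir, check=True)
```

Calling run_preprocess with your data directory and pipeline configuration file is then equivalent to changing into that directory and running the command by hand; check=True makes a failed run raise an error instead of passing silently.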

3.2 Alignment

To perform the alignment, ensure that your database and pipeline configuration files are properly set up. After downloading the initial human small RNA databases, the following databases are available for alignment and can be specified in the pipeline configuration file:

human_miRNA
human_miRNA_sub
human_piRNA
human_snoRNA
virus_miRNA
plant_miRNA
all_miRNA
all_miRNA_sub
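A misspelled database name in the pipeline configuration is easy to miss, so a quick comparison against the list above can be useful before aligning. The following is a minimal Python 3 sketch; the function name unknown_databases is illustrative, and the list is copied from this section, so it covers only the small RNA databases and would need extending if other databases are installed.

```python
# Database names available after installing the small RNA databases,
# as listed in this section.
SRNA_DATABASES = [
    "human_miRNA",
    "human_miRNA_sub",
    "human_piRNA",
    "human_snoRNA",
    "virus_miRNA",
    "plant_miRNA",
    "all_miRNA",
    "all_miRNA_sub",
]


def unknown_databases(requested, available=SRNA_DATABASES):
    """Return the requested names that are not in the available list."""
    return [name for name in requested if name not in available]


if __name__ == "__main__":
    # "human_mirna" is a deliberately misspelled, hypothetical entry.
    print(unknown_databases(["human_miRNA", "human_mirna"]))
```

The comparison is case-sensitive on purpose, since the names in the configuration files are written in mixed case.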

Then, making sure that you are in the directory containing the _Processed.fa files you wish to align, run the command

/Downloads/sRNAnalyzer/align.pl /home/data pipeline_config.yaml DB_config.conf

or

align.pl /home/data pipeline_config.yaml DB_config.conf

if you have added sRNAnalyzer to the system PATH.

In the command, pipeline_config.yaml is the pipeline configuration file and DB_config.conf is the database configuration file.

The align command will output several files, including feature files, profile files, a read distribution file, and an unmatched sequences file.

3.3 Summarization

The next step in the pipeline is the summarization of the alignment results in order to prepare for statistical analysis of the data. An example summarization command is:

summarize.pl DB_config.conf --project my_project

This command sums the feature and profile results from individual samples into result files covering all samples. my_project is the name of the project, so all of the result files will start with the prefix my_project_. The general form of the summarize command is:

summarize.pl <db-config-file> <sample-order-file> --project <project-name>

where the db-config-file is required, and the sample-order-file and project-name are both optional. The db-config-file is the database configuration file discussed above, and the sample-order-file specifies the order of the samples in the result files. If the sample order file is not provided, the order is alphabetical. The summarize.pl command has two additional options, --miRNA and --exogenous. Use the --miRNA flag if you would like to summarize miRNA separately and get information about possible miRNA SNPs. Use the --exogenous flag if you would like to summarize exogenous reads, including summarizing by taxonomy information. Note that the --exogenous option is only available if the MainDBs or NCBI_NonHuman databases are installed.
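The general form and its optional parts can be expressed as a small command builder. This is a hypothetical Python 3 sketch, not part of sRNAnalyzer. The argument order follows the general form given above; placing --miRNA and --exogenous at the end of the command is an assumption, since the text does not say where those flags should go.

```python
def build_summarize_cmd(db_config, sample_order=None, project=None,
                        mirna=False, exogenous=False,
                        script="summarize.pl"):
    """Build the summarize.pl argument list from the options described above."""
    cmd = [script, db_config]          # the db-config-file is required
    if sample_order:
        cmd.append(sample_order)       # optional sample-order-file
    if project:
        cmd += ["--project", project]  # optional project name
    if mirna:
        cmd.append("--miRNA")          # assumed position: end of command
    if exogenous:
        cmd.append("--exogenous")      # assumed position: end of command
    return cmd


if __name__ == "__main__":
    print(" ".join(build_summarize_cmd("DB_config.conf", project="my_project")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems when a project name or path contains spaces.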

