limma包
Posted djx571
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了limma包相关的知识,希望对你有一定的参考价值。
1)Introduction
DEXSeq是一种在多个比较RNA-seq实验中,检验差异外显子使用情况的方法。 通过差异外显子使用(DEU),我们指的是由实验条件引起的外显子相对使用的变化。 外显子的相对使用定义为:
number of transcripts from the gene that contain this exon / number of all transcripts from the gene
大致思想:. For each exon (or part of an exon) and each sample, we count how many reads map to this exon and how many reads map to any of the other exons of the same gene. We consider the ratio of these two counts, and how it changes across conditions, to infer changes in the relative exon usage
2)安装
if("DEXSeq" %in% rownames(installed.packages()) == FALSE) {source("http://bioconductor.org/biocLite.R");biocLite("DEXSeq")} suppressMessages(library(DEXSeq)) ls(‘package:DEXSeq‘) pythonScriptsDir = system.file( "python_scripts", package="DEXSeq" ) list.files(pythonScriptsDir) ## [1] "dexseq_count.py" "dexseq_prepare_annotation.py" #查看是否含有这两个脚本 python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.72.gtf Dmel.BDGP5.25.62.DEXSeq.chr.gff #GTF转化为GFF with collapsed exon counting bins. python dexseq_count.py Dmel.BDGP5.25.62.DEXSeq.chr.gff untreated1.sam untreated1fb.txt #count
3) 用自带实验数据集(数据预处理)
suppressMessages(library(pasilla))
inDir = system.file("extdata", package="pasilla")
countFiles = list.files(inDir, pattern="fb.txt$", full.names=TRUE) #countfile(如果不是自带数据集,可以由dexseq_count.py脚本生成)
basename(countFiles)
flattenedFile = list.files(inDir, pattern="gff$", full.names=TRUE)
basename(flattenedFile) #gff文件(如果不是自带数据集,可以由dexseq_prepare_annotation.py脚本生成)
########构造数据框sampleTable,包含sample名字,实验,文库类型等信息#######################
sampleTable = data.frame(
row.names = c( "treated1", "treated2", "treated3",
"untreated1", "untreated2", "untreated3", "untreated4" ),
condition = c("knockdown", "knockdown", "knockdown",
"control", "control", "control", "control" ),
libType = c( "single-end", "paired-end", "paired-end",
"single-end", "single-end", "paired-end", "paired-end" ) )
sampleTable
##############构建 DEXSeqDataSet object#############################
dxd = DEXSeqDataSetFromHTSeq(
countFiles,
sampleData=sampleTable,
design= ~ sample + exon + condition:exon,
flattenedfile=flattenedFile ) #四个参数
4)Standard analysis work-?ow
genesForSubset = read.table(file.path(inDir, "geneIDsinsubset.txt"),stringsAsFactors=FALSE)[[1]] #基因子集ID dxd = dxd[geneIDs( dxd ) %in% genesForSubset,] #取子集,减少运行量
以上是关于limma包的主要内容,如果未能解决你的问题,请参考以下文章