MEME Suite: A Motif Analysis Toolbox (Part 2)


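The Motif Discovery section of the MEME Suite also includes MEME-ChIP, which runs a series of motif analyses on DNA sequences from ChIP-seq or CLIP-seq data. It combines the following tools:
1. MEME and STREME, which predict motifs de novo;
2. CentriMo, which tests whether motifs are enriched in the central region of the input sequences. This suits ChIP-seq data, where the bound motif is expected to be enriched near the center of each peak sequence;
3. Tomtom, which compares the discovered motifs with known motifs for similarity and is used to group the significant motifs;
4. SpaMo, which also works from motif enrichment but, unlike CentriMo, looks for secondary motifs that occur at a preferred spacing from a primary motif;
5. FIMO, which reports the positions of motif occurrences in the input sequences.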

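MEME-ChIP can be run with its default parameters, or the parameters can be changed as needed. The workflow is shown schematically below:

[Figure: schematic of the MEME-ChIP workflow]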
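As a rough illustration of overriding the defaults, the R sketch below assembles a command line for the meme-chip program. The helper build_meme_chip_args, the file names peaks.fa and motifs.meme, and the output directory are hypothetical. The options -oc, -db and -meme-nmotifs belong to the meme-chip command-line tool, but option names can differ between releases, so check the usage message of your installed version.

```r
# Minimal sketch: build the argument vector for a meme-chip run.
# All file names here are placeholders, not files from the article.
build_meme_chip_args <- function(fasta, out_dir, motif_db = NULL, n_meme_motifs = 3) {
  # -oc: output directory (overwritten if it exists)
  # -meme-nmotifs: number of motifs MEME should report
  args <- c("-oc", out_dir, "-meme-nmotifs", as.character(n_meme_motifs))
  # -db: database of known motifs in MEME format (optional here)
  if (!is.null(motif_db)) {
    args <- c(args, "-db", motif_db)
  }
  # The primary sequence file goes last
  c(args, fasta)
}

args <- build_meme_chip_args("peaks.fa", "memechip_out", motif_db = "motifs.meme")

# Only run when the program and the input file are actually available
if (nzchar(Sys.which("meme-chip")) && file.exists("peaks.fa")) {
  system2("meme-chip", args)
}
```

Keeping the arguments in a vector rather than a pasted string avoids shell-quoting problems when paths contain spaces.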

We use the MEME-ChIP example from the Sample Output section of the website to illustrate.

R: importing HOMER ChIP-Seq motif data

# Daniel Cook 2014
# Danielecook.com
#
# Use this function to import ChIP-Seq data generated by HOMER. The data is produced by HOMER's findMotifsGenome.pl
# command with the '-find <motif file>' argument. The generated output
# looks like this:
#
# 1. Peak/Region ID
# 2. Chromosome
# 3. Start
# 4. End
# 5. Strand of Peaks
# 6-18: annotation information
# 19. CpG%
# 20. GC%
# 21. Motif Instances <distance from center of region>(<sequence>,<strand>,<conservation>)
#
# Extracting the information in column 21 (for example, to plot motif frequency) can be tricky. This function aims to help
# by converting the data from wide to long format. Annotation columns are currently dropped, but you can modify the function to
# keep them.

library(stringr)
library(splitstackshape)
library(reshape2)
library(dplyr)
suppressMessages(library(data.table))

import_peak_file <- function(filename, peak_type) {
  df <- fread(filename)
  setnames(df, names(df)[1], "Peak")
  # Shorten every header from column 9 onward to its first alphanumeric word.
  # Several headers share the same first word, so duplicate names result.
  late_cols <- 9:length(names(df))
  setnames(df, names(df)[late_cols], str_extract(names(df)[late_cols], "[A-Za-z0-9]+"))

  # Remove extra columns if present. The repeated lines are intentional:
  # each assignment drops one of the duplicated "Nearest" / "Gene" columns.
  df$Annotation <- NULL
  df$"Focus Ratio/Region Size" <- NULL
  df$Detailed <- NULL
  df$Distance <- NULL
  df$Nearest <- NULL
  df$Entrez <- NULL
  df$Nearest <- NULL
  df$Nearest <- NULL
  df$Nearest <- NULL
  df$Gene <- NULL
  df$Gene <- NULL
  df$Gene <- NULL
  df$Gene <- NULL

  # Wide to long: one row per peak and motif column
  df <- melt(df, id.vars = 1:8)
  # Separate motif instances with "|". fixed() is required because ")," on
  # its own is not a valid regular expression.
  df$value <- str_replace_all(df$value, fixed("),"), "|")
  df <- filter(df, value != "")
  df <- concat.split(as.data.frame(df), split.col = c("value"), sep = "|")
  df$value <- NULL

  # dplyr syntax is new_name = old_name
  df <- rename(df, motif_name = variable)
  # Long again: one row per motif instance
  df <- melt(df, id.vars = 1:9)
  df$variable <- NULL
  df <- filter(df, value != "")

  # Each instance looks like <distance>(<sequence>,<strand>,<conservation>
  df$MOTIF <- str_extract(df$value, "[ATCG]+")
  df$POS <- as.integer(str_extract(df$value, "-?[0-9]+"))
  df$STRAND <- str_extract(df$value, ",[+-],")
  df$STRAND <- str_replace_all(df$STRAND, ",", "")
  df$value <- NULL
  df <- filter(df, !is.na(POS))
  df$peak_type <- peak_type
  df
}
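
To make the motif-instance format concrete, here is a small base-R sketch that parses a single cell of the form <distance>(<sequence>,<strand>,<conservation>) into one row per instance. It is an illustration only, not part of the original function: the helper name parse_motif_instances and the example string are made up, and the code assumes that instances are separated by "),".

```r
# Minimal sketch: parse one "Motif Instances" cell with base R only.
parse_motif_instances <- function(x) {
  # Split on the literal "),", which ends every instance but the last
  parts <- strsplit(x, "),", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]
  # Distance from the region center: everything before the first "("
  pos <- as.integer(sub("\\(.*$", "", parts))
  # Keep only the text inside the parentheses
  inner <- sub("^[^(]*\\(", "", parts)
  inner <- sub("\\)$", "", inner)
  fields <- strsplit(inner, ",", fixed = TRUE)
  data.frame(
    POS = pos,
    MOTIF = vapply(fields, `[`, character(1), 1),
    STRAND = vapply(fields, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
}

# Made-up example cell containing two instances
example_cell <- "-25(ACGTGC,+,0.00),30(TTGACA,-,0.00),"
hits <- parse_motif_instances(example_cell)
print(hits)
```

Splitting on a fixed string rather than a regular expression sidesteps the need to escape the parenthesis.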
