python 计算图案在区域序列中出现的次数

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了python 计算图案在区域序列中出现的次数相关的知识,希望对你有一定的参考价值。

#Our objective is to extract the average DNA methylation (beta value) in all regulatory elements defined in the BLUEPRINT regulatory build (modified from~\cite{Zerbino2015}) across all BLUEPRINT samples. After downloading these data from DeepBlue we use the package gplots to create a heatmap showing the most variable regions (rows) across samples (columns). Moreover, we cluster samples by their pairwise Spearman correlation coefficients (Figure~\ref{Fig:1}). 
#In Listing 1 we demonstrate how DeepBlueR can be used to quickly and efficiently aggregate a large data set on the DeepBlue server in preparation for downstream analysis in R. We omit the R code for generating the heatmap here. It can be found in the supplemental material. To retrieve the required data, we first select all bisulfite sequencing experiments from the BLUEPRINT project (lines $5$-$9$). Our selection comprises $206$ data files and $47$ distinct cell types after filtering for the correct file type (lines $9$-$11$). For each file, we select the column that contains the beta value (line $12$). Next, we select the regions of the regulatory build (lines $14$-$16$), which we use in the next command (lines $18$-$21$) to request a data matrix in which the average beta values are computed for each regulatory element and experiment. Finally, we download the resulting data matrix in lines $23$-$24$.\\

import xmlrpclib
import time

url = "http://deepblue.mpi-inf.mpg.de/xmlrpc"
user_key = "anonymous_key"

server = xmlrpclib.Server(url, allow_none=True)

(status, query_id) = server.select_experiments ( "DG-75_c01.ERX297417.H3K27ac.bwa.GRCh38.20150527.bed", None, None, None, user_key )
fmt = "CHROMOSOME,START,END,@LENGTH,@COUNT.MOTIF(C),@COUNT.MOTIF(G),@COUNT.MOTIF(GC),@SEQUENCE"
(status, request_id) = server.get_regions(query_id, fmt, user_key)

(status, info) = server.info(request_id, user_key)
request_status = info[0]["state"]
while request_status != "done" and request_status != "failed":
  time.sleep(1)
  (status, info) = server.info(request_id, user_key)
  request_status = info[0]["state"]

(status, regions) = server.get_request_data(request_id, user_key)

以上是关于python 计算图案在区域序列中出现的次数的主要内容,如果未能解决你的问题,请参考以下文章

如何在excel中计算一个数字出现的次数

Python练习题3.6求整数序列中出现次数最多的数-修正版

python学习--统计序列中元素出现的频度(次数)

python 序列中出现次数超过N / 3的元素

EXCEL中统计某个区域内多个数字一共出现的次数

Python实用黑科技——找出序列里面出现次数最多的元素