#Our objective is to extract the average DNA methylation (beta value) in all regulatory elements defined in the BLUEPRINT regulatory build (modified from~\cite{Zerbino2015}) across all BLUEPRINT samples. After downloading these data from DeepBlue we use the package gplots to create a heatmap showing the most variable regions (rows) across samples (columns). Moreover, we cluster samples by their pairwise Spearman correlation coefficients (Figure~\ref{Fig:1}).
#In Listing 1 we demonstrate how DeepBlueR can be used to quickly and efficiently aggregate a large data set on the DeepBlue server in preparation for downstream analysis in R. We omit the R code for generating the heatmap here. It can be found in the supplemental material. To retrieve the required data, we first select all bisulfite sequencing experiments from the BLUEPRINT project (lines $5$-$9$). Our selection comprises $206$ data files and $47$ distinct cell types after filtering for the correct file type (lines $9$-$11$). For each file, we select the column that contains the beta value (line $12$). Next, we select the regions of the regulatory build (lines $14$-$16$), which we use in the next command (lines $18$-$21$) to request a data matrix in which the average beta values are computed for each regulatory element and experiment. Finally, we download the resulting data matrix in lines $23$-$24$.\\
import xmlrpclib
import time
url = "http://deepblue.mpi-inf.mpg.de/xmlrpc"
user_key = "anonymous_key"
server = xmlrpclib.Server(url, allow_none=True)
(status, query_id) = server.select_experiments ( "DG-75_c01.ERX297417.H3K27ac.bwa.GRCh38.20150527.bed", None, None, None, user_key )
fmt = "CHROMOSOME,START,END,@LENGTH,@COUNT.MOTIF(C),@COUNT.MOTIF(G),@COUNT.MOTIF(GC),@SEQUENCE"
(status, request_id) = server.get_regions(query_id, fmt, user_key)
(status, info) = server.info(request_id, user_key)
request_status = info[0]["state"]
while request_status != "done" and request_status != "failed":
time.sleep(1)
(status, info) = server.info(request_id, user_key)
request_status = info[0]["state"]
(status, regions) = server.get_request_data(request_id, user_key)