r GHSG_HD1215 - Regensburg data
# This code is mostly copied from the bachelor's thesis of Gunther Glehr
load.gshgHD1215data <- function(ghsgHD1215.datadirectory){
  ##############################################################################
  # Load the big data files (.rdat and .RData) provided by Christian Kohler in
  # his mails from
  #   1. June 2016 9:37
  #   30. May 2016 16:18
  # on my mail account gunther.glehr@stud.uni-regensburg.de
  #
  # USES:
  #   data/bigdata/dataRawCombinedFull.rdat - "df.dataRawCombFull": counts for 156 genes and 425 samples
  #   data/bigdata/dataRegensburg_2015-07-13.RData - "dataRegensburg": phenodata for 404 samples
  #
  # Result (returned as a named list):
  #   pheno.ghsgHD1215 - phenodata for 404 samples from Regensburg
  #   expr.ghsgHD1215  - counts for 404 samples from Regensburg
  ##############################################################################
  ###### loading rdata ######
  # Loads "df.dataRawCombFull" into the function environment
  load(file.path(ghsgHD1215.datadirectory, "dataRawCombinedFull.rdat"))
  # Unpack the archive into the data directory
  untar(file.path(ghsgHD1215.datadirectory, "cHL-survival_2016-06-01.tar.gz"),
        exdir = ghsgHD1215.datadirectory)
  # Loads "dataRegensburg" into the function environment
  load(file.path(ghsgHD1215.datadirectory, "dataRegensburg_2015-07-13.RData"))
  data.ghsgHD1215 <- dataRegensburg
  ###### Assign right sample names (why are they wrong?) ######
  # df.dataRawCombFull sample names (colnames) look like:
  #   B40566-05-HD15-16639-G-Cartridge36
  # where the numeric token (16639 here) is the Patienten_ID.
  # Start from column 4 because the first 3 columns are:
  #   "Code.Class"
  #   "Name"
  #   "Accession"
  for (sampleN in 4:ncol(df.dataRawCombFull)) {
    split.cn <- strsplit(colnames(df.dataRawCombFull)[sampleN], "-")[[1]]
    # Use the third-to-last token if it is numeric, otherwise the second-to-last.
    # as.numeric() warns whenever a token is not numeric.
    if (is.na(as.numeric(split.cn[length(split.cn) - 2]))) {
      colnames(df.dataRawCombFull)[sampleN] <- split.cn[length(split.cn) - 1]
    } else {
      colnames(df.dataRawCombFull)[sampleN] <- split.cn[length(split.cn) - 2]
    }
  }
  cat("NA-warnings are intended!\n")
  ###### remove unnecessary info ######
  # Keep only the columns whose name is now a numeric ID. This drops:
  #   control samples
  #   Code.Class
  #   Name
  #   Accession info
  #   reference samples
  df.dataRawCombFull <- df.dataRawCombFull[, !is.na(as.numeric(colnames(df.dataRawCombFull)))]
  cat("NA-warning is intended!\n")
  ###### remove measured samples which are not in phenodata ######
  in.pheno <- colnames(df.dataRawCombFull) %in% dataRegensburg$Patienten_ID
  cat("Removing samples from df.dataRawCombFull (no phenodata available)\n")
  cat(" ", colnames(df.dataRawCombFull)[!in.pheno], "\n")
  df.dataRawCombFull <- df.dataRawCombFull[, in.pheno]
  ###### sort data frames ######
  df.dataRawCombFull <- df.dataRawCombFull[, order(colnames(df.dataRawCombFull))]
  df.dataRawCombFull <- df.dataRawCombFull[order(rownames(df.dataRawCombFull)), ]
  # Sort the IDs as character strings so the row order follows the same rule as
  # the column order above, which is also a character sort
  data.ghsgHD1215 <- data.ghsgHD1215[order(as.character(data.ghsgHD1215$Patienten_ID)), ]
  pheno.ghsgHD1215 <- data.ghsgHD1215
  expr.ghsgHD1215 <- as.matrix(df.dataRawCombFull)
  return(list("expr.ghsgHD1215" = expr.ghsgHD1215, "pheno.ghsgHD1215" = pheno.ghsgHD1215))
}
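
The function sorts the expression columns and the phenodata rows by patient ID, but it never verifies that the two end up in the same order. Below is a minimal sketch of such a check. `check.sample.alignment` is a hypothetical helper that is not part of the original code, and the toy IDs and counts are made up so that the example runs without the real data files.

```r
# Hypothetical helper (not part of the original code): verify that the
# columns of the expression matrix and the rows of the phenodata refer to
# the same patients in the same order.
check.sample.alignment <- function(expr, pheno, id.column = "Patienten_ID") {
  expr.ids <- colnames(expr)
  pheno.ids <- as.character(pheno[[id.column]])
  list(
    aligned = identical(expr.ids, pheno.ids),
    only.in.expr = setdiff(expr.ids, pheno.ids),
    only.in.pheno = setdiff(pheno.ids, expr.ids)
  )
}

# Toy stand-in for the list returned by load.gshgHD1215data(); the IDs and
# counts are invented for illustration only.
toy <- list(
  expr.ghsgHD1215 = matrix(1:6, nrow = 2,
                           dimnames = list(c("geneA", "geneB"),
                                           c("16639", "16640", "16702"))),
  pheno.ghsgHD1215 = data.frame(Patienten_ID = c(16639, 16640, 16702))
)

res <- check.sample.alignment(toy$expr.ghsgHD1215, toy$pheno.ghsgHD1215)
stopifnot(res$aligned)
```

With the real data, the same call would take the two elements of the returned list. A non-empty `only.in.pheno` would point to patients with phenodata but no measured sample, since the function only filters the expression data.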