r GHSG_HD1215 - Regensburg data


# This code is mostly copied from the bachelor's thesis of Gunther Glehr


load.gshgHD1215data <- function(ghsgHD1215.datadirectory){
	
	################################################################################
	# Load the big data files (.rdat and .RData) provided by
	# Christian Kohler in his mail from 1. June 2016 9:37
	# Christian Kohler in his mail from 30. May 2016 16:18
	# on my mail-account gunther.glehr@stud.uni-regensburg.de
	# USES (all inside ghsgHD1215.datadirectory):
	# 	dataRawCombinedFull.rdat        - "df.dataRawCombFull": counts for 156 genes and 425 samples
	# 	cHL-survival_2016-06-01.tar.gz  - archive that is extracted into the data directory
	# 	dataRegensburg_2015-07-13.RData - "dataRegensburg": phenodata for 404 samples
	#
	# Result: a list with
	#	pheno.ghsgHD1215	- phenodata for 404 samples from Regensburg
	#	expr.ghsgHD1215		- counts for 404 samples from Regensburg
	#
	#
	################################################################################
	
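	###### loading rdata ######
	# load() creates the object "df.dataRawCombFull" inside this function
	load(file.path(ghsgHD1215.datadirectory, "dataRawCombinedFull.rdat"))
	# extract the survival archive into the data directory
	untar(file.path(ghsgHD1215.datadirectory, "cHL-survival_2016-06-01.tar.gz")
		  , exdir = ghsgHD1215.datadirectory)
	# load() creates the object "dataRegensburg" inside this function
	load(file.path(ghsgHD1215.datadirectory, "dataRegensburg_2015-07-13.RData"))
	data.ghsgHD1215 <- dataRegensburg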
	
	
	###### Assign right sample names (why are they wrong?) #######
	# df.dataRawCombFull sample names (colnames) look like
	# 	B40566-05-HD15-16639-G-Cartridge36
	# where the numeric field (here 16639) is the Patienten_ID.
	# The third-to-last "-"-separated field is used if it is numeric,
	# otherwise the second-to-last field is used.
	
	# starting from 4 because the first 3 columns are:
	# 	"Code.Class"
	# 	"Name"
	# 	"Accession"
	for(sampleN in 4:ncol(df.dataRawCombFull)){
		split.cn <- strsplit(colnames(df.dataRawCombFull)[sampleN], "-")[[1]]
		# as.numeric() returns NA for non-numeric fields; the coercion warning is expected
		if(is.na(suppressWarnings(as.numeric(split.cn[length(split.cn) - 2])))){
			colnames(df.dataRawCombFull)[sampleN] <- split.cn[length(split.cn) - 1]
		}else{
			colnames(df.dataRawCombFull)[sampleN] <- split.cn[length(split.cn) - 2]
		}
	}
	
	
	###### remove unnecessary info  ######
	# 	control samples
	# 	code class
	# 	name
	# 	accession info
	# 	reference samples
	# keep only the columns whose name is a numeric patient ID;
	# the coercion warning for non-numeric names is expected
	df.dataRawCombFull <- df.dataRawCombFull[, ! is.na(suppressWarnings(as.numeric(colnames(df.dataRawCombFull)))), drop = FALSE]
	
	###### remove measured samples which are not in phenodata #####
	no.pheno <- ! colnames(df.dataRawCombFull) %in% dataRegensburg$Patienten_ID
	cat("Removing samples from df.dataRawCombFull (no phenodata available)\n")
	cat("   ", colnames(df.dataRawCombFull)[no.pheno], "\n")
	df.dataRawCombFull <- df.dataRawCombFull[, ! no.pheno, drop = FALSE]
	
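	##### sort data frames ######
	df.dataRawCombFull <- df.dataRawCombFull[, order(colnames(df.dataRawCombFull))]
	df.dataRawCombFull <- df.dataRawCombFull[order(rownames(df.dataRawCombFull)), ]
	
	# sort the phenodata by the ID as character so that its order follows the
	# same rule as the (character) column names of the counts
	data.ghsgHD1215 <- data.ghsgHD1215[order(as.character(data.ghsgHD1215$Patienten_ID)), ]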
	
	pheno.ghsgHD1215 <- data.ghsgHD1215
	expr.ghsgHD1215 <- as.matrix(df.dataRawCombFull)
	
	return(list("expr.ghsgHD1215"=expr.ghsgHD1215, "pheno.ghsgHD1215"=pheno.ghsgHD1215))
}
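
The renaming loop above picks the patient ID out of each column name one sample at a time. A minimal vectorized sketch of the same idea follows. The helper name `extract.patient.id` is hypothetical and not part of the original code. It also makes two assumptions that differ from the loop: an ID is taken to consist of digits only (the loop accepts anything `as.numeric()` can parse), and names without such a field yield `NA` instead of a non-numeric field.

```r
# Hypothetical helper, not part of the original function.
# Assumption: a patient ID consists of digits only.
extract.patient.id <- function(sample.names) {
	vapply(strsplit(sample.names, "-", fixed = TRUE), function(parts) {
		n <- length(parts)
		# candidate positions: third-to-last first, then second-to-last
		positions <- c(n - 2, n - 1)
		candidates <- parts[positions[positions >= 1]]
		# keep the candidates that are purely numeric
		numeric.fields <- candidates[grepl("^[0-9]+$", candidates)]
		if (length(numeric.fields) > 0) numeric.fields[1] else NA_character_
	}, character(1))
}

# The sample name from the comment above, plus one annotation column
ids <- extract.patient.id(c("B40566-05-HD15-16639-G-Cartridge36", "Code.Class"))
print(ids)
```

Because annotation columns such as "Code.Class" map to `NA`, the later filtering step could test `is.na()` on these IDs directly instead of relying on `as.numeric()` coercion warnings.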
