r 模拟最佳猜测基因型截断阈值,以确定它们与真实基因型的相关程度,以及它们是否会返回比较

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了r 模拟最佳猜测基因型截断阈值,以确定它们与真实基因型的相关程度,以及它们是否会返回比较相关的知识,希望对你有一定的参考价值。

#info test
#From Marchini 2010


#Simulate an allele, then simulate a penalty factor. 
#E.g. someone with 2 copies has a penalized genotype (worst case scenario) of 
#(1-threshold)*0+ 0*1 + threshold*2 = 1.6
#Someone with 1 copies has a penalized genotype of 
# threshold*0 + 0.8*1 + (1-threshold)*2 #can be either 1.2 or .8, either is .2 off from the expectation of 1.
#Someone with 0 copies has a maximum penalized genotype of .4 
# threshold*0 + 0*1 + (1-threshold)*2

threshlist=seq(.5,1,.01)
threshout <- rep(NA,length(threshlist))
ts1 <-  rep(NA,length(threshlist))
ts2 <-  rep(NA,length(threshlist))

i=1
for (threshes in threshlist)
{
thresh=threshes
nsubs=10000
truemaf=.4
truegeno <- rbinom(nsubs, 2,p=truemaf) # True genotypes


#Penalize true genotypes (penalizing randomly)
truegeno2 <- truegeno
truegeno2[which(truegeno ==2)] <- sample( c((1-thresh)/2 + thresh*2,(1-thresh) + thresh*2, thresh*2),size=length(truegeno2[which(truegeno ==2)]),replace=TRUE)
truegeno2[which(truegeno ==1)] <- sample( c(thresh, thresh + (1-thresh)*2),size=length(truegeno2[which(truegeno ==1)]),replace=TRUE)
truegeno2[which(truegeno ==0)] <- sample( c((1-thresh), (1-thresh)/2 + (1-thresh),  (1-thresh)*2),size=length(truegeno2[which(truegeno ==0)]),replace=TRUE)

newpheno <- truegeno + rnorm(nsubs,sd=5)

 ts1[i] <- summary(lm(newpheno ~ truegeno))$coefficients[2,3]

ts2[i] <-  summary(lm(newpheno ~ truegeno2))$coefficients[2,3]

 
expectation <- truegeno2
mafe <- mean(expectation)/2

r2 <- var(expectation) / (2*mafe*(1-mafe))
r2

threshout[i] <- cor.test(truegeno,truegeno2)$estimate #info scores, if that calculation is right, dont matter -- the measure is 95% identical if the threshold is .8

i=i+1
}

plot(threshlist,threshout,type='l',lwd=2,xlab="Probability Threshold",ylab="Pearson Correlation of Imputed and Genotyped")

plot(threshlist,ts1,type='l',lwd=2,xlab="Probability Threshold",ylab="Test statistic",ylim=c(0,15))
lines(threshlist,ts2,type='l',lwd=2,col="blue")

以上是关于r 模拟最佳猜测基因型截断阈值,以确定它们与真实基因型的相关程度,以及它们是否会返回比较的主要内容,如果未能解决你的问题,请参考以下文章

确定该遗传算法的基因型; GA 初始化神经网络以实现快速高效的学习

四舍五入的数字如何存储在 R 中?

最佳拟合线与R中的阈值

R语言绘制火山图(volcano plot)实战:为差异表达基因(DEGs)添加颜色基于显著性阈值进行点的颜色美化为选定基因添加标签

调节性R-loops:促进基因表达和基因组稳定性(二)

如何计算在 r 中获得第一个真实订单之前有多少次 arima 订单不真实,以用于不同的 arima 模拟组合