JAGS/rjags 中多个组的单独贝叶斯参数估计

Posted

技术标签:

【中文标题】JAGS/rjags 中多个组的单独贝叶斯参数估计【英文标题】:Separate Bayesian parameter estimates for multiple groups in JAGS/rjags 【发布时间】:2017-04-24 11:19:40 【问题描述】:

我正在尝试在 JAGS 中执行分层分析,从 Kruschke 的《做贝叶斯数据分析》第 9 章推断。我希望获得四个硬币正面比例的后验参数估计值(theta 的 1、2、3 和 4) ,来自两个铸币厂,以及来自每个铸币厂的硬币平均偏差的估计值(铸币厂偏差:欧米茄)。我将每个铸币厂的偏差 kappa 的可变性保持为常数。问题是我无法从第二次造币厂得到后验估计,它似乎只是在对先验进行抽样。有谁知道如何修复模型字符串文本(参见下面的第 3 步)以便为第二次造币厂生成后验估计?

下面分析的整个脚本

library(rjags)
library(runjags)
library(coda)


############### 1. Generate the data 

flips <- c(sample(c(rep(1,3), rep(0,9))), # coin 1, mint 1, 12 flips total
           sample(c(rep(1,1), rep(0,4))), # coin 2, mint 1, 5 flips total
           sample(c(rep(1,10), rep(0,5))), # coin 1, mint 2, 15 flips
           sample(c(rep(1,17), rep(0,6)))) # coin 2, mint 2, 23 flips

coins <- factor(c(rep(1,12), rep(2,5), rep(3, 15), rep(4, 23)))

mints <- factor(c(rep(1,17), rep(2,38)))

nFlips <- length(flips) 
nCoins <- length(unique(coins))
nMints <- length(unique(mints))


#################### 2. Pass data into a list 

dataList <- list(
  flips = flips,
  coins = coins,
  mints = mints,
  nFlips = nFlips,
  nCoins = nCoins,
  nMints = nMints)


################### 3. specify and save the model 

modelString <- "
model

      # start with nested likelihood function
      for (i in 1:nFlips) 

              flips[i] ~ dbern(theta[coins[i]])
       

      # next the prior on theta
      for (coins in 1:nCoins) 

              theta[coins] ~ dbeta(omega[mints[coins]]*(kappa - 2) + 1, (1 - omega[mints[coins]])*(kappa - 2) + 1) 
      

      # next we specify the prior for the higher-level parameters on the mint, omega and kappa
      for (mints in 1:nMints) 

              omega[mints] ~ dbeta(2,2)

      

      kappa <- 5

"


writeLines(modelString, "tempModelHier4CoinTwoMint.txt")

############################### Step 4: Initialise Chains 

initsList <- list(theta1 = mean(flips[coins==1]),
                  theta2 = mean(flips[coins==2]),
                  theta3 = mean(flips[coins==3]),
                  theta4 = mean(flips[coins==4]),
                  omega1 = mean(c(mean(flips[coins==1]),
                                 mean(flips[coins==2]))),
                  omega2 = mean(c(mean(flips[coins==3]),
                                 mean(flips[coins==4]))))

initsList


############################### Step 5: Generate Chains 

runJagsOut <- run.jags(method = "simple",
                       model = "tempModelHier4CoinTwoMint.txt",
                       monitor = c("theta[1]", "theta[2]", "theta[3]", "theta[4]", "omega[1]", "omega[2]"),
                       data = dataList,
                       inits = initsList,
                       n.chains = 1,
                       adapt = 500,
                       burnin = 1000,
                       sample = 50000,
                       thin = 1,
                       summarise = FALSE,
                       plots = FALSE)



############################### Step 6: Convert to Coda Object 

codaSamples <- as.mcmc.list(runJagsOut)

head(codaSamples)


############################### Step 7: Make Graphs 

df <- data.frame(as.matrix(codaSamples))

theta1 <- ggplot(df, aes(x = df$theta.1.)) + geom_density()
theta2 <- ggplot(df, aes(x = df$theta.2.)) + geom_density()
theta3 <- ggplot(df, aes(x = df$theta.3.)) + geom_density()
theta4 <- ggplot(df, aes(x = df$theta.4.)) + geom_density()
omega1 <- ggplot(df, aes(x = df$omega.1.)) + geom_density()
omega2 <- ggplot(df, aes(x = df$omega.2.)) + geom_density()

require(gridExtra)

ggsave("coinsAndMintsHier/hierPropFourCoinsTwoMints.pdf", grid.arrange(theta1, theta2, theta3, theta4, omega1, omega2, ncol = 2), device = "pdf", height = 30, width = 10, units = "cm")

【问题讨论】:

我现在没有时间检查,但我想知道是否使用 coins 两次,一次作为数据,一次作为第二个 for 循环的索引,可能会导致一些麻烦? 谢谢@Jacob Socolar,但coins 不是数据,它是索引硬币编号的因素。 flips 是数据。 您的 dataList 定义了一个名为 coins 的变量作为数据。 正如 Jacob Socolar 所说,脚本使用 coins 作为 for 循环索引和数据向量。模型规范中的注意事项:for (coins in 1:nCoins)。将循环索引更改为不同的名称,例如 cIdx 在它出现的任何地方。不知道这是否能解决问题,但不会有什么坏处。 感谢@John K. Kruschke 的建议。不幸的是,它没有任何效果。应该是第一个铸币厂的后验是基于所有四个硬币而不是前两个硬币的估计,第二个铸币厂的后验图显然仍然是直接从先验采样。 【参考方案1】:

问题是您在 theta 上设置先验时尝试索引硬币薄荷的方式。在这种情况下只有 4 个theta,而不是nFlips。您的嵌套索引 mints[coins] 正在访问 mints 数据向量,而不是每个硬币所属的铸币厂的向量。我在下面创建了一个更正的版本。请注意一个向量的显式构造,该向量索引每个硬币所属的铸币厂。另请注意,在模型规范中,每个 for 循环索引都有自己的索引名称,与数据名称不同。

graphics.off() # This closes all of R's graphics windows.
rm(list=ls())  # Careful! This clears all of R's memory!

library(runjags)
library(coda)

#library(rjags)

############### 1. Generate the data 

flips <- c(sample(c(rep(1,3), rep(0,9))), # coin 1, mint 1, 12 flips total
           sample(c(rep(1,1), rep(0,4))), # coin 2, mint 1, 5 flips total
           sample(c(rep(1,10), rep(0,5))), # coin 1, mint 2, 15 flips
           sample(c(rep(1,17), rep(0,6)))) # coin 2, mint 2, 23 flips

# NOTE: I got rid of `factor` because it was unneeded and got in the way
coins <- c(rep(1,12), rep(2,5), rep(3, 15), rep(4, 23))

# NOTE: I got rid of `factor` because it was unneeded and got in the way
mints <- c(rep(1,17), rep(2,38))

nFlips <- length(flips) 
nCoins <- length(unique(coins))
nMints <- length(unique(mints))

# NEW: Create vector that specifies the mint of each coin. There must be a     more 
# elegant way to do this, but here is a logical brute-force approach. This
# assumes that coins are consecutively numbered from 1 to nCoins.
mintOfCoin = NULL
for ( cIdx in 1:nCoins ) 
  mintOfCoin = c( mintOfCoin , unique(mints[coins==cIdx]) )


#################### 2. Pass data into a list 

dataList <- list(
  flips = flips,
  coins = coins,
  mints = mints,
  nFlips = nFlips,
  nCoins = nCoins,
  nMints = nMints,
  mintOfCoin = mintOfCoin # NOTE
  )


################### 3. specify and save the model 

modelString <- "
model
  # start with nested likelihood function
  for (fIdx in 1:nFlips) 
    flips[fIdx] ~ dbern( theta[coins[fIdx]] )
   
  # next the prior on theta
  # NOTE: Here we use the mintOfCoin index.
  for (cIdx in 1:nCoins) 
    theta[cIdx] ~ dbeta( omega[mintOfCoin[cIdx]]*(kappa - 2) + 1 ,
                          ( 1 - omega[mintOfCoin[cIdx]])*(kappa - 2) + 1 ) 
  
  # next we specify the prior for the higher-level parameters on the mint, 
  # omega and kappa
  # NOTE: I changed the name of the mint index so it doesn't conflict with 
  # mints data vector.
  for (mIdx in 1:nMints) 
    omega[mIdx] ~ dbeta(2,2)
  
  kappa <- 5

"


writeLines(modelString, "tempModelHier4CoinTwoMint.txt")

############################### Step 4: Initialise Chains 

initsList <- list(theta1 = mean(flips[coins==1]),
                  theta2 = mean(flips[coins==2]),
                  theta3 = mean(flips[coins==3]),
                  theta4 = mean(flips[coins==4]),
                  omega1 = mean(c(mean(flips[coins==1]),
                                  mean(flips[coins==2]))),
                  omega2 = mean(c(mean(flips[coins==3]),
                                  mean(flips[coins==4]))))

initsList


############################### Step 5: Generate Chains 

runJagsOut <- run.jags(method = "parallel",
                       model = "tempModelHier4CoinTwoMint.txt",
                       # NOTE: theta and omega are vectors:
                       monitor = c( "theta", "omega" , "kappa" ),
                       data = dataList,
                       #inits = initsList, # NOTE: Let JAGS initialize.
                       n.chains = 4, # NOTE: Not only 1 chain.
                       adapt = 500,
                       burnin = 1000,
                       sample = 10000,
                       thin = 1,
                       summarise = FALSE,
                       plots = FALSE)



############################### Step 6: Convert to Coda Object 

codaSamples <- as.mcmc.list(runJagsOut)

head(codaSamples)

########################################
## NOTE: Important step --- Check MCMC diagnostics 

# Display diagnostics of chain, for specified parameters:
source("DBDA2E-utilities.R") # For function diagMCMC()
parameterNames = varnames(codaSamples) # from coda package
for ( parName in parameterNames ) 
  diagMCMC( codaObject=codaSamples , parName=parName )




############################### Step 7: Make Graphs 
# ...

【讨论】:

谢谢@John K. Kruschke。一如既往地非常有帮助。顺便说一句,喜欢你的书。

以上是关于JAGS/rjags 中多个组的单独贝叶斯参数估计的主要内容,如果未能解决你的问题,请参考以下文章

贝叶斯分类器(2)极大似然估计、MLE与MAP

[参数方法] 贝叶斯估计(待补充)

频率学派 极大似然估计MLE,贝叶斯学派 最大后验估计MAP 2021-05-11

第七章 贝叶斯网络

贝叶斯估计和极大似然估计到底有何区别

贝叶斯线性回归(Bayesian Linear Regression)