Storing simulation results in R

Posted: 2018-07-11 20:33:30

Question:

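I want to estimate the Mantel-Haenszel differential item functioning (DIF) odds ratio and the MH D-DIF index. I wrote the function below. I think I am making a mistake when storing the results. Could you take a look and give me feedback? Here is the sample data: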

# generate dataset
r <- 1000
c <- 16
test <- matrix(rbinom(r*c,1,0.5),r,c)
# create sum scores for each student using first 15 columns
test <- cbind(test, apply(test[,1:15],1,sum))
colnames(test) <- c("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10","v11","v12","v13","v14","v15","group","score")
test <- as.data.frame(test)

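The first 15 columns are the students' correct/incorrect responses to the items (questions). Column 16 is the group membership. The student "score" variable in the last column (column 17) is the sum of the item scores. The formulas can be found in the picture I took from this page (https://en.wikipedia.org/wiki/Differential_item_functioning).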

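For each score category, I want to estimate the last two formulas in that picture. The rows are 10 students and the columns are six items/questions. Again, column 16 is the group membership (1 = focal, 0 = reference). Here is the code for my function: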
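Since the picture is not reproduced here, the following is the usual textbook form of the Mantel-Haenszel common odds ratio and the MH D-DIF index. It is assumed, not confirmed, that these are the two formulas referred to above. Here A_k and B_k are the numbers of correct and incorrect responses in the reference group at score level k, C_k and D_k are the same counts for the focal group, and N_k is the total number of examinees at score level k.

```latex
\hat{\alpha}_{MH} = \frac{\sum_{k} A_k D_k / N_k}{\sum_{k} B_k C_k / N_k},
\qquad
\mathrm{MH\ D\text{-}DIF} = -2.35 \, \ln\left(\hat{\alpha}_{MH}\right)
```

In this form the sums run over all score levels k, so there is one odds ratio per item. The function in this question instead computes a separate ratio for each score level.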

library(dplyr)

# This function starts with the first item and loops over the k scores from 1 to 15, then moves to the second item.
# data should only contain the items, the grouping variable, and the person score.

Mantel.Haenszel <- function (data) {
  # browser() #runs with debug
  for (item in 1:15) { # item loop, not grouping/scoring

    item.incorrect <- data[,item] == 0
    item.correct   <- data[,item] == 1
    Results <-  c()

    for (k in 1:15) { # for k scores

      Ak <- nrow(filter(data, score == k, group == 0, item.correct)) #  freq of ref group & correct

      Bk <- nrow(filter(data, score == k, group == 0, item.incorrect)) #  freq of ref group & incorrect

      Ck <- nrow(filter(data, score == k, group == 1, item.correct)) #  freq of foc group & correct

      Dk <- nrow(filter(data, score == k, group == 1, item.incorrect)) #  freq of foc group & incorrect

      nrk <- nrow(filter(data, score == k, group == 0)) # sample size for ref

      nfk <- nrow(filter(data, score == k, group == 1)) # sample size for focal

      if (Bk == 0 | Ck == 0) {
        next
      }

      nominator   <- sum((Ak*Dk)/(nrk + nfk))
      denominator <- sum((Bk*Ck)/(nrk + nfk))
      odds.ratio  <- nominator/denominator

      if (odds.ratio == 0) {
        next
      }

      MH.D.DIF <- (-2.35)*log(odds.ratio) # index

      # save the output
      out <- list("Odds Ratio" = odds.ratio, "MH Diff" = MH.D.DIF)
      results <- rbind(Results, out)
      return(results)

    } # close score loop

  } # close item loop

} # close function

This is what I get:

# test function
Mantel.Haenszel(test)

> Mantel.Haenszel(test)
    Odds Ratio MH Diff 
out 0.2678571  3.095659

What I want to get is:

> Mantel.Haenszel(test)
    Odds Ratio MH Diff 
out 0.2678571  3.095659
    ##         ##
    ..         ..
    (15 rows here for 15 score categories in the dataset)
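The single row is consistent with how return() behaves in R: called inside a loop, it leaves the whole function during the first iteration that reaches it. A minimal sketch with two hypothetical toy functions (early_return and late_return are illustrative names, not part of the code above):

```r
# Hypothetical toy functions, not the code from the question.
early_return <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- rbind(out, c(i = i, square = i^2))
    return(out)  # leaves the function during the first iteration
  }
}

late_return <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- rbind(out, c(i = i, square = i^2))
  }
  out  # returned once, after the loop has finished
}

nrow(early_return(5))  # 1
nrow(late_return(5))   # 5
```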

Answer 1:

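Should you expect one result for every combination of item and k, for a maximum of 225 output rows, unless some iterations are skipped by next? If so, I think you only need to change a few small things. First, declare Results only once, at the start of the function. Then, make sure you are rbind-ing into and returning either Results or results, but not both (R is case-sensitive, so these are two different objects). Then, move your return to the level of the function itself rather than inside the loops. In the example below, I have also included the current item and k for demonstration: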

Mantel.Haenszel <- function (data) {
  # browser() #runs with debug

  Results <-  c()

  for (item in 1:15) {
    # item loop, not grouping/scoring

    item.incorrect <- data[, item] == 0
    item.correct   <- data[, item] == 1

    for (k in 1:15) {
      # for k scores

      Ak <-
        nrow(filter(data, score == k, group == 0, item.correct)) #  freq of ref group & correct

      Bk <-
        nrow(filter(data, score == k, group == 0, item.incorrect)) #  freq of ref group & incorrect

      Ck <-
        nrow(filter(data, score == k, group == 1, item.correct)) #  freq of foc group & correct

      Dk <-
        nrow(filter(data, score == k, group == 1, item.incorrect)) #  freq of foc group & incorrect

      nrk <-
        nrow(filter(data, score == k, group == 0)) # sample size for ref

      nfk <-
        nrow(filter(data, score == k, group == 1)) # sample size for focal

      if (Bk == 0 | Ck == 0) {
        next
      }

      nominator   <- sum((Ak * Dk) / (nrk + nfk))
      denominator <- sum((Bk * Ck) / (nrk + nfk))
      odds.ratio  <- nominator / denominator

      if (odds.ratio == 0) {
        next
      }

      MH.D.DIF <- (-2.35) * log(odds.ratio) # index

      # save the output
      out <-
        list(
          item = item,
          k = k,
          "Odds Ratio" = odds.ratio,
          "MH Diff" = MH.D.DIF
        )
      Results <- rbind(Results, out)
    } # close score loop

  } # close item loop

  return(Results)

} # close function

test.output <- Mantel.Haenszel(test)

This gives the following output:

> head(test.output, 20)
    item k  Odds Ratio MH Diff    
out 1    3  2          -1.628896  
out 1    4  4.666667   -3.620046  
out 1    5  0.757085   0.6539573  
out 1    6  0.5823986  1.27041    
out 1    7  0.9893293  0.02521097 
out 1    8  1.078934   -0.1785381 
out 1    9  1.006237   -0.01461145
out 1    10 1.497976   -0.9496695 
out 1    11 1.435897   -0.8502066 
out 1    12 1.5        -0.952843  
out 2    3  0.8333333  0.4284557  
out 2    4  2.424242   -2.08097   
out 2    5  1.368664   -0.7375117 
out 2    6  1.222222   -0.4715761 
out 2    7  0.6288871  1.089938   
out 2    8  1.219512   -0.4663597 
out 2    9  1          0          
out 2    10 2.307692   -1.965183  
out 2    11 0.6666667  0.952843   
out 2    12 0.375      2.304949 

Is this what you were looking for?
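One possible variation, offered only as a sketch: calling rbind on lists produces a matrix whose cells are list elements, which can be awkward to work with later. Collecting one small data frame per iteration in a pre-allocated list and binding them once at the end gives an ordinary data frame instead. The function collect_rows, its skip rule, and the value column are hypothetical stand-ins for the loop body above, not the Mantel-Haenszel computation itself.

```r
# Hypothetical stand-in for the nested item/score loops above.
collect_rows <- function(n_items, n_scores) {
  rows <- vector("list", n_items * n_scores)  # one slot per (item, k) pair
  idx <- 0
  for (item in seq_len(n_items)) {
    for (k in seq_len(n_scores)) {
      if ((item + k) %% 2 == 0) next          # stand-in for the skip conditions
      idx <- idx + 1
      rows[[idx]] <- data.frame(item = item, k = k, value = item * k)
    }
  }
  do.call(rbind, rows[seq_len(idx)])          # bind the filled slots once
}

res <- collect_rows(3, 4)
str(res)
```

Binding once at the end also avoids growing the result object on every iteration, which matters more as the number of items and score levels increases.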

Comments:

Hi Luke, your changes solved the problem. Thanks for your time!
