更有效地进行多重回归模型(总共 8 个:2 个模型包含不同的预测变量和 4 个不同的结果)

Posted

技术标签:

【中文标题】更有效地进行多重回归模型(总共 8 个:2 个模型包含不同的预测变量和 4 个不同的结果)【英文标题】:Doing multiple regression models (8 in total: 2 models containing different predictors and on 4 different outcomes) more efficiently 【发布时间】:2022-01-07 20:22:42 【问题描述】:

对于我目前的项目,我反复指定回归模型,在不同的结果上具有不同数量的预测变量/协变量。现在我只是完整地写出每个模型,但我确信有一种(非常)更快的方法需要更少的代码来完成我正在做的事情。

我的示例数据是 24 名中风患者的重复测量数据集,我在其中评估了三种不同类型的康复 (Group) 对功能恢复分数 (Outcome 1Outcome 4) 的影响。每周测量每位患者的功能能力 (Time_num),持续 8 周:

library(tidyverse)
library(magrittr)
library(nlme)

mydata <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 
13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 
16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 
18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 
19L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 
21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L, 
23L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 
24L, 24L), Age = c(60, 52.5, 57.1, 63, 65.1, 39, 59.3, 65.3, 
61.4, 56.3, 46.4, 58.2, 58, 57.7, 56.6, 42.3, 52.5, 51.8, 43.2, 
50.9, 56.7, 67.5, 65, 56.5, 65.5, 45.6, 56.7, 47.9, 65.5, 46.6, 
68.6, 52.1, 43.1, 62.1, 62.9, 58.3, 49.6, 42.1, 59.7, 62.9, 56.2, 
71.7, 60.5, 59.8, 54.3, 76.1, 56.2, 74.3, 48.7, 69.9, 59.6, 58.4, 
55.9, 56.5, 33, 57.1, 63, 53.1, 51.3, 46.9, 57.2, 47, 58, 63.7, 
69.8, 57.9, 62.7, 44.8, 51.5, 57, 58.1, 53.3, 57.2, 54.2, 50.2, 
60.4, 61.1, 81.3, 59.6, 68.8, 49.2, 51, 53.5, 55.9, 66.7, 60.3, 
59.8, 61.6, 63.8, 59.8, 55.5, 57.7, 66.3, 54.7, 56.3, 56.7, 57.7, 
63.8, 53.5, 56.1, 49, 44.5, 36, 58.2, 50.8, 56.8, 47.9, 51.1, 
53.2, 53.4, 59.3, 42.8, 63.6, 51.2, 49, 62.6, 44.8, 59.9, 44.7, 
56, 54.3, 58.7, 62.2, 76.7, 31.4, 65.2, 52.8, 56.7, 52.4, 60.6, 
54.8, 43.2, 77.6, 58.1, 49.8, 55.2, 53.6, 54.1, 72.9, 58.7, 51.9, 
64.9, 56.6, 61, 71.3, 63.1, 57.4, 56.9, 53.8, 73, 58.9, 60.7, 
63.8, 54.6, 74.5, 46.7, 44.2, 56.3, 66.8, 56.5, 43.6, 62.8, 55.3, 
53.7, 54.9, 46.6, 51.8, 60.7, 62.9, 61.5, 61.6, 43.6, 66.8, 50.1, 
51.6, 69.9, 52.2, 58.1, 62.1, 69.2, 59.1, 55.2, 47.2, 64.5, 54.2, 
75.9, 52.9, 62.5, 58, 64.5, 70.7, 60.5), Sex = structure(c(1L, 
2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 
1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 
2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Male", 
"Female"), class = "factor"), Group = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), Time_num = c(1, 2, 3, 4, 5, 6, 
7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 
4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 
1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 
6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 
3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 
8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 
5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 
2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 
7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8), First_outcome = c(45L, 
45L, 45L, 45L, 80L, 80L, 80L, 90L, 20L, 25L, 25L, 25L, 30L, 35L, 
30L, 50L, 50L, 50L, 55L, 70L, 70L, 75L, 90L, 90L, 25L, 25L, 35L, 
40L, 60L, 60L, 70L, 80L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 20L, 20L, 30L, 50L, 50L, 60L, 85L, 95L, 30L, 35L, 
35L, 40L, 50L, 60L, 75L, 85L, 30L, 35L, 45L, 50L, 55L, 65L, 65L, 
70L, 40L, 55L, 60L, 70L, 80L, 85L, 90L, 90L, 65L, 65L, 70L, 70L, 
80L, 80L, 80L, 80L, 30L, 30L, 40L, 45L, 65L, 85L, 85L, 85L, 25L, 
35L, 35L, 35L, 40L, 45L, 45L, 45L, 45L, 45L, 80L, 80L, 80L, 80L, 
80L, 80L, 15L, 15L, 10L, 10L, 10L, 20L, 20L, 20L, 35L, 35L, 35L, 
45L, 45L, 45L, 50L, 50L, 40L, 40L, 40L, 55L, 55L, 55L, 60L, 65L, 
20L, 20L, 30L, 30L, 30L, 30L, 30L, 30L, 35L, 35L, 35L, 40L, 40L, 
40L, 40L, 40L, 35L, 35L, 35L, 40L, 40L, 40L, 45L, 45L, 45L, 65L, 
65L, 65L, 80L, 85L, 95L, 100L, 45L, 65L, 70L, 90L, 90L, 95L, 
95L, 100L, 25L, 30L, 30L, 35L, 40L, 40L, 40L, 40L, 25L, 25L, 
30L, 30L, 30L, 30L, 35L, 40L, 15L, 35L, 35L, 35L, 40L, 50L, 65L, 
65L), Second_outcome = c(3, 50, 7, 43, -23, 32, 48, 46, 32, 46, 
23, 34, 46, -2, 46, 49, 45, 44, 53, 1, 61, 23, 41, 52, 25, 54, 
26, -1, 22, 50, 21, 20, 70, 62, 67, 18, 55, 25, 5, 16, 43, 35, 
59, 5, -5, 50, 35, 32, 25, 25, 13, 57, 42, 21, 35, 34, 38, 52, 
63, 52, 44, 36, 32, 30, 26, 42, 44, 53, 39, 29, 13, 37, 41, 31, 
18, 41, 40, 29, 28, 22, 6, -15, 16, 26, 0, 41, 35, 28, 35, 32, 
41, 49, 16, 43, 56, 63, 14, 46, 43, 46, 36, -3, 49, 33, 49, 20, 
20, 31, 27, 23, 34, 36, 39, 20, 29, 58, 45, 60, 40, 17, 77, 45, 
13, 62, 43, 74, 47, 56, 13, 12, 36, 2, 40, 57, 35, 31, 28, 82, 
49, 6, 10, 46, 49, 17, 55, 16, 12, -17, -7, 22, 20, -14, 21, 
17, 41, 47, 25, 34, 72, 59, 26, 24, 46, 16, 35, 34, 51, 40, 25, 
53, 24, 14, 66, 18, 18, 34, 29, 81, 12, 50, 55, 33, 62, 38, 24, 
25, 29, 60, 71, -6, 60, 49), Third_outcome = c(87, 78, 94, 93, 
78, 84, 72, 81, 82, 81, 86, 72, 80, 82, 77, 82, 79, 71, 82, 79, 
86, 86, 76, 73, 80, 74, 81, 73, 81, 80, 65, 84, 73, 85, 87, 78, 
77, 70, 85, 80, 77, 73, 75, 85, 67, 87, 90, 84, 71, 73, 81, 72, 
74, 74, 85, 90, 75, 70, 81, 69, 81, 73, 79, 74, 76, 77, 82, 80, 
87, 87, 82, 81, 76, 80, 79, 71, 81, 77, 74, 78, 73, 79, 77, 78, 
94, 78, 71, 82, 81, 80, 79, 70, 68, 82, 78, 68, 66, 82, 80, 71, 
73, 79, 83, 71, 80, 78, 82, 73, 86, 76, 75, 81, 84, 84, 85, 80, 
83, 79, 75, 77, 82, 89, 78, 74, 79, 82, 73, 86, 77, 81, 84, 84, 
73, 80, 82, 81, 81, 83, 81, 79, 84, 82, 75, 75, 80, 67, 81, 82, 
82, 80, 80, 80, 76, 81, 82, 85, 86, 81, 89, 78, 84, 79, 80, 77, 
85, 88, 78, 81, 82, 81, 82, 77, 74, 86, 81, 73, 80, 77, 81, 76, 
83, 76, 81, 79, 76, 83, 77, 79, 71, 77, 82, 87), Fourth_outcome = c(59, 
36, 53, 51, 59, 50, 56, 57, 52, 42, 60, 44, 46, 52, 54, 68, 63, 
37, 51, 46, 67, 42, 63, 47, 41, 48, 51, 48, 51, 34, 35, 46, 52, 
52, 44, 67, 47, 58, 57, 55, 50, 56, 36, 42, 51, 51, 42, 49, 59, 
55, 44, 53, 42, 64, 75, 64, 41, 44, 39, 64, 40, 48, 51, 54, 42, 
52, 35, 55, 53, 66, 34, 50, 56, 35, 32, 63, 52, 35, 63, 38, 57, 
67, 35, 41, 47, 31, 55, 60, 52, 60, 44, 52, 63, 53, 48, 69, 43, 
44, 40, 45, 63, 39, 48, 56, 44, 57, 56, 62, 54, 49, 47, 62, 41, 
41, 59, 32, 62, 39, 64, 46, 44, 78, 68, 38, 51, 27, 57, 55, 67, 
51, 44, 61, 24, 49, 62, 61, 43, 41, 54, 47, 41, 28, 40, 31, 57, 
58, 36, 48, 58, 61, 67, 50, 47, 56, 56, 69, 43, 43, 58, 55, 48, 
52, 46, 51, 38, 58, 44, 43, 49, 59, 31, 37, 46, 55, 45, 50, 45, 
67, 48, 37, 51, 47, 66, 42, 52, 46, 61, 47, 34, 49, 58, 38)), row.names = c(NA, 
-192L), class = c("tbl_df", "tbl", "data.frame"))

如下所示:

head(mydata)

# A tibble: 6 x 9
  Subject   Age Sex    Group Time_num First_outcome Second_outcome Third_outcome Fourth_outcome
    <int> <dbl> <fct>  <fct>    <dbl>         <int>          <dbl>         <dbl>          <dbl>
1       1  60   Male   A            1            45              3            87             59
2       1  52.5 Female A            2            45             50            78             36
3       1  57.1 Female A            3            45              7            94             53
4       1  63   Male   A            4            45             43            93             51
5       1  65.1 Male   A            5            80            -23            78             59
6       1  39   Female A            6            80             32            84             50

我现在运行的模型是每个结果 2 个线性混合效应模型(使用 nlme::lme):一个只包含 Group,一个另外包含 AgeSex。我现在的做法是:

# Outcome 1
outcome1_modelA <- 
  lme(fixed=First_outcome ~ 1 + Time_num*Group, 
      random= ~1 + Time_num|Subject, 
      data=mydata, 
      na.action="na.omit",
      method="ML")

outcome1_modelB <- 
  lme(fixed=First_outcome ~ 1 + Time_num*Group + Time_num*Age + Time_num*Sex, 
      random= ~1 + Time_num|Subject, 
      data=mydata, 
      na.action="na.omit",
      method="ML")

# Outcome 2, 3, and finally...
# Outcome 4
outcome4_modelA <- 
  lme(fixed=Fourth_outcome ~ 1 + Time_num*Group, 
      random= ~1 + Time_num|Subject, 
      data=mydata, 
      na.action="na.omit",
      method="ML")

outcome4_modelB <- 
  lme(fixed=Fourth_outcome ~ 1 + Time_num*Group + Time_num*Age + Time_num*Sex, 
      random= ~1 + Time_num|Subject, 
      data=mydata, 
      na.action="na.omit",
      method="ML")

但是看到我得到了更多的结果和更多的模型,我想学习一种方法来提高我的代码效率。我读过 for-loops 但似乎找不到适合我的例子。不涉及 for 循环的解决方案也将不胜感激!

【问题讨论】:

【参考方案1】:

创建一个函数以使用您要更改的部分的参数来执行此操作 - 您需要 as.formula() 函数

my_model_function <- function(x, y)
  fixed_effects <- as.formula(paste(y, "~ 1 +",
                                    paste("Time_num", x, sep="*", collapse=" + ")))
  lme(fixed=fixed_effects, 
      random= ~1 + Time_num|Subject, 
      data=mydata, 
      na.action="na.omit",
      method="ML")


outcome1_modelA <- my_model_function(x = "Group", 
                                     y = "First_outcome")
outcome1_modelB <- my_model_function(x = c("Group", "Age", "Sex"), 
                                     y = "First_outcome")

为了使其更加自动化,您可以在 lapply() 中创建嵌套循环,这将返回模型输出的嵌套列表。

x_values <- list("Group", c("Group", "Age", "Sex"))
y_values <- list("First_outcome", "Second_outcome")
lapply(x_values, function(x_value)
  lapply(y_values, function(y_value)
    my_model_function(x_value, y_value)
  )
)

您也可以在这里替换用户定义的函数 - 真的不需要创建一个函数,因为这允许从一段代码重复调用它(我把它留在里面是因为我在手机上写,它是太多的编辑,但类似于)

lapply(list(1,2,3), function(i)
  i^2
)

【讨论】:

嗨@rg255,感谢您的回答。这真的很有帮助(即使我并不真正理解函数,但我可以阅读它以完全理解这段代码)。话虽如此,为什么/如何您的第二段代码不需要该功能?它调用第一个代码块中的函数对吗? 您可以将 lapplys 中的函数调用替换为函数内部编写的代码 - 请参阅编辑后的代码,这是一个非常简单的示例,但它没有 lapply 外部的函数

以上是关于更有效地进行多重回归模型(总共 8 个:2 个模型包含不同的预测变量和 4 个不同的结果)的主要内容,如果未能解决你的问题,请参考以下文章

多元线性回归多重共线性检验及避免方法,简单点的

拓端tecdat|R语言编程指导用线性模型进行臭氧预测: 加权泊松回归,普通最小二乘,加权负二项式模型,多重插补缺失值

lasso和cox结果矛盾

做一个回归分析,有两个变量总是被剔除,可是我不希望它被剔除,这该怎么办?后天论文答辩用啊···

加载图像以进行检测的更有效方式

房价预测模型