拟合的 coxph 模型随时间变化的地层和集群的公式中的项排序而变化

Posted

技术标签:

【中文标题】拟合的 coxph 模型随时间变化的地层和集群的公式中的项排序而变化【英文标题】:Fitted coxph model varies by ordering of terms in formula with time-varying strata and clusters 【发布时间】:2021-10-24 07:02:22 【问题描述】:

我在 R 中的 survival 包中遇到了奇怪的行为,在我直接联系包维护者之前,我希望比我更聪明的人能告诉我这不是错误,而是更具统计意义的东西细致入微(因此我在这里发帖)。

上下文是我想根据时间层来拟合具有时变系数的 Cox 模型,并且我还希望稳健的标准误差来解释聚类观察。令我惊讶的是,输出因formula 参数中的术语顺序而异,并且它似乎选择性地忽略交互,具体取决于它们是在集群参数之前还是之后。

请参阅下面的reprex,包括我的cmets,它们指出了我觉得奇怪的事情。谢谢

PS 一个额外的问题:你知道为什么,在下面的expected_fit 中,age之前 strata(tgroup) 在变量名中,而ph.ecog之后 em> strata(tgroup)?

library(survival);
library(tidyverse);

# Create a fake cluster in the lung data
lung_expanded <- 
  lung %>%
  mutate(fake_id = sample(1:3, nrow(lung), replace = T))


# Model below is as expected: I get the expected time-stratified hazard 
# ratios and robust standard errors 

expected_fit <- 
  survSplit(Surv(time, status) ~ ., 
            data = lung_expanded, 
            cut = c(200, 300), 
            episode = "tgroup") %>%
  coxph(formula = Surv(tstart, time, status) ~ 
          # putting cluster prior to age:time interaction
          # gives expected results
          cluster(fake_id) + age:strata(tgroup) + ph.ecog:strata(tgroup))
expected_fit
#> Call:
#> coxph(formula = Surv(tstart, time, status) ~ age:strata(tgroup) + 
#>     strata(tgroup):ph.ecog, data = ., cluster = fake_id)
#> 
#>                                     coef exp(coef)  se(coef) robust se      z
#> age:strata(tgroup)tgroup=1      0.007912  1.007943  0.013858  0.010575  0.748
#> age:strata(tgroup)tgroup=2     -0.003444  0.996562  0.021995  0.014347 -0.240
#> age:strata(tgroup)tgroup=3      0.020987  1.021209  0.015680  0.006119  3.430
#> strata(tgroup)tgroup=1:ph.ecog  0.665031  1.944551  0.176550  0.177987  3.736
#> strata(tgroup)tgroup=2:ph.ecog  0.650765  1.917007  0.278808  0.032228 20.193
#> strata(tgroup)tgroup=3:ph.ecog  0.097243  1.102129  0.188521  0.131935  0.737
#>                                       p
#> age:strata(tgroup)tgroup=1     0.454370
#> age:strata(tgroup)tgroup=2     0.810300
#> age:strata(tgroup)tgroup=3     0.000604
#> strata(tgroup)tgroup=1:ph.ecog 0.000187
#> strata(tgroup)tgroup=2:ph.ecog  < 2e-16
#> strata(tgroup)tgroup=3:ph.ecog 0.461090
#> 
#> Likelihood ratio test=24.9  on 6 df, p=0.0003561
#> n= 462, number of events= 164 
#>    (1 observation deleted due to missingness)

# Model output below differs from above, but it varies only in the ordering of the
# formula. I only get the hazard ratios  corresponding to  age but not 
# those corresponding ecog status. moreover, I get an unexpected hazard 
# ratio corresponding to a newly created variable called 'cluster(fake_id)'

suspicious_fit <- 
  survSplit(Surv(time, status) ~ ., 
            data = lung_expanded, 
            cut = c(200, 300), 
            episode = "tgroup") %>%
  coxph(formula = Surv(tstart, time, status) ~ 
          # putting age:time interaction prior to the cluster 
          # gives suspicious results
          age:strata(tgroup) + cluster(fake_id) + ph.ecog:strata(tgroup))
suspicious_fit
#> Call:
#> coxph(formula = Surv(tstart, time, status) ~ cluster(fake_id) + 
#>     age:strata(tgroup), data = ., cluster = fake_id)
#> 
#>                                coef exp(coef) se(coef) robust se      z       p
#> cluster(fake_id)           -0.04020   0.96060  0.09952   0.05094 -0.789 0.43004
#> age:strata(tgroup)tgroup=1  0.01928   1.01947  0.01354   0.01477  1.306 0.19166
#> age:strata(tgroup)tgroup=2  0.01029   1.01034  0.02149   0.01782  0.578 0.56359
#> age:strata(tgroup)tgroup=3  0.02254   1.02279  0.01556   0.00655  3.441 0.00058
#> 
#> Likelihood ratio test=4.62  on 4 df, p=0.3288
#> n= 463, number of events= 165

# In the model below, which has no time-varying strata, 
# everything looks good. 

expected_fit2  <- 
  lung_expanded %>%
  coxph(formula = Surv(time, status) ~ 
          # putting age:time interaction prior to the cluster 
          # gives suspicious results
          age + cluster(fake_id) + ph.ecog)
expected_fit2
#> Call:
#> coxph(formula = Surv(time, status) ~ age + ph.ecog, data = ., 
#>     cluster = fake_id)
#> 
#>             coef exp(coef) se(coef) robust se     z      p
#> age     0.011281  1.011345 0.009319  0.006714  1.68 0.0929
#> ph.ecog 0.443485  1.558128 0.115831  0.028093 15.79 <2e-16
#> 
#> Likelihood ratio test=19.06  on 2 df, p=7.279e-05
#> n= 227, number of events= 164 
#>    (1 observation deleted due to missingness)

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.6     purrr_0.3.4    
#>  [5] readr_1.4.0     tidyr_1.1.3     tibble_3.1.2    ggplot2_3.3.3  
#>  [9] tidyverse_1.3.1 survival_3.2-11
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.1  xfun_0.23         splines_4.1.0     haven_2.4.1      
#>  [5] lattice_0.20-44   colorspace_2.0-1  vctrs_0.3.8       generics_0.1.0   
#>  [9] htmltools_0.5.1.1 yaml_2.2.1        utf8_1.2.1        rlang_0.4.11     
#> [13] pillar_1.6.1      glue_1.4.2        withr_2.4.2       DBI_1.1.1        
#> [17] dbplyr_2.1.1      readxl_1.3.1      modelr_0.1.8      lifecycle_1.0.0  
#> [21] cellranger_1.1.0  munsell_0.5.0     gtable_0.3.0      rvest_1.0.0      
#> [25] evaluate_0.14     knitr_1.33        fansi_0.5.0       highr_0.9        
#> [29] Rcpp_1.0.7        broom_0.7.6       backports_1.2.1   scales_1.1.1     
#> [33] jsonlite_1.7.2    fs_1.5.0          hms_1.1.0         digest_0.6.27    
#> [37] stringi_1.6.2     grid_4.1.0        cli_2.5.0         tools_4.1.0      
#> [41] magrittr_2.0.1    crayon_1.4.1      pkgconfig_2.0.3   ellipsis_0.3.2   
#> [45] Matrix_1.3-4      xml2_1.3.2        reprex_2.0.0      lubridate_1.7.10 
#> [49] assertthat_0.2.1  rmarkdown_2.8     httr_1.4.2        rstudioapi_0.13  
#> [53] R6_2.5.0          compiler_4.1.0
Created on 2021-08-16 by the reprex package (v2.0.0)
```

【问题讨论】:

“suspicious_fit”打印输出中的Call 根本不包含strata(tgroup):ph.ecog 术语。检查您尝试进行相应函数调用的方式;无论出于何种原因,coxph() 甚至都不知道包含该术语。 我深入研究了survival::coxph 代码,确定了有问题的代码行,并联系了确认这是一个错误的包维护人员。他指出“...基础 R 中的 [.formula] 中存在错误,这些错误会被 update.formula 继承并重新制定。”他刚刚告诉我版本 3.2-13 修复了这个问题。我建议应该将其移至 ***。 感谢您完成此任务,并在此处更新。按照您的建议,我已投票将其移至 *** 【参考方案1】:

我深入研究了survival::coxph 代码,确定了有问题的代码行,并联系了 Terry Therneau(包维护者),他确认这是一个错误。他指出“...基础 R 中的 [.formula] 中存在错误,这些错误会被 update.formula 继承并重新制定。”他告诉我版本 3.2-13 修复了这个问题。

【讨论】:

以上是关于拟合的 coxph 模型随时间变化的地层和集群的公式中的项排序而变化的主要内容,如果未能解决你的问题,请参考以下文章

R语言使用coxph函数对加权数据集进行竞争风险模型拟合使用regplot包的regplot函数可视化竞争风险回归模型的列线图nomogram计算特定病例的事件概率(控制了竞争风险的累计事件概率)

R中的冲积地块:如何划分地层?

机器学习之欠拟合和过拟合

使用随时间变化的模型为 Sequelize 制作可重现的迁移字符串?

matlab中怎么让三维曲面的颜色随X,Y的值变化

R语言使用coxph函数构建生存分析回归模型,使用forestmodel包的forest_model函数可视化生存回归模型对应的森林图