拟合的 coxph 模型随时间变化的地层和集群的公式中的项排序而变化
Posted
技术标签:
【中文标题】拟合的 coxph 模型随时间变化的地层和集群的公式中的项排序而变化【英文标题】:Fitted coxph model varies by ordering of terms in formula with time-varying strata and clusters 【发布时间】:2021-10-24 07:02:22 【问题描述】:我在 R 中的 survival
包中遇到了奇怪的行为,在我直接联系包维护者之前,我希望比我更聪明的人能告诉我这不是错误,而是更具统计意义的东西细致入微(因此我在这里发帖)。
上下文是我想根据时间层来拟合具有时变系数的 Cox 模型,并且我还希望稳健的标准误差来解释聚类观察。令我惊讶的是,输出因formula
参数中的术语顺序而异,并且它似乎选择性地忽略交互,具体取决于它们是在集群参数之前还是之后。
请参阅下面的reprex,包括我的cmets,它们指出了我觉得奇怪的事情。谢谢
PS 一个额外的问题:你知道为什么,在下面的expected_fit
中,age
在之前 strata(tgroup)
在变量名中,而ph.ecog
在之后 em> strata(tgroup)
?
library(survival);
library(tidyverse);
# Create a fake cluster in the lung data
lung_expanded <-
lung %>%
mutate(fake_id = sample(1:3, nrow(lung), replace = T))
# Model below is as expected: I get the expected time-stratified hazard
# ratios and robust standard errors
expected_fit <-
survSplit(Surv(time, status) ~ .,
data = lung_expanded,
cut = c(200, 300),
episode = "tgroup") %>%
coxph(formula = Surv(tstart, time, status) ~
# putting cluster prior to age:time interaction
# gives expected results
cluster(fake_id) + age:strata(tgroup) + ph.ecog:strata(tgroup))
expected_fit
#> Call:
#> coxph(formula = Surv(tstart, time, status) ~ age:strata(tgroup) +
#> strata(tgroup):ph.ecog, data = ., cluster = fake_id)
#>
#> coef exp(coef) se(coef) robust se z
#> age:strata(tgroup)tgroup=1 0.007912 1.007943 0.013858 0.010575 0.748
#> age:strata(tgroup)tgroup=2 -0.003444 0.996562 0.021995 0.014347 -0.240
#> age:strata(tgroup)tgroup=3 0.020987 1.021209 0.015680 0.006119 3.430
#> strata(tgroup)tgroup=1:ph.ecog 0.665031 1.944551 0.176550 0.177987 3.736
#> strata(tgroup)tgroup=2:ph.ecog 0.650765 1.917007 0.278808 0.032228 20.193
#> strata(tgroup)tgroup=3:ph.ecog 0.097243 1.102129 0.188521 0.131935 0.737
#> p
#> age:strata(tgroup)tgroup=1 0.454370
#> age:strata(tgroup)tgroup=2 0.810300
#> age:strata(tgroup)tgroup=3 0.000604
#> strata(tgroup)tgroup=1:ph.ecog 0.000187
#> strata(tgroup)tgroup=2:ph.ecog < 2e-16
#> strata(tgroup)tgroup=3:ph.ecog 0.461090
#>
#> Likelihood ratio test=24.9 on 6 df, p=0.0003561
#> n= 462, number of events= 164
#> (1 observation deleted due to missingness)
# Model output below differs from above, but it varies only in the ordering of the
# formula. I only get the hazard ratios corresponding to age but not
# those corresponding ecog status. moreover, I get an unexpected hazard
# ratio corresponding to a newly created variable called 'cluster(fake_id)'
suspicious_fit <-
survSplit(Surv(time, status) ~ .,
data = lung_expanded,
cut = c(200, 300),
episode = "tgroup") %>%
coxph(formula = Surv(tstart, time, status) ~
# putting age:time interaction prior to the cluster
# gives suspicious results
age:strata(tgroup) + cluster(fake_id) + ph.ecog:strata(tgroup))
suspicious_fit
#> Call:
#> coxph(formula = Surv(tstart, time, status) ~ cluster(fake_id) +
#> age:strata(tgroup), data = ., cluster = fake_id)
#>
#> coef exp(coef) se(coef) robust se z p
#> cluster(fake_id) -0.04020 0.96060 0.09952 0.05094 -0.789 0.43004
#> age:strata(tgroup)tgroup=1 0.01928 1.01947 0.01354 0.01477 1.306 0.19166
#> age:strata(tgroup)tgroup=2 0.01029 1.01034 0.02149 0.01782 0.578 0.56359
#> age:strata(tgroup)tgroup=3 0.02254 1.02279 0.01556 0.00655 3.441 0.00058
#>
#> Likelihood ratio test=4.62 on 4 df, p=0.3288
#> n= 463, number of events= 165
# In the model below, which has no time-varying strata,
# everything looks good.
expected_fit2 <-
lung_expanded %>%
coxph(formula = Surv(time, status) ~
# putting age:time interaction prior to the cluster
# gives suspicious results
age + cluster(fake_id) + ph.ecog)
expected_fit2
#> Call:
#> coxph(formula = Surv(time, status) ~ age + ph.ecog, data = .,
#> cluster = fake_id)
#>
#> coef exp(coef) se(coef) robust se z p
#> age 0.011281 1.011345 0.009319 0.006714 1.68 0.0929
#> ph.ecog 0.443485 1.558128 0.115831 0.028093 15.79 <2e-16
#>
#> Likelihood ratio test=19.06 on 2 df, p=7.279e-05
#> n= 227, number of events= 164
#> (1 observation deleted due to missingness)
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6 purrr_0.3.4
#> [5] readr_1.4.0 tidyr_1.1.3 tibble_3.1.2 ggplot2_3.3.3
#> [9] tidyverse_1.3.1 survival_3.2-11
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.1.1 xfun_0.23 splines_4.1.0 haven_2.4.1
#> [5] lattice_0.20-44 colorspace_2.0-1 vctrs_0.3.8 generics_0.1.0
#> [9] htmltools_0.5.1.1 yaml_2.2.1 utf8_1.2.1 rlang_0.4.11
#> [13] pillar_1.6.1 glue_1.4.2 withr_2.4.2 DBI_1.1.1
#> [17] dbplyr_2.1.1 readxl_1.3.1 modelr_0.1.8 lifecycle_1.0.0
#> [21] cellranger_1.1.0 munsell_0.5.0 gtable_0.3.0 rvest_1.0.0
#> [25] evaluate_0.14 knitr_1.33 fansi_0.5.0 highr_0.9
#> [29] Rcpp_1.0.7 broom_0.7.6 backports_1.2.1 scales_1.1.1
#> [33] jsonlite_1.7.2 fs_1.5.0 hms_1.1.0 digest_0.6.27
#> [37] stringi_1.6.2 grid_4.1.0 cli_2.5.0 tools_4.1.0
#> [41] magrittr_2.0.1 crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.2
#> [45] Matrix_1.3-4 xml2_1.3.2 reprex_2.0.0 lubridate_1.7.10
#> [49] assertthat_0.2.1 rmarkdown_2.8 httr_1.4.2 rstudioapi_0.13
#> [53] R6_2.5.0 compiler_4.1.0
Created on 2021-08-16 by the reprex package (v2.0.0)
```
【问题讨论】:
“suspicious_fit”打印输出中的Call
根本不包含strata(tgroup):ph.ecog
术语。检查您尝试进行相应函数调用的方式;无论出于何种原因,coxph()
甚至都不知道包含该术语。
我深入研究了survival::coxph
代码,确定了有问题的代码行,并联系了确认这是一个错误的包维护人员。他指出“...基础 R 中的 [.formula] 中存在错误,这些错误会被 update.formula 继承并重新制定。”他刚刚告诉我版本 3.2-13 修复了这个问题。我建议应该将其移至 ***。
感谢您完成此任务,并在此处更新。按照您的建议,我已投票将其移至 ***
。
【参考方案1】:
我深入研究了survival::coxph
代码,确定了有问题的代码行,并联系了 Terry Therneau(包维护者),他确认这是一个错误。他指出“...基础 R 中的 [.formula] 中存在错误,这些错误会被 update.formula 继承并重新制定。”他告诉我版本 3.2-13 修复了这个问题。
【讨论】:
以上是关于拟合的 coxph 模型随时间变化的地层和集群的公式中的项排序而变化的主要内容,如果未能解决你的问题,请参考以下文章
R语言使用coxph函数对加权数据集进行竞争风险模型拟合使用regplot包的regplot函数可视化竞争风险回归模型的列线图nomogram计算特定病例的事件概率(控制了竞争风险的累计事件概率)
使用随时间变化的模型为 Sequelize 制作可重现的迁移字符串?
R语言使用coxph函数构建生存分析回归模型,使用forestmodel包的forest_model函数可视化生存回归模型对应的森林图