Is there an R function for preparing datasets for survival analysis, like stset in Stata?

Posted 2023-02-19

【Title】: Is there an R function for preparing datasets for survival analysis like stset in Stata? 【Posted】: 2021-12-19 12:36:41 【Question】:

The dataset looks like this:

id  start   end  failure x1
 1    0      1      0    0
 1    1      3      0    0
 1    3      6      1    0
 2    0      1      1    1
 2    1      3      1    1
 2    3      4      0    1
 2    4      6      0    1
 2    6      7      1    1

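As you can see, the rows for id = 1 are already in the form that coxph from the survival package expects as input. For id = 2, however, failure occurs in the first intervals and again in the last one, but disappears in between.

Is there a general function that can extract the data for id = 2 and produce a result like the one for id = 1?

I think that for id = 2 the result should look like this: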

id  start   end  failure x1
1    0      1      0    0
1    1      3      0    0
1    3      6      1    0
2    3      4      0    1
2    4      6      0    1
2    6      7      1    1

【Answer 1】:

A bit hacky, but it should do the job.

Data:

# Load data
library(tidyverse)
df <- read_table("
 id   start  end    failure  x1
 1    0      1      0        0
 1    1      3      0        0
 1    3      6      1        0
 2    0      1      1        1
 2    1      3      1        1
 2    3      4      0        1
 2    4      6      0        1
 2    6      7      1        1
")

Data wrangling:

# Check for sub-groups within IDs and remove all but the last one
df <- df %>%
    # Group by ID
    group_by(
        id
    ) %>%
    mutate(
        # Check if a new sub-group is starting (after a failure)
        new_group = case_when(
            # First row is always group 0
            row_number() == 1 ~ 0,
            # If previous row was a failure, then a new sub-group starts here
            lag(failure) == 1 ~ 1,
            # Otherwise not
            TRUE ~ 0
        ),
        # Assign sub-group number by calculating cumulative sums
        group = cumsum(new_group)
    ) %>%
    # Keep only last sub-group for each ID
    filter(
        group == max(group)
    ) %>%
    ungroup() %>%
    # Remove working columns
    select(
        -new_group, -group
    )

Result:

> df
# A tibble: 6 × 5
     id start   end failure    x1
  <dbl> <dbl> <dbl>   <dbl> <dbl>
1     1     0     1       0     0
2     1     1     3       0     0
3     1     3     6       1     0
4     2     3     4       0     1
5     2     4     6       0     1
6     2     6     7       1     1
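
For comparison, the same "keep only the rows after the previous failure" logic can be sketched in base R without the tidyverse. This is only a sketch, not an equivalent of stset: the helper name keep_last_spell is made up for illustration, and it assumes the rows are already sorted by start within each id and that failure is coded 0/1. The final step builds the counting-process response with Surv() from the survival package, guarded in case that package is not installed.

```r
# Hypothetical helper: keep, for each id, only the rows that follow the
# most recent earlier failure (i.e. the last "spell" of that id).
keep_last_spell <- function(data) {
  # Number of failures strictly before each row, counted within each id
  spell <- ave(data$failure, data$id,
               FUN = function(f) cumsum(c(0, head(f, -1))))
  # Keep the rows that belong to the highest-numbered spell of their id
  keep <- spell == ave(spell, data$id, FUN = max)
  out <- data[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Same example data as in the question
raw <- data.frame(
  id      = c(1, 1, 1, 2, 2, 2, 2, 2),
  start   = c(0, 1, 3, 0, 1, 3, 4, 6),
  end     = c(1, 3, 6, 1, 3, 4, 6, 7),
  failure = c(0, 0, 1, 1, 1, 0, 0, 1),
  x1      = c(0, 0, 0, 1, 1, 1, 1, 1)
)

cleaned <- keep_last_spell(raw)
print(cleaned)

# Counting-process response in the (start, end, event) form used by coxph()
if (requireNamespace("survival", quietly = TRUE)) {
  print(with(cleaned, survival::Surv(start, end, failure)))
}
```

On the example data this keeps all three rows for id = 1 and the last three rows for id = 2, matching the tidyverse result above.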
