Add and stack subgroups in a single expss table

Posted 2023-02-16

Asked: 2020-07-23 23:59:17

Question:

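I have a somewhat unusual request this time: I already know how to get the table output I want, but I would like to know whether a less verbose solution exists with expss. This topic can be seen as an extension of the discussion "Complex tables with expss package", and it is also related to "How to display results from only select subgroups + the whole data frame in an expss table?"

My table is structured as follows: the results for the whole data frame are shown first, followed by the results split by subgroup. As of today, this is how I proceed (using the infert dataset as an example):

1) Table template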

### Banner set up
library(expss)
data(infert)
my_banner = infert %>%
  tab_cols(total())
my_custom_table = . %>%  
  tab_significance_options(sig_level=0.2, keep="none", sig_labels=NULL, subtable_marks="greater", mode="append") %>%
  tab_stat_cases(label="N", total_row_position="above", total_statistic="u_cases", total_label="TOTAL") %>% 
  tab_stat_cpct(label="%Col.", total_row_position="above", total_statistic="u_cpct", total_label="TOTAL") %>%
  # Parity x Education
  tab_cols(education) %>%
  tab_stat_cases(label="N", total_row_position="above", total_statistic="u_cases", total_label="TOTAL") %>% 
  tab_last_add_sig_labels() %>%
  tab_stat_cpct(label="%Col.", total_row_position="above", total_statistic="u_cpct", total_label="TOTAL") %>%
  tab_last_add_sig_labels() %>%
  tab_last_sig_cpct(label="T.1", compare_type="subtable")

2) Create 3 separate tables (1 for the total, 1 for each subgroup) and merge them into one:

tab1 <- my_banner %>%
  tab_cells(parity) %>%
  my_custom_table() %>%
  tab_pivot(stat_position="inside_columns")
tab2 <- infert %>%
  apply_labels(education="education (CASE 0)") %>%
  tab_cells(parity) %>%
  tab_cols(total(label = "CASE 0")) %>%
  tab_subgroup(case==0) %>%
  my_custom_table() %>%
  tab_pivot(stat_position="inside_columns")
tab3 <- infert %>%
  apply_labels(education="education (CASE 1)") %>%
  tab_cells(parity) %>%
  tab_cols(total(label = "CASE 1")) %>%
  tab_subgroup(case==1) %>%
  my_custom_table() %>%
  tab_pivot(stat_position="inside_columns")

final_tab <- tab1 %merge% tab2 %merge% tab3

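All of this code produces just one table, so you can see my concern. Is there a good practice that avoids this lengthy (but working) sequence? My first guess was: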

my_banner %>%
  tab_cells(parity) %>%
  my_custom_table() %>%
  tab_subgroup(case==0) %>%
  my_custom_table() %>%
  tab_subgroup(case==1) %>%
  my_custom_table() %>%
  tab_pivot(stat_position="inside_columns")

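A table is computed, but the output is far from the goal. There may be a fix, but I don't know where to look. Any help would be greatly appreciated, thank you! (Note: if a simple solution involves getting rid of the #TOTAL column, that is fine with me too.)

Answer 1: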

The key idea is to use %nest% in tab_cols instead of tab_subgroup:

library(expss)
data(infert)
my_banner = infert %>%
    apply_labels(
        education = "education",
        case = c(
            "CASE 0" = 0,
            "CASE 1" = 1
        )
    ) %>% 
    tab_cols(total(), education, case %nest% list(total(label = ""), education))

my_custom_table = . %>%  
    tab_significance_options(sig_level=0.2, keep="none", sig_labels=NULL, subtable_marks="greater", mode="append") %>%
    tab_stat_cases(label="N", total_row_position="above", total_statistic="u_cases", total_label="TOTAL") %>% 
    tab_last_add_sig_labels() %>%
    tab_stat_cpct(label="%Col.",
                  total_row_position="above", 
                  total_statistic=c("u_cases", "u_cpct"), 
                  total_label=c("TO_DELETE_TOTAL", "TOTAL")) %>%
    tab_last_add_sig_labels() %>%
    tab_last_sig_cpct(label="T.1", compare_type="subtable") %>% 
    tab_pivot(stat_position="inside_columns") %>% 
    # drop auxiliary rows and columns
    where(!grepl("TO_DELETE", row_labels)) %>% 
    except(fixed("Total|T.1"), fixed("CASE 0|T.1"), fixed("CASE 1|T.1"))

my_banner %>% 
    tab_cells(parity) %>% 
    my_custom_table()
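
As a stripped-down illustration of what %nest% does in the banner above, the following sketch nests education inside each value of case, without the significance testing and without the row and column clean-up. It is only a minimal example on the same infert data, not a replacement for the answer's template; the object name nested is chosen here for illustration.

```r
library(expss)
data(infert)

# Columns: the total first, then one block of education columns
# for each value of `case` (0 and 1), produced by %nest%.
nested <- infert %>%
    tab_cells(parity) %>%
    tab_cols(total(), case %nest% education) %>%
    tab_stat_cases() %>%
    tab_pivot()

nested
```

Because the subgroups are expressed as nested columns, a single chain of tab_* calls yields all blocks, so no %merge% of separately built tables is needed.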

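Comments:

I knew you would have the answer! This works well, thanks @Gregory Demin. Going one step further, I wonder whether this approach can be used to subset on a factor. I tried replacing the existing case= sequence with this:

case = c("CASE 0"=as.logical(infert$case == levels(infert$case)[1]), "CASE 1"=as.logical(infert$case == levels(infert$case)[2]))

The table is created, but many warning messages are displayed as well, so I guess this is not the way to go. The last point is nesting factor subtotals rather than individual levels.

@MaxenceDum. apply_labels is not subsetting. It is just a way to display "CASE 0"/"CASE 1" instead of plain 0/1. All the action happens here: tab_cols(total(), education, case %nest% list(total(label = ""), education)). If your variable already has nice values, e.g. a factor with human-readable levels, you can drop this code from apply_labels entirely.

OK, I was confused, but it is clearer now. However, since the criterion is a logical vector, is there a way to compute only over the TRUE values? Taking the example above, after case has been coerced to a factor: tab_cols(set_var_lab((case == levels(case)[1]),"CASE 0") %nest% list(unvr(education))). With code like this, both the TRUE and the FALSE subtables are computed. We can of course exclude the FALSE table afterwards, but 1) that means extra lines of code and 2) the computation time is the same.

@MaxenceDum. You can use tab_subgroup(case == "desired_subgroup") to select the data. Nevertheless, empty categories will still appear in the table. There is a special function to avoid them: drop_empty_columns. It should be applied after tab_pivot.

Thanks for your feedback @Gregory Demin, at least I know it's not possible. As you explained, I used the %nest% option to handle my own dataset, together with a dummy factor with merged levels created specifically for this need. Many thanks for the tip!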
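
The tab_subgroup plus drop_empty_columns suggestion from the comments can be sketched roughly as follows. This is an untested illustration under stated assumptions: the factor case_f and its level names "CASE 0"/"CASE 1" are hypothetical and created here only so that there is a factor with human-readable levels to subset on.

```r
library(expss)
data(infert)

# Hypothetical factor version of `case` with readable levels
infert$case_f <- factor(infert$case, levels = c(0, 1),
                        labels = c("CASE 0", "CASE 1"))

sub_tab <- infert %>%
    tab_cells(parity) %>%
    tab_cols(case_f %nest% education) %>%
    # restrict the statistics that follow to one subgroup
    tab_subgroup(case_f == "CASE 1") %>%
    tab_stat_cases() %>%
    tab_pivot() %>%
    # remove columns left empty by the subgroup filter
    drop_empty_columns()

sub_tab
```

Note that drop_empty_columns is applied to the finished table, after tab_pivot, as the comment advises.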
