Nested subsetting in a dataframe in R

【Posted】: 2022-01-14 21:45:33

【Question】:

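I was wondering how to subset my data below so that I end up with 4 unique studies, namely:

(A) 2 unique studies for which study_type == standard, made up of 1 study with reporting == subscale and 1 study with reporting == composite (something like studies 1 and 3)

(B) 2 unique studies for which study_type == alternative, made up of 1 study with reporting == subscale and 1 study with reporting == composite (something like studies 5 and 7)

Is this possible in R?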

m="
study subscale  reporting  obs include yi   vi         study_type
1        A      subscale   1   yes     1.94 0.33503768 standard
1        A      subscale   2   yes     1.06 0.01076604 standard
2        A      subscale   3   yes     2.41 0.23767389 standard
2        A      subscale   4   yes     2.34 0.37539841 standard
3        A&C    composite  5   yes     3.09 0.31349510 standard
3        A&C    composite  6   yes     3.99 0.01349510 standard
4        A&B    composite  7   yes     2.90 0.91349510 standard
4        A&B    composite  8   yes     3.01 0.99349510 standard
5        G&H    composite  9   yes     1.01 0.99910197 alternative
5        G&H    composite  10  yes     2.10 0.97910095 alternative
6        E&G    composite  11  yes     0.11 0.27912095 alternative
6        E&G    composite  12  yes     3.12 0.87910095 alternative
7        E      subscale   13  yes     0.08 0.21670360 alternative
7        G      subscale   14  yes     1.00 0.91597190 alternative
8        F      subscale   15  yes     1.08 0.81670360 alternative
8        E      subscale   16  yes     0.99 0.91297170 alternative"
data <- read.table(text = m, header = TRUE)


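【Answer 1】:

If I understand correctly, you can use dplyr::distinct, which keeps the first row for each combination of study_type and reporting: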


library(tidyverse)

data %>%
  distinct(study_type, reporting, .keep_all = TRUE)
#>   study subscale reporting obs include   yi        vi  study_type
#> 1     1        A  subscale   1     yes 1.94 0.3350377    standard
#> 2     3      A&C composite   5     yes 3.09 0.3134951    standard
#> 3     5      G&H composite   9     yes 1.01 0.9991020 alternative
#> 4     7        E  subscale  13     yes 0.08 0.2167036 alternative
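The distinct() call above returns one row per study_type/reporting combination, while each study in the data spans two rows. If the goal is to keep every row of the four selected studies, one possible extension is sketched below. It is only a sketch: the data frame is a reduced stand-in for the question's data (same study, reporting, study_type and obs values, other columns dropped), the names picked and result are illustrative, and it assumes a study id never appears in more than one combination.

```r
library(dplyr)

# Reduced stand-in for the question's data (assumption: only the columns
# needed for the selection are reproduced here)
data <- data.frame(
  study      = rep(1:8, each = 2),
  reporting  = rep(c("subscale", "composite", "composite", "subscale"), each = 4),
  study_type = rep(c("standard", "alternative"), each = 8),
  obs        = 1:16
)

# Study id of the first row in each study_type/reporting combination
picked <- data %>%
  distinct(study_type, reporting, .keep_all = TRUE) %>%
  pull(study)

# Keep all rows that belong to those studies
result <- data %>% filter(study %in% picked)
result
```

With this data the selection covers studies 1, 3, 5 and 7, matching the example given in the question.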


【Answer 2】:

If you are asking how to filter the data into the four subsets you described, you can do it like this:

> study1 <- dplyr::filter(data, study_type == "standard" & reporting == "subscale")
> study1
  study subscale reporting obs include   yi         vi study_type
1     1        A  subscale   1     yes 1.94 0.33503768   standard
2     1        A  subscale   2     yes 1.06 0.01076604   standard
3     2        A  subscale   3     yes 2.41 0.23767389   standard
4     2        A  subscale   4     yes 2.34 0.37539841   standard
> study2 <- dplyr::filter(data, study_type == "standard" & reporting == "composite")
> study2
  study subscale reporting obs include   yi        vi study_type
1     3      A&C composite   5     yes 3.09 0.3134951   standard
2     3      A&C composite   6     yes 3.99 0.0134951   standard
3     4      A&B composite   7     yes 2.90 0.9134951   standard
4     4      A&B composite   8     yes 3.01 0.9934951   standard
> study3 <- dplyr::filter(data, study_type == "alternative" & reporting == "subscale")
> study3
  study subscale reporting obs include   yi        vi  study_type
1     7        E  subscale  13     yes 0.08 0.2167036 alternative
2     7        G  subscale  14     yes 1.00 0.9159719 alternative
3     8        F  subscale  15     yes 1.08 0.8167036 alternative
4     8        E  subscale  16     yes 0.99 0.9129717 alternative
> study4 <- dplyr::filter(data, study_type == "alternative" & reporting == "composite")
> study4
  study subscale reporting obs include   yi       vi  study_type
1     5      G&H composite   9     yes 1.01 0.999102 alternative
2     5      G&H composite  10     yes 2.10 0.979101 alternative
3     6      E&G composite  11     yes 0.11 0.279121 alternative
4     6      E&G composite  12     yes 3.12 0.879101 alternative
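Each of the four filters above still contains two studies. A grouped filter is one way to do the same partitioning in a single pipeline and keep only one study per combination. The following is a minimal sketch, not part of the original answer: it uses a reduced stand-in for the question's data, the name result is illustrative, and "one study" is taken to mean the first study listed in each group.

```r
library(dplyr)

# Reduced stand-in for the question's data (assumption: only the columns
# needed for the selection are reproduced here)
data <- data.frame(
  study      = rep(1:8, each = 2),
  reporting  = rep(c("subscale", "composite", "composite", "subscale"), each = 4),
  study_type = rep(c("standard", "alternative"), each = 8),
  obs        = 1:16
)

result <- data %>%
  group_by(study_type, reporting) %>%   # the same four groups as the four filters
  filter(study == first(study)) %>%     # keep only rows of the first study per group
  ungroup()
result
```

Replacing first(study) with another rule, for example a random draw among the group's study ids, would select a different study per combination.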

【Comments】:

Check out split(df, df[c("reporting", "study_type")])...
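The split() suggestion in the comment can be sketched in base R as follows. This is an illustration under assumptions rather than the commenter's full solution: the data frame is a reduced stand-in for the question's data, the names groups, one_study and result are hypothetical, and the first study listed in each group is the one kept.

```r
# Reduced stand-in for the question's data (assumption: only the columns
# needed for the selection are reproduced here)
data <- data.frame(
  study      = rep(1:8, each = 2),
  reporting  = rep(c("subscale", "composite", "composite", "subscale"), each = 4),
  study_type = rep(c("standard", "alternative"), each = 8),
  obs        = 1:16
)

# One data frame per reporting x study_type combination (four here)
groups <- split(data, data[c("reporting", "study_type")])

# Within each group, keep only the rows of the first study listed
one_study <- lapply(groups, function(g) g[g$study == g$study[1], ])

# Stack the four pieces back into a single data frame
result <- do.call(rbind, one_study)
rownames(result) <- NULL
result
```

The list returned by split() is named after the combinations (for example "subscale.standard"), so a single subset can also be pulled out by name.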
