Normality test for multi-grouped data in R
[Posted]: 2021-07-10 11:07:03
[Question]: I am trying to run a normality test on my data in R. My dataset is a data frame with 4 character columns and 1 numeric column. I am currently using the rstatix package, and other statistical tests such as wilcox_test() and kruskal_test() work fine, but shapiro_test() fails with the following error:
data %>% group_by(treatment,chase,measure) %>% shapiro_test(value)
x
+-<error/dplyr:::mutate_error>
| Problem with `mutate()` input `data`.
| x Must group by variables found in `.data`.
| * Column `variable` is not found.
| i Input `data` is `map(.data$data, .f, ...)`.
\-<error/rlang_error>
Must group by variables found in `.data`.
* Column `variable` is not found.
Backtrace:
1. dplyr::group_by(., treatment, chase, measure)
2. rstatix::shapiro_test(., value)
33. rstatix:::.f(.x[[i]], ...)
11. dplyr::group_by(., variable)
43. dplyr::group_by_prepare(.data, ..., .add = .add)
My dataset looks like this:
groups treatment chase measure value
1 uncoated control 30 colocA 17.912954
2 uncoated control 30 colocA 16.806409
3 uncoated control 30 colocA 20.322467
4 uncoated control 30 colocA 15.953959
5 uncoated control 30 colocA 22.566408
6 uncoated control 30 colocA 17.780975
7 uncoated control 30 colocA 19.764265
8 uncoated control 30 colocA 16.928500
9 uncoated control 30 colocA 22.931763
10 uncoated control 30 colocA 18.101085
11 uncoated control 30 distCC 1.159298
12 uncoated control 30 distCC 1.174931
13 uncoated control 30 distCC 1.190449
14 uncoated control 30 distCC 1.265717
15 uncoated control 30 distCC 1.103845
16 uncoated control 30 distCC 1.125344
17 uncoated control 30 distCC 1.290703
18 uncoated control 30 distCC 1.172462
19 uncoated control 30 distCC 1.065353
20 uncoated control 30 distCC 1.048523
21 coated control 30 colocA 6.062000
22 coated control 30 colocA 9.370714
23 coated control 30 colocA 12.898769
24 coated control 30 colocA 20.398458
25 coated control 30 colocA 11.174150
26 coated control 30 colocA 17.574250
27 coated control 30 colocA 12.481857
28 coated control 30 colocA 21.565250
29 coated control 30 colocA 21.743409
30 coated control 30 colocA 12.699600
31 coated control 30 distCC 4.317260
32 coated control 30 distCC 4.263914
33 coated control 30 distCC 5.136013
34 coated control 30 distCC 3.142906
35 coated control 30 distCC 2.617590
36 coated control 30 distCC 4.149614
37 coated control 30 distCC 4.995551
38 coated control 30 distCC 3.851803
39 coated control 30 distCC 4.606119
40 coated control 30 distCC 2.820326
Thank you in advance.
[Answer 1]: Here is one way, using stats::shapiro.test.
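library(dplyr)
library(broom)

# Run shapiro.test() on `value` within each treatment/chase/measure group
# and tidy each result into a one-row data frame.
data %>%
  group_by(treatment, chase, measure) %>%
  do(tidy(shapiro.test(.$value)))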
# A tibble: 2 x 6
# Groups:   treatment, chase, measure [2]
#   treatment chase measure statistic p.value method
#   <chr>     <int> <chr>       <dbl>   <dbl> <chr>
# 1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
# 2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test
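Recent dplyr releases (1.0.0 and later) document `do()` as superseded, so the same per-group test can also be written with `group_modify()`. The following is a minimal sketch, not the answerer's method. The data frame `first20` is a hypothetical name that holds only the first 20 rows of the posted data (the uncoated samples), so its results will differ from the output shown for the full data.

```r
library(dplyr)
library(broom)

# Hypothetical subset: rows 1-20 of the posted data (uncoated samples only).
first20 <- data.frame(
  treatment = "control",
  chase     = 30L,
  measure   = rep(c("colocA", "distCC"), each = 10),
  value     = c(17.912954, 16.806409, 20.322467, 15.953959, 22.566408,
                17.780975, 19.764265, 16.928500, 22.931763, 18.101085,
                1.159298, 1.174931, 1.190449, 1.265717, 1.103845,
                1.125344, 1.290703, 1.172462, 1.065353, 1.048523)
)

# group_modify() calls the function once per group; .x holds that group's
# rows without the grouping columns, and the result must be a data frame.
res <- first20 %>%
  group_by(treatment, chase, measure) %>%
  group_modify(~ tidy(shapiro.test(.x$value))) %>%
  ungroup()

res
```

The result has one row per group with the `statistic`, `p.value` and `method` columns produced by `broom::tidy()`.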
[Answer 2]: We can also wrap the test output in a list inside summarise and then unnest it.
library(dplyr)
library(tidyr)
library(broom)

# Store each group's tidied test result in a list column, then unnest it
# into regular columns.
dat %>%
  group_by(treatment, chase, measure) %>%
  summarise(out = list(shapiro.test(value) %>% tidy), .groups = 'drop') %>%
  unnest(c(out))
# A tibble: 2 x 6
#   treatment chase measure statistic p.value method
#   <chr>     <int> <chr>       <dbl>   <dbl> <chr>
# 1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
# 2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test
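Both answers group by `treatment`, `chase` and `measure` only, which pools the coated and uncoated rows into 20 values per test. If each coated/uncoated cell should be tested on its own, the `groups` column can be added to the grouping. Below is a base R sketch under that assumption. It rebuilds the posted data compactly under the hypothetical name `d` and needs no extra packages.

```r
# Compact rebuild of the posted data (hypothetical name `d`).
d <- data.frame(
  groups    = rep(c("uncoated", "coated"), each = 20),
  treatment = "control",
  chase     = 30L,
  measure   = rep(rep(c("colocA", "distCC"), each = 10), times = 2),
  value     = c(17.912954, 16.806409, 20.322467, 15.953959, 22.566408,
                17.780975, 19.764265, 16.9285, 22.931763, 18.101085,
                1.159298, 1.174931, 1.190449, 1.265717, 1.103845,
                1.125344, 1.290703, 1.172462, 1.065353, 1.048523,
                6.062, 9.370714, 12.898769, 20.398458, 11.17415,
                17.57425, 12.481857, 21.56525, 21.743409, 12.6996,
                4.31726, 4.263914, 5.136013, 3.142906, 2.61759,
                4.149614, 4.995551, 3.851803, 4.606119, 2.820326)
)

# Split `value` by every combination of the four grouping columns that
# actually occurs in the data (drop = TRUE discards empty combinations).
cells <- split(d$value, d[c("groups", "treatment", "chase", "measure")],
               drop = TRUE)

# One Shapiro-Wilk test per cell; collect n, W and the p-value.
res <- do.call(rbind, lapply(names(cells), function(nm) {
  st <- shapiro.test(cells[[nm]])
  data.frame(cell = nm, n = length(cells[[nm]]),
             statistic = unname(st$statistic), p.value = st$p.value)
}))

res
```

With this grouping there are four cells of 10 values each. `shapiro.test()` requires between 3 and 5000 non-missing values, so very small cells would need to be filtered out first.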
Data
dat <- structure(list(groups = c("uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated",
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "coated",
"coated", "coated", "coated", "coated", "coated", "coated", "coated",
"coated", "coated", "coated", "coated", "coated", "coated", "coated",
"coated", "coated", "coated", "coated", "coated"), treatment = c("control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control", "control", "control", "control",
"control", "control", "control"), chase = c(30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), measure = c("colocA",
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA",
"colocA", "colocA", "distCC", "distCC", "distCC", "distCC", "distCC",
"distCC", "distCC", "distCC", "distCC", "distCC", "colocA", "colocA",
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA",
"colocA", "distCC", "distCC", "distCC", "distCC", "distCC", "distCC",
"distCC", "distCC", "distCC", "distCC"), value = c(17.912954,
16.806409, 20.322467, 15.953959, 22.566408, 17.780975, 19.764265,
16.9285, 22.931763, 18.101085, 1.159298, 1.174931, 1.190449,
1.265717, 1.103845, 1.125344, 1.290703, 1.172462, 1.065353, 1.048523,
6.062, 9.370714, 12.898769, 20.398458, 11.17415, 17.57425, 12.481857,
21.56525, 21.743409, 12.6996, 4.31726, 4.263914, 5.136013, 3.142906,
2.61759, 4.149614, 4.995551, 3.851803, 4.606119, 2.820326)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40"))