R中多组数据的正态性检验

Posted

技术标签:

【中文标题】R中多组数据的正态性检验【英文标题】:Normality test for multi-grouped data in R 【发布时间】:2021-07-10 11:07:03 【问题描述】:

我正在尝试对我在 R 中的数据进行正态性测试。我的数据集是由 4 列字符和 1 列具有数值的数据框。目前,我在 R 中使用 Rstatix 包,其他类型的统计测试运行良好,如 wilcox_test() 和 kruskal_test() 但是当我尝试运行 shapiro_test() 时它不起作用,出现以下错误:

data %>% group_by(treatment,chase,measure) %>% shapiro_test(value)
x
+-<error/dplyr:::mutate_error>
| Problem with `mutate()` input `data`.
| x Must group by variables found in `.data`.
| * Column `variable` is not found.
| i Input `data` is `map(.data$data, .f, ...)`.
\-<error/rlang_error>
  Must group by variables found in `.data`.
  * Column `variable` is not found.
Backtrace:
  1. dplyr::group_by(., treatment, chase, measure)
  2. rstatix::shapiro_test(., value)
 33. rstatix:::.f(.x[[i]], ...)
 11. dplyr::group_by(., variable)
 43. dplyr::group_by_prepare(.data, ..., .add = .add)

我的数据集如下:

    groups treatment chase measure     value
1   uncoated   control    30  colocA 17.912954
2   uncoated   control    30  colocA 16.806409
3   uncoated   control    30  colocA 20.322467
4   uncoated   control    30  colocA 15.953959
5   uncoated   control    30  colocA 22.566408
6   uncoated   control    30  colocA 17.780975
7   uncoated   control    30  colocA 19.764265
8   uncoated   control    30  colocA 16.928500
9   uncoated   control    30  colocA 22.931763
10  uncoated   control    30  colocA 18.101085
11  uncoated   control    30  distCC  1.159298
12  uncoated   control    30  distCC  1.174931
13  uncoated   control    30  distCC  1.190449
14  uncoated   control    30  distCC  1.265717
15  uncoated   control    30  distCC  1.103845
16  uncoated   control    30  distCC  1.125344
17  uncoated   control    30  distCC  1.290703
18  uncoated   control    30  distCC  1.172462
19  uncoated   control    30  distCC  1.065353
20  uncoated   control    30  distCC  1.048523
21    coated   control    30  colocA  6.062000
22    coated   control    30  colocA  9.370714
23    coated   control    30  colocA 12.898769
24    coated   control    30  colocA 20.398458
25    coated   control    30  colocA 11.174150
26    coated   control    30  colocA 17.574250
27    coated   control    30  colocA 12.481857
28    coated   control    30  colocA 21.565250
29    coated   control    30  colocA 21.743409
30    coated   control    30  colocA 12.699600
31    coated   control    30  distCC  4.317260
32    coated   control    30  distCC  4.263914
33    coated   control    30  distCC  5.136013
34    coated   control    30  distCC  3.142906
35    coated   control    30  distCC  2.617590
36    coated   control    30  distCC  4.149614
37    coated   control    30  distCC  4.995551
38    coated   control    30  distCC  3.851803
39    coated   control    30  distCC  4.606119
40    coated   control    30  distCC  2.820326

提前谢谢你。

【问题讨论】:

【参考方案1】:

这是stats::shapiro.test的一种方式。

library(dplyr)
library(broom)

data %>%
  group_by(treatment, chase, measure) %>% 
  do(tidy(shapiro.test(.$value)))
## A tibble: 2 x 6
## Groups:   treatment, chase, measure [2]
#  treatment chase measure statistic p.value method                     
#  <chr>     <int> <chr>       <dbl>   <dbl> <chr>                      
#1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
#2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test

【讨论】:

【参考方案2】:

我们还可以将输出包装在 listsummariseunnest

library(dplyr)
library(tidyr)
library(broom)
dat %>% 
    group_by(treatment, chase, measure) %>%
    summarise(out = list(shapiro.test(value) %>% tidy), .groups = 'drop') %>%
    unnest(c(out))
# A tibble: 2 x 6
#  treatment chase measure statistic p.value method                     
#  <chr>     <int> <chr>       <dbl>   <dbl> <chr>                      
#1 control      30 colocA      0.940 0.244   Shapiro-Wilk normality test
#2 control      30 distCC      0.811 0.00128 Shapiro-Wilk normality test
 

数据

dat <- structure(list(groups = c("uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "uncoated", 
"uncoated", "uncoated", "uncoated", "uncoated", "uncoated", "coated", 
"coated", "coated", "coated", "coated", "coated", "coated", "coated", 
"coated", "coated", "coated", "coated", "coated", "coated", "coated", 
"coated", "coated", "coated", "coated", "coated"), treatment = c("control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control", "control", "control", "control", 
"control", "control", "control"), chase = c(30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), measure = c("colocA", 
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA", 
"colocA", "colocA", "distCC", "distCC", "distCC", "distCC", "distCC", 
"distCC", "distCC", "distCC", "distCC", "distCC", "colocA", "colocA", 
"colocA", "colocA", "colocA", "colocA", "colocA", "colocA", "colocA", 
"colocA", "distCC", "distCC", "distCC", "distCC", "distCC", "distCC", 
"distCC", "distCC", "distCC", "distCC"), value = c(17.912954, 
16.806409, 20.322467, 15.953959, 22.566408, 17.780975, 19.764265, 
16.9285, 22.931763, 18.101085, 1.159298, 1.174931, 1.190449, 
1.265717, 1.103845, 1.125344, 1.290703, 1.172462, 1.065353, 1.048523, 
6.062, 9.370714, 12.898769, 20.398458, 11.17415, 17.57425, 12.481857, 
21.56525, 21.743409, 12.6996, 4.31726, 4.263914, 5.136013, 3.142906, 
2.61759, 4.149614, 4.995551, 3.851803, 4.606119, 2.820326)),
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40"))

【讨论】:

以上是关于R中多组数据的正态性检验的主要内容,如果未能解决你的问题,请参考以下文章

数据的正态性检验

R语言-数据的正态性检验

多项式回归的正态性检验

R语言之正态性检验

R语言使用wilcox.test函数进行两组数据的Wilcoxon符号秩检验wilcox.test函数添加paired参数则为Wilcoxon signed rank,当t检验需要的正态性条件不满足

如果计算相对拒绝频率,如何衡量与显着性水平是不是显着不同? (R中的正态性检验)