Using count(), aggregate(), data.table() or dplyr() to summarise the data (mean, standard deviation)

Posted: 2019-06-14 11:51:13

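Overview

I have a dataset called "subset_leaf_1" (see below) that shows how the climatic environment affects the canopy index of a specific oak species, Quercus petraea.

I have a column called Urbanisation_index (in the data frame below) with four sublevels (1, 2, 3 and 4). Each sublevel (1-4) indicates the degree of urbanisation surrounding the Quercus petraea trees.

I would also like to calculate the mean Canopy_Index for each sublevel of Urbanisation_index.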

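Problem

I would like to count the number of rows for each sublevel of Urbanisation_index per species, using data.table(), aggregate(), or count() from the dplyr package, and then calculate the mean Canopy_Index for each sublevel of Urbanisation_index.

If anyone is able to help, I would be deeply appreciative.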

Desired outcome

R code:

First, I subsetted the data for Quercus petraea:

set.seed(45L)

##Subset dataframe leaf_1 by "Quercus petraea"
subset_leaf_1<-subset(leaf_1, Species == "Quercus petraea")

#Produce new dataframe for the subsetted data (observation 1)
Subset_leaf_ob_1<-data.frame(subset_leaf_1, stringsAsFactors=TRUE)

dplyr()

library(dplyr)

#Sum and count of species and urbanisation index
#Mean and standard deviation for Canopy_Index, per urbanisation level, per species

Summarised_leaf_1<-Subset_leaf_ob_1  %>% 
                             count(Species, Urbanisation_index) %>% 
                             summarise(Subset_leaf_ob_1, mean=mean(Canopy_Index), sd=sd(Canopy_Index))

#Error message

Error in summarise_impl(.data, dots) : 
Column `Subset_leaf_ob_1` must be length 1 (a summary value), not 11

aggregate()

I can use these two calls to count the rows per Urbanisation_index sublevel and to calculate the mean Canopy_Index per Urbanisation_index sublevel:

##Row count for Urbanisation_index 
aggregate_subset_leaf_1<-aggregate(Obs_.no ~ Species + Urbanisation_index, 
                               data = Subset_leaf_ob_1, FUN = length)

##Mean Canopy_Index per Urbanisation_index sublevel per species
  subset_leaf_1_canopy<-aggregate(Canopy_Index ~ Species*Urbanisation_index, 
                                           data = Subset_leaf_ob_1, FUN = mean)

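To combine the row counts per Urbanisation_index and the mean Canopy_Index per sublevel, I applied the function below (for the table above). However, this function adds zeros to the counts in each row, and I cannot rename the column headers to produce a new data frame. When I inspect the object in the Environment pane of RStudio, the mean and standard deviation of Canopy_Index are not shown.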

##Function to incorporate both counts of urbanisation index and the mean and standard deviation for canopy index
Mean_sd_Count_leaf_1<-aggregate(Canopy_Index ~ Species+Urbanisation_index, 
                            data = Subset_leaf_ob_1, 
                            FUN = function(x) c(Counts = length(x), Mean = mean(x), Sd = sd(x)))

##Rename the columns
colnames(Mean_sd_Count_leaf_1)<-c("Species", "Urbanisation_Index", "Counts", "Mean_Canopy_Index", "SD_Canopy_Index")

##Error message

Error in names(x) <- value : 
  'names' attribute [5] must be the same length as the vector [3]

traceback()

 1: `colnames<-`(`*tmp*`, value = c("Species", "Urbanisation_Index", 
   "Counts", "Mean_Canopy_Index", "SD_Canopy_Index"))

data.table()

   library(data.table)

Data.table.leaf.1<-data.table(Subset_leaf_ob_1)

leaf.1.data.table<-Data.table.leaf.1[, .N, by = list(Species, Urbanisation_index), 
                                           mean_test=rowMeans(Canopy_Index),
                                           sd_test=rowMeans(Canopy_Index)] 

##Error Message

Error in `[.data.table`(Data.table.leaf.1, , .N, by = list(Species, Urbanisation_index),  : 
  unused arguments (mean_test = rowMeans(Canopy_Index), sd_test = rowMeans(Canopy_Index))

Data

structure(list(Obs_.no = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 
49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 
62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 
75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 
88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 
101L, 102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 
112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L, 
123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L, 
134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L, 
145L, 146L, 147L, 148L, 149L, 150L, 151L, 152L, 153L, 154L, 155L, 
156L, 157L, 158L, 159L, 160L, 161L, 162L, 163L, 164L, 165L, 166L, 
167L, 168L, 169L, 170L, 171L, 172L, 173L, 174L, 175L, 176L, 177L, 
178L, 179L, 180L, 181L, 182L, 183L, 184L, 185L, 186L, 187L, 188L, 
189L, 190L, 191L, 192L, 193L, 194L, 195L, 196L, 197L, 198L, 199L, 
200L, 201L, 202L, 203L, 204L, 205L, 206L, 207L, 208L, 209L, 210L, 
211L, 212L, 213L, 214L, 215L, 216L, 217L, 218L, 219L, 220L, 221L, 
222L, 223L, 224L, 225L, 226L, 227L, 228L, 229L, 230L, 231L, 232L, 
233L, 234L, 235L, 236L, 237L, 238L, 239L, 240L, 241L, 242L, 243L, 
244L, 246L, 247L, 248L, 249L, 250L, 251L, 252L, 253L, 254L, 255L, 
256L, 257L, 258L, 259L, 260L, 261L, 262L, 263L, 264L, 265L, 266L, 
267L, 268L, 269L, 270L, 271L, 272L, 273L, 274L, 275L, 276L, 277L, 
278L, 279L, 280L, 281L, 282L, 283L, 284L, 285L, 286L, 287L, 288L, 
289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L), Date_observed = structure(c(5L, 
17L, 7L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 12L, 12L, 12L, 12L, 4L, 
4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 9L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 12L, 12L, 12L, 12L, 13L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 12L, 12L, 6L, 6L, 6L, 
6L, 16L, 16L, 16L, 16L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 7L, 7L, 
7L, 7L, 7L, 14L, 14L, 14L, 6L, 6L, 10L, 10L, 10L, 10L, 4L, 4L, 
4L, 4L, 5L, 5L, 5L, 5L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 1L, 
1L, 12L, 12L, 12L, 12L, 12L, 5L, 5L, 5L, 7L, 7L, 7L, 7L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 7L, 7L, 7L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 12L, 12L, 12L, 
5L, 5L, 5L, 5L, 9L, 9L, 11L, 11L, 11L, 11L, 3L, 3L, 10L, 10L, 
10L, 10L, 4L, 4L, 4L, 4L, 12L, 12L, 12L, 10L, 10L, 10L, 10L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 14L, 14L, 14L, 14L, 9L, 9L, 9L, 
9L, 11L, 11L, 11L, 11L, 4L, 4L, 4L, 4L, 7L, 7L, 7L, 14L, 14L, 
14L, 14L, 10L, 10L, 11L, 11L, 11L, 3L, 3L, 3L, 3L, 14L, 4L, 4L, 
4L, 4L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 5L, 5L, 5L, 12L, 6L, 6L, 6L, 6L, 11L, 6L, 6L, 6L, 12L, 12L, 
2L, 2L, 2L, 2L, 6L, 6L, 6L, 10L, 10L, 10L, 10L, 15L, 11L, 11L, 
11L, 11L, 3L, 3L, 3L, 7L, 7L, 7L, 4L, 4L, 4L, 12L, 12L, 12L, 
12L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 12L, 12L, 12L, 12L, 7L, 
7L, 7L, 7L, 12L, 12L, 12L, 12L), .Label = c("10/1/18", "10/14/18", 
"10/19/18", "10/20/18", "10/21/18", "10/22/18", "10/23/18", "10/24/18", 
"10/25/18", "10/26/18", "10/27/18", "10/28/18", "10/28/19", "10/29/18", 
"11/6/18", "12/9/18", "8/20/18"), class = "factor"), Latitude = c(51.4175, 
52.12087, 52.0269, 52.0269, 52.0269, 52.0269, 52.947709, 52.947709, 
52.947709, 52.947709, 53.14919, 53.14919, 55.94154, 55.94154, 
51.59449, 51.59449, 51.59449, 51.59449, 51.491811, 51.491811, 
52.59925, 52.59925, 52.59925, 52.59925, 51.60157, 51.60157, 51.60157, 
51.60157, 52.6888, 52.6888, 52.6888, 52.6888, 50.697802, 50.697802, 
50.697802, 50.697802, 53.62417, 50.446841, 50.446841, 50.446841, 
50.446841, 35.292896, 35.292896, 53.959679, 53.959679, 53.959679, 
53.959679, 32.2855, 32.2855, 32.2855, 32.2855, 52.01434, 52.01434, 
52.01434, 50.8365, 50.8365, 51.78375, 51.78375, 51.78375, 51.78375, 
51.456965, 51.456965, 51.456965, 51.456965, 51.3651, 51.3651, 
51.3651, 51.3651, 52.01182, 52.01182, 52.01182, 52.01182, 55.919722, 
50.114277, 50.114277, 50.114277, 50.114277, 53.39912, 53.39912, 
53.39912, 51.43474, 51.43474, 51.10676, 51.10676, 51.10676, 51.10676, 
50.435984, 50.435984, 50.435984, 50.435984, 51.78666, 51.78666, 
51.78666, 51.78666, 51.473203, 51.473203, 51.473203, 53.38728, 
53.38728, 53.38728, 53.38728, 52.441088, 52.441088, 52.552344, 
19.61263, 19.61263, 19.61263, 19.61263, 53.582285, 53.582285, 
53.582285, 49.259471, 49.259471, 49.259471, 49.259471, 50.461625, 
50.461625, 50.461625, 50.461625, 51.746642, 51.746642, 51.746642, 
51.746642, 52.2501, 52.2501, 52.2501, 52.2501, 52.423336, 52.423336, 
52.423336, 52.423336, 50.79387, 50.79387, 50.79387, 53.615575, 
53.615575, 53.615575, 53.615575, 52.55317, 52.55317, 52.55317, 
52.55317, 51.08474, 51.08474, 51.08474, 53.19329, 53.19329, 53.19329, 
53.19329, 55.96785, 55.96785, 56.52664, 56.52664, 56.52664, 56.52664, 
52.04252, 52.04252, 51.8113, 51.8113, 51.8113, 51.8113, 52.580157, 
52.580157, 52.580157, 52.580157, 51.5894, 51.5894, 51.5894, 50.52008, 
50.52008, 50.52008, 50.52008, 25.3671, 25.3671, 25.3671, 25.3671, 
51.48417, 51.48417, 51.48417, 51.48417, 54.58243, 54.58243, 54.58243, 
54.58243, 52.58839, 52.58839, 52.58839, 52.58839, 52.717283, 
52.717283, 52.717283, 52.717283, 50.740764, 50.740764, 50.740764, 
50.740764, -36.865, -36.865, -36.865, 52.57937, 52.57937, 52.57937, 
52.57937, 50.736531, 50.736531, 50.79926, 50.79926, 50.79926, 
53.675996, 53.675996, 53.675996, 53.675996, 55.43828, 48.35079, 
48.35079, 48.35079, 48.35079, 51.36445, 51.36445, 51.36445, 51.36445, 
52.36286, 52.36286, 52.36286, -25.77831, -25.77831, -25.77831, 
-25.77831, -20.112381, -20.112381, -20.112381, -20.112381, 52.122402, 
52.122402, 52.122402, 51.481079, 52.16104, 52.16104, 52.16104, 
52.16104, 54.7311, 51.61842, 51.61842, 51.61842, 55.91913, 55.91913, 
51.06433, 51.06433, 51.06433, 51.06433, 55.920966, 55.920966, 
55.920966, 51.6528, 51.6528, 51.6528, 51.6528, 57.158724, 51.88485, 
51.88485, 51.88485, 51.88485, 52.34015, 52.34015, 52.34015, 50.615029, 
50.615029, 50.615029, 53.37687, 53.37687, 53.37687, 54.27745, 
54.27745, 54.27745, 54.27745, 52.026042, 52.026042, 52.026042, 
52.026042, 51.319032, 51.319032, 51.319032, 51.319032, 51.51357, 
51.51357, 51.51357, 51.51357, 53.43202, 53.43202, 53.43202, 53.43202, 
51.50823, 51.50823, 51.50823, 51.50823), Longitude = c(-0.32118, 
-0.29293, -0.7078, -0.7078, -0.7078, -0.7078, -1.435407, -1.435407, 
-1.435407, -1.435407, -0.76115, -0.76115, -3.19139, -3.19139, 
-2.98828, -2.98828, -2.98828, -2.98828, -3.210324, -3.210324, 
1.33011, 1.33011, 1.33011, 1.33011, -3.67111, -3.67111, -3.67111, 
-3.67111, -3.30909, -3.30909, -3.30909, -3.30909, -2.11692, -2.11692, 
-2.11692, -2.11692, -2.43155, -3.706923, -3.706923, -3.706923, 
-3.706923, 139.676727, 139.676727, -1.061008, -1.061008, -1.061008, 
-1.061008, -110.9434, -110.9434, -110.9434, -110.9434, 1.04007, 
1.04007, 1.04007, -0.1631, -0.1631, -0.65046, -0.65046, -0.65046, 
-0.65046, -2.624917, -2.624917, -2.624917, -2.624917, 0.70706, 
0.70706, 0.70706, 0.70706, -0.70082, -0.70082, -0.70082, -0.70082, 
-3.210278, -5.541128, -5.541128, -5.541128, -5.541128, -2.33356, 
-2.33356, -2.33356, 0.45981, 0.45981, -2.32071, -2.32071, -2.32071, 
-2.32071, -4.105617, -4.105617, -4.105617, -4.105617, -0.71433, 
-0.71433, -0.71433, -0.71433, -2.586492, -2.586492, -2.586492, 
-2.95811, -2.95811, -2.95811, -2.95811, -0.176158, -0.176158, 
-1.337177, 57.66801, 57.66801, 57.66801, 57.66801, -2.802239, 
-2.802239, -2.802239, -123.107788, -123.107788, -123.107788, 
-123.107788, 3.560973, 3.560973, 3.560973, 3.560973, 0.486416, 
0.486416, 0.486416, 0.486416, -0.8825, -0.8825, -0.8825, -0.8825, 
-1.787563, -1.787563, -1.787563, -1.787563, 0.26684, 0.26684, 
0.26684, -2.432959, -2.432959, -2.432959, -2.432959, -0.20337, 
-0.20337, -0.20337, -0.20337, -0.73645, -0.73645, -0.73645, -0.63793, 
-0.63793, -0.63793, -0.63793, -3.18084, -3.18084, -3.40313, -3.40313, 
-3.40313, -3.40313, -2.43733, -2.43733, -0.22894, -0.22894, -0.22894, 
-0.22894, -1.948571, -1.948571, -1.948571, -1.948571, 0.1879, 
0.1879, 0.1879, -4.20756, -4.20756, -4.20756, -4.20756, 51.53781, 
51.53781, 51.53781, 51.53781, -0.34854, -0.34854, -0.34854, -0.34854, 
-5.93229, -5.93229, -5.93229, -5.93229, -1.96843, -1.96843, -1.96843, 
-1.96843, -2.410575, -2.410575, -2.410575, -2.410575, -2.361234, 
-2.361234, -2.361234, -2.361234, 174.757, 174.757, 174.757, -1.89325, 
-1.89325, -1.89325, -1.89325, -2.011143, -2.011143, -3.19446, 
-3.19446, -3.19446, -1.272824, -1.272824, -1.272824, -1.272824, 
-4.64226, 10.91812, 10.91812, 10.91812, 10.91812, -0.23106, -0.23106, 
-0.23106, -0.23106, -2.06327, -2.06327, -2.06327, 28.22357, 28.22357, 
28.22357, 28.22357, 57.580207, 57.580207, 57.580207, 57.580207, 
-0.487443, -0.487443, -0.487443, -0.026923, 0.18702, 0.18702, 
0.18702, 0.18702, -5.8041, -0.16034, -0.16034, -0.16034, -3.20987, 
-3.20987, -1.79923, -1.79923, -1.79923, -1.79923, -3.193503, 
-3.193503, -3.193503, -1.57361, -1.57361, -1.57361, -1.57361, 
-2.166099, -0.17844, -0.17844, -0.17844, -0.17844, -1.27795, 
-1.27795, -1.27795, -1.966392, -1.966392, -1.966392, -1.34506, 
-1.34506, -1.34506, -0.47911, -0.47911, -0.47911, -0.47911, -0.503114, 
-0.503114, -0.503114, -0.503114, -0.472994, -0.472994, -0.472994, 
-0.472994, -3.18738, -3.18738, -3.18738, -3.18738, -2.27968, 
-2.27968, -2.27968, -2.27968, -0.25847, -0.25847, -0.25847, -0.25847
), Altitude = c(5L, 0L, 68L, 68L, 68L, 68L, 104L, 104L, 104L, 
104L, 11L, 11L, 0L, 0L, 7L, 7L, 7L, 7L, 15L, 15L, 23L, 23L, 23L, 
23L, 184L, 184L, 184L, 184L, 176L, 176L, 176L, 176L, 12L, 12L, 
12L, 12L, 178L, 36L, 36L, 36L, 36L, 0L, 0L, 11L, 11L, 11L, 11L, 
718L, 718L, 718L, 718L, 47L, 47L, 47L, 42L, 42L, 210L, 210L, 
210L, 210L, 97L, 97L, 97L, 97L, 23L, 23L, 23L, 23L, 0L, 0L, 0L, 
0L, 110L, 9L, 9L, 9L, 9L, 30L, 30L, 30L, 4L, 4L, 200L, 200L, 
200L, 200L, 160L, 160L, 160L, 160L, 166L, 166L, 166L, 166L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 74L, 74L, 74L, 74L, 36L, 
36L, 36L, 47L, 47L, 47L, 47L, 58L, 58L, 58L, 58L, 43L, 43L, 43L, 
43L, 97L, 97L, 97L, 97L, 133L, 133L, 133L, 133L, 18L, 18L, 18L, 
123L, 123L, 123L, 123L, 5L, 5L, 5L, 5L, 128L, 128L, 128L, 15L, 
15L, 15L, 15L, 14L, 14L, 65L, 65L, 65L, 65L, 45L, 45L, 129L, 
129L, 129L, 129L, 140L, 140L, 140L, 140L, 0L, 0L, 0L, 18L, 18L, 
18L, 18L, 0L, 0L, 0L, 0L, 30L, 30L, 30L, 30L, 19L, 19L, 19L, 
19L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 96L, 96L, 96L, 96L, 88L, 
88L, 88L, 169L, 169L, 169L, 169L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 123L, 123L, 123L, 
1436L, 1436L, 1436L, 1436L, 0L, 0L, 0L, 0L, 43L, 43L, 43L, 6L, 
75L, 75L, 75L, 75L, 0L, 73L, 73L, 73L, 109L, 109L, 0L, 0L, 0L, 
0L, 115L, 115L, 115L, 110L, 110L, 110L, 110L, 119L, 95L, 95L, 
95L, 95L, 112L, 112L, 112L, 23L, 23L, 23L, 34L, 34L, 34L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 24L, 24L, 24L, 24L, 38L, 38L, 38L, 
38L, 29L, 29L, 29L, 29L, 20L, 20L, 20L, 20L), Species = structure(c(6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 1L, 1L, 6L, 6L, 6L, 6L, 1L, 1L, 
1L, 1L, 5L, 5L, 5L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 6L, 6L, 5L, 5L, 1L, 1L, 1L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 6L, 5L, 1L, 1L, 1L, 
5L, 5L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 1L, 1L, 1L, 1L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 2L, 2L, 2L, 6L, 6L, 6L, 6L, 3L, 3L, 3L, 3L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 
6L, 5L, 6L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 1L, 
1L, 1L, 1L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 1L, 6L, 5L, 6L, 5L, 5L, 
5L, 5L, 5L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L), .Label = c("other deciduous tree", "other oak", 
"other plant", "other shrub", "Quercus petraea", "Quercus robur"
), class = "factor"), Tree_diameter = c(68.8, 10, 98.5, 97, 32.5, 
45.1, 847, 817, 569, 892, 57.3, 43.5, 120, 180, 74, 67, 69, 55, 
62, 71, 140, 111.4, 114.6, 167.1, 29, 46.5, 27.7, 40.1, 68, 45, 
60, 54, 104, 122, 85, 71, 81, 39.8, 43.6, 44.6, 22.6, 160, 156, 
20.1, 17.8, 15.6, 12.1, 37.3, 45.1, 42.8, 51.2, 48.1, 83.7, 77.9, 
80.2, 84.7, 81.8, 102.5, 75.5, 57.3, 0.3, 0.2, 0.3, 0.3, 70, 
36, 53, 44, 31.5, 27.1, 23.3, 22, 85, 69.4, 37.3, 82.9, 52.9, 
98.4, 64.6, 81.8, 19.9, 14.6, 196, 122, 118, 180, 58.6, 54.1, 
58, 61.5, 58.4, 40.6, 61, 68.6, 44.2, 45.2, 44.2, 117, 240, 210, 
310, 134, 64, 52.2, 32, 25, 22, 17, 57, 73.9, 37.1, 170, 114, 
127, 158, 147.4, 135.3, 122.9, 104.1, 263, 237, 322, 302, 175, 
182, 141, 155, 89, 41, 70, 83, 81.5, 29.3, 43.3, 141, 86.5, 82, 
114.5, 57, 42, 58, 64, 129, 127, 143, 125, 92, 68, 90, 24.5, 
20.1, 63.7, 39.8, 66.2, 112.4, 41.9, 43.8, 124.5, 94.1, 68.6, 
74.4, 23.6, 27.7, 22.9, 25.2, 59.2, 78, 79.3, 24.2, 54.7, 43, 
33.1, 56, 67, 62, 58, 306, 274, 56, 60, 72.5, 128.5, 22, 16, 
143, 103, 53, 130, 48.4, 69.8, 6.4, 18.6, 129.2, 41.7, 57.6, 
14, 75, 105, 44, 41.7, 30.2, 39.5, 24.2, 320, 352, 120.9, 108.3, 
53.2, 240, 274, 122, 85, 21, 52, 43, 38, 37, 219, 215, 216, 175, 
124, 133, 119, 39.2, 63, 94.9, 47.1, 126.6, 86.9, 94.7, 106.2, 
85.9, 49.7, 97.1, 55, 40.8, 79.3, 62.4, 62.4, 70, 115.9, 111.1, 
88.9, 80.3, 90.8, 36, 31, 37.5, 42.3, 73, 54, 75, 43, 50.3, 28.7, 
31.9, 159, 181.5, 149.7, 122, 143.6, 148, 145, 99, 47, 76.4, 
62.7, 49, 57.9, 54.8, 53.5, 88.8, 71.3, 101.9, 28, 32, 54, 54, 
169, 152, 160, 138, 90.8, 87.9, 77.4, 81.2, 91.7, 62.7, 50, 72.9, 
23.7, 58, 80.7, 73.7), Urbanisation_index = c(2L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 
4L, 4L, 2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 
2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 3L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 
4L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 4L, 
4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 1L, 1L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 4L, 4L, 4L, 4L, 3L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 
1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L), Stand_density_index = c(3L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 3L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 4L, 1L, 
1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 
2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 4L, 4L, 
3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 4L, 
4L, 3L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 
4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
3L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 
3L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 2L, 4L, 4L, 4L, 4L, 4L, 2L, 1L, 1L, 4L, 4L, 
2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 1L, 1L, 2L, 
1L, 1L, 1L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 
2L), Canopy_Index = c(85L, 85L, 85L, 75L, 45L, 25L, 75L, 65L, 
65L, 75L, 65L, 15L, 75L, 85L, 85L, 45L, 45L, 65L, 75L, 75L, 95L, 
95L, 95L, 95L, 95L, 55L, 85L, 65L, 85L, 65L, 95L, 85L, 85L, 85L, 
75L, 75L, 65L, 85L, 85L, 85L, 85L, 65L, 35L, 75L, 75L, 85L, 65L, 
55L, 65L, 45L, 45L, 95L, 85L, 85L, 85L, 65L, 95L, 85L, 95L, 95L, 
75L, 75L, 85L, 85L, 85L, 85L, 85L, 75L, 85L, 85L, 85L, 85L, 45L, 
75L, 75L, 65L, 75L, 35L, 35L, 75L, 85L, 85L, 65L, 75L, 85L, 75L, 
95L, 95L, 95L, 95L, 75L, 75L, 65L, 65L, 85L, 95L, 95L, 35L, 75L, 
65L, 85L, 95L, 95L, 55L, 75L, 75L, 75L, 85L, 65L, 95L, 75L, 75L, 
65L, 75L, 65L, 85L, 95L, 95L, 75L, 95L, 75L, 95L, 65L, 75L, 75L, 
85L, 85L, 65L, 95L, 65L, 65L, 75L, 75L, 65L, 65L, 65L, 65L, 65L, 
35L, 65L, 75L, 35L, 85L, 85L, 75L, 95L, 85L, 85L, 75L, 45L, 55L, 
35L, 35L, 25L, 25L, 75L, 65L, 95L, 85L, 75L, 85L, 85L, 75L, 75L, 
65L, 95L, 95L, 95L, 75L, 85L, 65L, 45L, 75L, 35L, 65L, 95L, 95L, 
95L, 95L, 95L, 65L, 75L, 45L, 35L, 75L, 95L, 95L, 85L, 75L, 65L, 
85L, 95L, 75L, 85L, 85L, 95L, 95L, 95L, 55L, 65L, 65L, 45L, 65L, 
85L, 35L, 95L, 85L, 85L, 75L, 85L, 95L, 85L, 95L, 75L, 65L, 65L, 
65L, 65L, 55L, 75L, 85L, 85L, 85L, 85L, 55L, 25L, 55L, 65L, 35L, 
75L, 25L, 35L, 85L, 95L, 85L, 55L, 75L, 75L, 75L, 75L, 65L, 85L, 
75L, 65L, 85L, 55L, 95L, 95L, 95L, 95L, 45L, 55L, 35L, 65L, 45L, 
75L, 75L, 55L, 65L, 65L, 75L, 65L, 95L, 95L, 95L, 45L, 15L, 85L, 
65L, 95L, 95L, 45L, 65L, 45L, 55L, 85L, 65L, 75L, 75L, 75L, 65L, 
75L, 35L, 75L, 75L, 75L, 75L, 25L, 45L, 45L, 35L, 85L, 95L, 85L, 
95L), Phenological_Index = c(2L, 4L, 2L, 2L, 4L, 4L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 2L, 3L, 3L, 4L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 4L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 2L, 2L, 2L, 2L, 3L, 
1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 2L, 1L, 4L, 4L, 1L, 
1L, 1L, 1L, 1L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
4L, 4L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 3L, 3L, 
3L, 3L, 4L, 3L, 2L, 3L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 4L, 
2L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 2L, 
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 1L, 3L, 4L, 3L, 3L, 
2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 
1L, 1L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 
2L, 3L, 3L, 4L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L)), class = "data.frame", row.names = c(NA, 
-295L))


Answer 1:

Using dplyr, we can first filter by Species and then, for each Urbanisation_index, count the observations with n() and calculate the mean of Canopy_Index with mean().

library(dplyr)

subset_leaf_1 %>%
   filter(Species == "Quercus petraea") %>%
   group_by(Urbanisation_index) %>%
   summarise(Species = "Quercus petraea",
             Obs_no = n(),
             Canopy_Index = mean(Canopy_Index))


#  Urbanisation_index Species         Obs_no Canopy_Index
#               <int> <chr>            <int>        <dbl>
#1                  1 Quercus petraea      6         61.7
#2                  2 Quercus petraea     17         75  
#3                  3 Quercus petraea     14         76.4
#4                  4 Quercus petraea     17         72.1
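The question also asks for the standard deviation and for counts per species. A minimal sketch of how the same pattern might be extended, assuming a small made-up data frame (toy_leaf and its values are hypothetical, not the question's data), groups by both columns and adds sd() alongside n() and mean():

```r
library(dplyr)

# Hypothetical toy data that mimics the columns used in the question
toy_leaf <- data.frame(
  Species = c(rep("Quercus petraea", 6), rep("Quercus robur", 4)),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L, 2L, 2L),
  Canopy_Index = c(65, 75, 85, 95, 75, 55, 45, 55, 85, 95)
)

# One row per Species x Urbanisation_index combination,
# with the row count, mean and standard deviation of Canopy_Index
toy_summary <- toy_leaf %>%
  group_by(Species, Urbanisation_index) %>%
  summarise(
    Obs_no = n(),
    Mean_Canopy_Index = mean(Canopy_Index),
    SD_Canopy_Index = sd(Canopy_Index)
  ) %>%
  ungroup()

toy_summary
```

Note that sd() returns NA for a group that contains a single observation, as happens here for Quercus petraea at Urbanisation_index 3. The attempt in the question probably failed because count() had already collapsed the data before summarise() was called, and because the data frame itself was passed to summarise() as if it were a summary value.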

We can also do this in base R:

df1 <- do.call(data.frame, aggregate(Canopy_Index~Urbanisation_index, 
             subset(subset_leaf_1, Species == "Quercus petraea"),
             function(x) c(Canopy_Index = mean(x), Obs_no = length(x))))

colnames(df1) <- c("Urbanisation_index", "Canopy_Index", "Obs_no")
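Why do.call(data.frame, ...) is needed may be easier to see on a small example. The sketch below uses made-up data (toy_leaf is hypothetical) and shows that when FUN returns several values, aggregate() stores them in a single matrix column, so the result has fewer columns than the number of names one might try to assign:

```r
# Hypothetical toy data with the two columns used by the formula
toy_leaf <- data.frame(
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L),
  Canopy_Index = c(65, 75, 85, 95, 75)
)

# FUN returns three values per group, which aggregate() keeps
# together as one matrix column named Canopy_Index
agg <- aggregate(Canopy_Index ~ Urbanisation_index, data = toy_leaf,
                 FUN = function(x) c(Obs_no = length(x), Mean = mean(x), SD = sd(x)))

ncol(agg)                    # 2: grouping column plus one matrix column
is.matrix(agg$Canopy_Index)  # TRUE

# data.frame() expands the matrix column into separate columns
flat <- do.call(data.frame, agg)
ncol(flat)                   # 4

colnames(flat) <- c("Urbanisation_index", "Obs_no",
                    "Mean_Canopy_Index", "SD_Canopy_Index")
flat
```

This would also explain the renaming error in the question: with two grouping columns the aggregated object has three columns, not five. The first argument of do.call() must be a function (here data.frame), not a data frame object.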

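Comments:

Hi Ronak Shah, thank you very much for your help. One question: when using the base R function, specifically aggregate(), how do I rename the columns afterwards with colnames()? When I try, I keep getting this error message: Error in names(x)

@AliceHobbs Thanks. I think we need do.call. I have updated the answer. Please check whether it works now.

Hey Ronak, I tried the new code and it returns this error message: "Error in do.call(Subset_leaf_ob_1, aggregate(Canopy_Index ~ Species + : 'what' must be a function or character string". Do you know what this means? Thanks for your help :)

@AliceHobbs It works on the sample data frame provided, so I am not sure what might be going wrong. Did you check the dplyr answer? Does that work?

Hi Ronak, I have been searching for this error message using dplyr(). However, I just looked at the structure of the object df1. These are the results: str(leaf.aggregate.1) 'data.frame': 4 obs. of 3 variables: $ Species: Factor w/ 6 levels "other deciduous tree",..: 5 5 5 5 $ Urbanisation_index: int 1 2 3 4 $ Canopy_Index: num [1:4, 1:3] 6 17 14 17 61.7 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr "Obs_no" "Mean" "SD"

Answer 2: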

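Using data.table, we convert the 'data.frame' to a 'data.table' (setDT), specify the logical condition in i to subset the rows, group by 'Urbanisation_index', and get the number of rows (.N), the mean of 'Canopy_Index', and the first value of 'Species'.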

library(data.table)
out <- setDT(subset_leaf_1)[Species == "Quercus petraea", 
        .(Species = first(Species),
          Obs_no = .N,
         Canopy_Index = mean(Canopy_Index)), by = Urbanisation_index]
setcolorder(out, c(2, 1, 3, 4))
out
#           Species Urbanisation_index Obs_no Canopy_Index
#1: Quercus petraea                  2     17     75.00000
#2: Quercus petraea                  4     17     72.05882
#3: Quercus petraea                  3     14     76.42857
#4: Quercus petraea                  1      6     61.66667
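The same call can be extended to include the standard deviation and to group by species as well. The sketch below is an illustration on made-up data (toy_dt is hypothetical): all summaries go into a single list in j, rather than being passed as extra arguments to `[`, which is what triggered the "unused arguments" error in the question.

```r
library(data.table)

# Hypothetical toy data that mimics the columns used in the question
toy_dt <- data.table(
  Species = c(rep("Quercus petraea", 5), rep("Quercus robur", 4)),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L),
  Canopy_Index = c(65, 75, 85, 95, 75, 45, 55, 85, 95)
)

# j is one list of named summaries; by lists both grouping columns
toy_out <- toy_dt[, .(Obs_no = .N,
                      Mean_Canopy_Index = mean(Canopy_Index),
                      SD_Canopy_Index = sd(Canopy_Index)),
                  by = .(Species, Urbanisation_index)]

# Sort the result by the grouping columns for readability
setorder(toy_out, Species, Urbanisation_index)
toy_out
```

mean() and sd() are used here rather than rowMeans(), since rowMeans() averages across the columns of a matrix or data frame instead of summarising one column within each group.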

This can also be done in base R:

tmp1 <- subset(subset_leaf_1, Species == "Quercus petraea")
by(tmp1, tmp1$Urbanisation_index, FUN = function(x) 
   data.frame(Obs_no = nrow(x), Canopy_Index = mean(x$Canopy_Index)))
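by() returns one small data frame per group rather than a single table. If a single data frame is wanted, one possible approach is to bind the pieces with do.call(rbind, ...). The sketch below assumes made-up data (toy_leaf is hypothetical) and also carries the group value and the standard deviation into each piece:

```r
# Hypothetical toy data with the two columns used above
toy_leaf <- data.frame(
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L),
  Canopy_Index = c(65, 75, 85, 95, 75)
)

# One single-row data frame per Urbanisation_index level
pieces <- by(toy_leaf, toy_leaf$Urbanisation_index, FUN = function(x)
  data.frame(Urbanisation_index = x$Urbanisation_index[1],
             Obs_no = nrow(x),
             Mean_Canopy_Index = mean(x$Canopy_Index),
             SD_Canopy_Index = sd(x$Canopy_Index)))

# Stack the per-group rows into one data frame
combined <- do.call(rbind, pieces)
combined
```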
