Using count(), aggregate(), data.table() or dplyr() to summarise data (mean, standard deviation)
Posted: 2019-06-14 11:51:13
Question
Overview
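I have a dataset called "subset_leaf_1" (see below) that shows how the climatic environment affects the canopy index of a particular oak species, "Quercus petraea".
I have a column called Urbanisation_index (see the data frame below) that contains four sub-levels (1, 2, 3 and 4). Each sub-level (1-4) indicates the degree of urbanisation surrounding the "Quercus petraea" tree.
I also want to calculate the mean Canopy_Index for each sub-level of Urbanisation_index.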
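Problem
I want to count the rows for each sub-level of the urbanisation index, per species, using data.table(), aggregate() or count() from the dplyr package. I then want to calculate the mean (and standard deviation) of Canopy_Index for each sub-level of Urbanisation_index.
I would be grateful if anyone could help.
Desired result
R code: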
First, I subset the data for Quercus petraea:
set.seed(45L)
##Subset data frame leaf_1 by "Quercus petraea"
subset_leaf_1<-subset(leaf_1, Species == "Quercus petraea")
#Produce new dataframe for the subsetted data (observation 1)
Subset_leaf_ob_1<-data.frame(subset_leaf_1, stringsAsFactors=TRUE)
dplyr()
library(dplyr)
#sum and count of species and urbanisation index
#Mean and standard deviation for Canopy_Index, per urbanisation level, per species
Summarised_leaf_1<-Subset_leaf_ob_1 %>%
count(Species, Urbanisation_index) %>%
summarise(Subset_leaf_ob_1, mean=mean(Canopy_Index), sd=sd(Canopy_Index))
#Error message
Error in summarise_impl(.data, dots) :
Column `Subset_leaf_ob_1` must be length 1 (a summary value), not 11
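The error occurs because, inside a pipe, the piped data is already summarise()'s first argument. The extra Subset_leaf_ob_1 is therefore treated as a column expression: the data frame has 11 columns, which explains the "not 11". In addition, count() collapses the data to the grouping columns plus n, so Canopy_Index is no longer available afterwards.

The sketch below shows one fix, assuming a recent dplyr: group first, then compute every statistic in a single summarise() call. The small data frame `leaf` is a hypothetical toy stand-in that uses the question's column names, not the real data.

```r
library(dplyr)

# Hypothetical toy data using the question's column names
leaf <- data.frame(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L),
  Canopy_Index = c(55L, 65L, 75L, 85L, 95L, 45L)
)

# The piped data is summarise()'s first argument, so it is not repeated inside.
# group_by() defines the groups; summarise() returns one row per group.
summary_leaf <- leaf %>%
  filter(Species == "Quercus petraea") %>%
  group_by(Species, Urbanisation_index) %>%
  summarise(Counts = n(),
            Mean_Canopy_Index = mean(Canopy_Index),
            SD_Canopy_Index = sd(Canopy_Index)) %>%
  ungroup()

print(summary_leaf)
```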
aggregate()
With these two calls I can calculate the row count for each Urbanisation_index sub-level and the mean Canopy_Index for each sub-level:
##Row count for Urbanisation_index
aggregate_subset_leaf_1<-aggregate(Obs_.no ~ Species + Urbanisation_index,
data = Subset_leaf_ob_1, FUN = length)
##Mean Canopy_Index per Urbanisation_index sublevel per species
subset_leaf_1_canopy<-aggregate(Canopy_Index ~ Species*Urbanisation_index,
data = Subset_leaf_ob_1, FUN = mean)
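Both calls group by the same columns, so their results can be joined with base R's merge(). The sketch below uses a hypothetical toy data frame. The toy data has no Obs_.no column, so the counts are taken with length() over Canopy_Index instead.

```r
# Hypothetical toy data using the question's column names
leaf <- data.frame(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L),
  Canopy_Index = c(55L, 65L, 75L, 85L, 95L, 45L)
)
petraea <- subset(leaf, Species == "Quercus petraea")

# One aggregate() call per statistic, grouped identically
counts <- aggregate(Canopy_Index ~ Species + Urbanisation_index,
                    data = petraea, FUN = length)
means <- aggregate(Canopy_Index ~ Species + Urbanisation_index,
                   data = petraea, FUN = mean)
names(counts)[3] <- "Counts"
names(means)[3] <- "Mean_Canopy_Index"

# merge() joins the two tables on the shared grouping columns
combined <- merge(counts, means, by = c("Species", "Urbanisation_index"))
print(combined)
```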
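To combine the row count per Urbanisation_index sub-level with the mean Canopy_Index per sub-level, I applied the function below. However, this function adds zeros to the count in each row, and I cannot rename the column headers to produce a new data frame. When I check the Environment pane in RStudio, the mean and standard deviation of Canopy_Index do not show up.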
##Function to incorporate both counts of urbanisation index and the mean and standard deviation for canopy index
Mean_sd_Count_leaf_1<-aggregate(Canopy_Index ~ Species+Urbanisation_index,
data = Subset_leaf_ob_1,
FUN = function(x) c(Counts = length(x), Mean = mean(x), Sd = sd(x)))
##Rename the columns
colnames(Mean_sd_Count_leaf_1)<-c("Species", "Urbanisation_Index", "Counts", "Mean_Canopy_Index", "SD_Canopy_Index")
##Error message
Error in names(x) <- value :
'names' attribute [5] must be the same length as the vector [3]
traceback()
1: `colnames<-`(`*tmp*`, value = c("Species", "Urbanisation_Index",
"Counts", "Mean_Canopy_Index", "SD_Canopy_Index"))
data.table()
library(data.table)
Data.table.leaf.1<-data.table(Subset_leaf_ob_1)
leaf.1.data.table<-Data.table.leaf.1[, .N, by = list(Species, Urbanisation_index),
mean_test=rowMeans(Canopy_Index),
sd_test=rowMeans(Canopy_Index)]
##Error Message
Error in `[.data.table`(Data.table.leaf.1, , .N, by = list(Species, Urbanisation_index), :
unused arguments (mean_test = rowMeans(Canopy_Index), sd_test = rowMeans(Canopy_Index))
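This call has two problems:

- `[.data.table` has no mean_test or sd_test arguments, which is why R reports "unused arguments".
- rowMeans() computes one mean per row of a matrix, not the mean of a column.

All the statistics belong in the single j expression, wrapped in .( ), which is data.table's alias for list(). The sketch below uses a hypothetical toy data.table.

```r
library(data.table)

# Hypothetical toy data using the question's column names
leaf <- data.table(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L),
  Canopy_Index = c(55L, 65L, 75L, 85L, 95L, 45L)
)

# i: filter rows; j: one list of summary values; by: grouping columns
summary_dt <- leaf[Species == "Quercus petraea",
                   .(Counts = .N,
                     Mean_Canopy_Index = mean(Canopy_Index),
                     SD_Canopy_Index = sd(Canopy_Index)),
                   by = .(Species, Urbanisation_index)]
print(summary_dt)
```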
Data
structure(list(Obs_.no = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L,
36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L,
49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L,
62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L,
75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L,
88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L,
101L, 102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L,
112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L,
123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L,
134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L,
145L, 146L, 147L, 148L, 149L, 150L, 151L, 152L, 153L, 154L, 155L,
156L, 157L, 158L, 159L, 160L, 161L, 162L, 163L, 164L, 165L, 166L,
167L, 168L, 169L, 170L, 171L, 172L, 173L, 174L, 175L, 176L, 177L,
178L, 179L, 180L, 181L, 182L, 183L, 184L, 185L, 186L, 187L, 188L,
189L, 190L, 191L, 192L, 193L, 194L, 195L, 196L, 197L, 198L, 199L,
200L, 201L, 202L, 203L, 204L, 205L, 206L, 207L, 208L, 209L, 210L,
211L, 212L, 213L, 214L, 215L, 216L, 217L, 218L, 219L, 220L, 221L,
222L, 223L, 224L, 225L, 226L, 227L, 228L, 229L, 230L, 231L, 232L,
233L, 234L, 235L, 236L, 237L, 238L, 239L, 240L, 241L, 242L, 243L,
244L, 246L, 247L, 248L, 249L, 250L, 251L, 252L, 253L, 254L, 255L,
256L, 257L, 258L, 259L, 260L, 261L, 262L, 263L, 264L, 265L, 266L,
267L, 268L, 269L, 270L, 271L, 272L, 273L, 274L, 275L, 276L, 277L,
278L, 279L, 280L, 281L, 282L, 283L, 284L, 285L, 286L, 287L, 288L,
289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L), Date_observed = structure(c(5L,
17L, 7L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 12L, 12L, 12L, 12L, 4L,
4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 9L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 12L, 12L, 12L, 12L, 13L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 12L, 12L, 6L, 6L, 6L,
6L, 16L, 16L, 16L, 16L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 7L, 7L,
7L, 7L, 7L, 14L, 14L, 14L, 6L, 6L, 10L, 10L, 10L, 10L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 1L,
1L, 12L, 12L, 12L, 12L, 12L, 5L, 5L, 5L, 7L, 7L, 7L, 7L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 7L, 7L, 7L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 12L, 12L, 12L,
5L, 5L, 5L, 5L, 9L, 9L, 11L, 11L, 11L, 11L, 3L, 3L, 10L, 10L,
10L, 10L, 4L, 4L, 4L, 4L, 12L, 12L, 12L, 10L, 10L, 10L, 10L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 14L, 14L, 14L, 14L, 9L, 9L, 9L,
9L, 11L, 11L, 11L, 11L, 4L, 4L, 4L, 4L, 7L, 7L, 7L, 14L, 14L,
14L, 14L, 10L, 10L, 11L, 11L, 11L, 3L, 3L, 3L, 3L, 14L, 4L, 4L,
4L, 4L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 12L, 6L, 6L, 6L, 6L, 11L, 6L, 6L, 6L, 12L, 12L,
2L, 2L, 2L, 2L, 6L, 6L, 6L, 10L, 10L, 10L, 10L, 15L, 11L, 11L,
11L, 11L, 3L, 3L, 3L, 7L, 7L, 7L, 4L, 4L, 4L, 12L, 12L, 12L,
12L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 12L, 12L, 12L, 12L, 7L,
7L, 7L, 7L, 12L, 12L, 12L, 12L), .Label = c("10/1/18", "10/14/18",
"10/19/18", "10/20/18", "10/21/18", "10/22/18", "10/23/18", "10/24/18",
"10/25/18", "10/26/18", "10/27/18", "10/28/18", "10/28/19", "10/29/18",
"11/6/18", "12/9/18", "8/20/18"), class = "factor"), Latitude = c(51.4175,
52.12087, 52.0269, 52.0269, 52.0269, 52.0269, 52.947709, 52.947709,
52.947709, 52.947709, 53.14919, 53.14919, 55.94154, 55.94154,
51.59449, 51.59449, 51.59449, 51.59449, 51.491811, 51.491811,
52.59925, 52.59925, 52.59925, 52.59925, 51.60157, 51.60157, 51.60157,
51.60157, 52.6888, 52.6888, 52.6888, 52.6888, 50.697802, 50.697802,
50.697802, 50.697802, 53.62417, 50.446841, 50.446841, 50.446841,
50.446841, 35.292896, 35.292896, 53.959679, 53.959679, 53.959679,
53.959679, 32.2855, 32.2855, 32.2855, 32.2855, 52.01434, 52.01434,
52.01434, 50.8365, 50.8365, 51.78375, 51.78375, 51.78375, 51.78375,
51.456965, 51.456965, 51.456965, 51.456965, 51.3651, 51.3651,
51.3651, 51.3651, 52.01182, 52.01182, 52.01182, 52.01182, 55.919722,
50.114277, 50.114277, 50.114277, 50.114277, 53.39912, 53.39912,
53.39912, 51.43474, 51.43474, 51.10676, 51.10676, 51.10676, 51.10676,
50.435984, 50.435984, 50.435984, 50.435984, 51.78666, 51.78666,
51.78666, 51.78666, 51.473203, 51.473203, 51.473203, 53.38728,
53.38728, 53.38728, 53.38728, 52.441088, 52.441088, 52.552344,
19.61263, 19.61263, 19.61263, 19.61263, 53.582285, 53.582285,
53.582285, 49.259471, 49.259471, 49.259471, 49.259471, 50.461625,
50.461625, 50.461625, 50.461625, 51.746642, 51.746642, 51.746642,
51.746642, 52.2501, 52.2501, 52.2501, 52.2501, 52.423336, 52.423336,
52.423336, 52.423336, 50.79387, 50.79387, 50.79387, 53.615575,
53.615575, 53.615575, 53.615575, 52.55317, 52.55317, 52.55317,
52.55317, 51.08474, 51.08474, 51.08474, 53.19329, 53.19329, 53.19329,
53.19329, 55.96785, 55.96785, 56.52664, 56.52664, 56.52664, 56.52664,
52.04252, 52.04252, 51.8113, 51.8113, 51.8113, 51.8113, 52.580157,
52.580157, 52.580157, 52.580157, 51.5894, 51.5894, 51.5894, 50.52008,
50.52008, 50.52008, 50.52008, 25.3671, 25.3671, 25.3671, 25.3671,
51.48417, 51.48417, 51.48417, 51.48417, 54.58243, 54.58243, 54.58243,
54.58243, 52.58839, 52.58839, 52.58839, 52.58839, 52.717283,
52.717283, 52.717283, 52.717283, 50.740764, 50.740764, 50.740764,
50.740764, -36.865, -36.865, -36.865, 52.57937, 52.57937, 52.57937,
52.57937, 50.736531, 50.736531, 50.79926, 50.79926, 50.79926,
53.675996, 53.675996, 53.675996, 53.675996, 55.43828, 48.35079,
48.35079, 48.35079, 48.35079, 51.36445, 51.36445, 51.36445, 51.36445,
52.36286, 52.36286, 52.36286, -25.77831, -25.77831, -25.77831,
-25.77831, -20.112381, -20.112381, -20.112381, -20.112381, 52.122402,
52.122402, 52.122402, 51.481079, 52.16104, 52.16104, 52.16104,
52.16104, 54.7311, 51.61842, 51.61842, 51.61842, 55.91913, 55.91913,
51.06433, 51.06433, 51.06433, 51.06433, 55.920966, 55.920966,
55.920966, 51.6528, 51.6528, 51.6528, 51.6528, 57.158724, 51.88485,
51.88485, 51.88485, 51.88485, 52.34015, 52.34015, 52.34015, 50.615029,
50.615029, 50.615029, 53.37687, 53.37687, 53.37687, 54.27745,
54.27745, 54.27745, 54.27745, 52.026042, 52.026042, 52.026042,
52.026042, 51.319032, 51.319032, 51.319032, 51.319032, 51.51357,
51.51357, 51.51357, 51.51357, 53.43202, 53.43202, 53.43202, 53.43202,
51.50823, 51.50823, 51.50823, 51.50823), Longitude = c(-0.32118,
-0.29293, -0.7078, -0.7078, -0.7078, -0.7078, -1.435407, -1.435407,
-1.435407, -1.435407, -0.76115, -0.76115, -3.19139, -3.19139,
-2.98828, -2.98828, -2.98828, -2.98828, -3.210324, -3.210324,
1.33011, 1.33011, 1.33011, 1.33011, -3.67111, -3.67111, -3.67111,
-3.67111, -3.30909, -3.30909, -3.30909, -3.30909, -2.11692, -2.11692,
-2.11692, -2.11692, -2.43155, -3.706923, -3.706923, -3.706923,
-3.706923, 139.676727, 139.676727, -1.061008, -1.061008, -1.061008,
-1.061008, -110.9434, -110.9434, -110.9434, -110.9434, 1.04007,
1.04007, 1.04007, -0.1631, -0.1631, -0.65046, -0.65046, -0.65046,
-0.65046, -2.624917, -2.624917, -2.624917, -2.624917, 0.70706,
0.70706, 0.70706, 0.70706, -0.70082, -0.70082, -0.70082, -0.70082,
-3.210278, -5.541128, -5.541128, -5.541128, -5.541128, -2.33356,
-2.33356, -2.33356, 0.45981, 0.45981, -2.32071, -2.32071, -2.32071,
-2.32071, -4.105617, -4.105617, -4.105617, -4.105617, -0.71433,
-0.71433, -0.71433, -0.71433, -2.586492, -2.586492, -2.586492,
-2.95811, -2.95811, -2.95811, -2.95811, -0.176158, -0.176158,
-1.337177, 57.66801, 57.66801, 57.66801, 57.66801, -2.802239,
-2.802239, -2.802239, -123.107788, -123.107788, -123.107788,
-123.107788, 3.560973, 3.560973, 3.560973, 3.560973, 0.486416,
0.486416, 0.486416, 0.486416, -0.8825, -0.8825, -0.8825, -0.8825,
-1.787563, -1.787563, -1.787563, -1.787563, 0.26684, 0.26684,
0.26684, -2.432959, -2.432959, -2.432959, -2.432959, -0.20337,
-0.20337, -0.20337, -0.20337, -0.73645, -0.73645, -0.73645, -0.63793,
-0.63793, -0.63793, -0.63793, -3.18084, -3.18084, -3.40313, -3.40313,
-3.40313, -3.40313, -2.43733, -2.43733, -0.22894, -0.22894, -0.22894,
-0.22894, -1.948571, -1.948571, -1.948571, -1.948571, 0.1879,
0.1879, 0.1879, -4.20756, -4.20756, -4.20756, -4.20756, 51.53781,
51.53781, 51.53781, 51.53781, -0.34854, -0.34854, -0.34854, -0.34854,
-5.93229, -5.93229, -5.93229, -5.93229, -1.96843, -1.96843, -1.96843,
-1.96843, -2.410575, -2.410575, -2.410575, -2.410575, -2.361234,
-2.361234, -2.361234, -2.361234, 174.757, 174.757, 174.757, -1.89325,
-1.89325, -1.89325, -1.89325, -2.011143, -2.011143, -3.19446,
-3.19446, -3.19446, -1.272824, -1.272824, -1.272824, -1.272824,
-4.64226, 10.91812, 10.91812, 10.91812, 10.91812, -0.23106, -0.23106,
-0.23106, -0.23106, -2.06327, -2.06327, -2.06327, 28.22357, 28.22357,
28.22357, 28.22357, 57.580207, 57.580207, 57.580207, 57.580207,
-0.487443, -0.487443, -0.487443, -0.026923, 0.18702, 0.18702,
0.18702, 0.18702, -5.8041, -0.16034, -0.16034, -0.16034, -3.20987,
-3.20987, -1.79923, -1.79923, -1.79923, -1.79923, -3.193503,
-3.193503, -3.193503, -1.57361, -1.57361, -1.57361, -1.57361,
-2.166099, -0.17844, -0.17844, -0.17844, -0.17844, -1.27795,
-1.27795, -1.27795, -1.966392, -1.966392, -1.966392, -1.34506,
-1.34506, -1.34506, -0.47911, -0.47911, -0.47911, -0.47911, -0.503114,
-0.503114, -0.503114, -0.503114, -0.472994, -0.472994, -0.472994,
-0.472994, -3.18738, -3.18738, -3.18738, -3.18738, -2.27968,
-2.27968, -2.27968, -2.27968, -0.25847, -0.25847, -0.25847, -0.25847
), Altitude = c(5L, 0L, 68L, 68L, 68L, 68L, 104L, 104L, 104L,
104L, 11L, 11L, 0L, 0L, 7L, 7L, 7L, 7L, 15L, 15L, 23L, 23L, 23L,
23L, 184L, 184L, 184L, 184L, 176L, 176L, 176L, 176L, 12L, 12L,
12L, 12L, 178L, 36L, 36L, 36L, 36L, 0L, 0L, 11L, 11L, 11L, 11L,
718L, 718L, 718L, 718L, 47L, 47L, 47L, 42L, 42L, 210L, 210L,
210L, 210L, 97L, 97L, 97L, 97L, 23L, 23L, 23L, 23L, 0L, 0L, 0L,
0L, 110L, 9L, 9L, 9L, 9L, 30L, 30L, 30L, 4L, 4L, 200L, 200L,
200L, 200L, 160L, 160L, 160L, 160L, 166L, 166L, 166L, 166L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 74L, 74L, 74L, 74L, 36L,
36L, 36L, 47L, 47L, 47L, 47L, 58L, 58L, 58L, 58L, 43L, 43L, 43L,
43L, 97L, 97L, 97L, 97L, 133L, 133L, 133L, 133L, 18L, 18L, 18L,
123L, 123L, 123L, 123L, 5L, 5L, 5L, 5L, 128L, 128L, 128L, 15L,
15L, 15L, 15L, 14L, 14L, 65L, 65L, 65L, 65L, 45L, 45L, 129L,
129L, 129L, 129L, 140L, 140L, 140L, 140L, 0L, 0L, 0L, 18L, 18L,
18L, 18L, 0L, 0L, 0L, 0L, 30L, 30L, 30L, 30L, 19L, 19L, 19L,
19L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 96L, 96L, 96L, 96L, 88L,
88L, 88L, 169L, 169L, 169L, 169L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 123L, 123L, 123L,
1436L, 1436L, 1436L, 1436L, 0L, 0L, 0L, 0L, 43L, 43L, 43L, 6L,
75L, 75L, 75L, 75L, 0L, 73L, 73L, 73L, 109L, 109L, 0L, 0L, 0L,
0L, 115L, 115L, 115L, 110L, 110L, 110L, 110L, 119L, 95L, 95L,
95L, 95L, 112L, 112L, 112L, 23L, 23L, 23L, 34L, 34L, 34L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 24L, 24L, 24L, 24L, 38L, 38L, 38L,
38L, 29L, 29L, 29L, 29L, 20L, 20L, 20L, 20L), Species = structure(c(6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 1L, 1L, 6L, 6L, 6L, 6L, 1L, 1L,
1L, 1L, 5L, 5L, 5L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 6L, 6L, 5L, 5L, 1L, 1L, 1L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 6L, 5L, 1L, 1L, 1L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 1L, 1L, 1L, 1L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 2L, 2L, 2L, 6L, 6L, 6L, 6L, 3L, 3L, 3L, 3L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L,
6L, 5L, 6L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 1L, 6L, 5L, 6L, 5L, 5L,
5L, 5L, 5L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L), .Label = c("other deciduous tree", "other oak",
"other plant", "other shrub", "Quercus petraea", "Quercus robur"
), class = "factor"), Tree_diameter = c(68.8, 10, 98.5, 97, 32.5,
45.1, 847, 817, 569, 892, 57.3, 43.5, 120, 180, 74, 67, 69, 55,
62, 71, 140, 111.4, 114.6, 167.1, 29, 46.5, 27.7, 40.1, 68, 45,
60, 54, 104, 122, 85, 71, 81, 39.8, 43.6, 44.6, 22.6, 160, 156,
20.1, 17.8, 15.6, 12.1, 37.3, 45.1, 42.8, 51.2, 48.1, 83.7, 77.9,
80.2, 84.7, 81.8, 102.5, 75.5, 57.3, 0.3, 0.2, 0.3, 0.3, 70,
36, 53, 44, 31.5, 27.1, 23.3, 22, 85, 69.4, 37.3, 82.9, 52.9,
98.4, 64.6, 81.8, 19.9, 14.6, 196, 122, 118, 180, 58.6, 54.1,
58, 61.5, 58.4, 40.6, 61, 68.6, 44.2, 45.2, 44.2, 117, 240, 210,
310, 134, 64, 52.2, 32, 25, 22, 17, 57, 73.9, 37.1, 170, 114,
127, 158, 147.4, 135.3, 122.9, 104.1, 263, 237, 322, 302, 175,
182, 141, 155, 89, 41, 70, 83, 81.5, 29.3, 43.3, 141, 86.5, 82,
114.5, 57, 42, 58, 64, 129, 127, 143, 125, 92, 68, 90, 24.5,
20.1, 63.7, 39.8, 66.2, 112.4, 41.9, 43.8, 124.5, 94.1, 68.6,
74.4, 23.6, 27.7, 22.9, 25.2, 59.2, 78, 79.3, 24.2, 54.7, 43,
33.1, 56, 67, 62, 58, 306, 274, 56, 60, 72.5, 128.5, 22, 16,
143, 103, 53, 130, 48.4, 69.8, 6.4, 18.6, 129.2, 41.7, 57.6,
14, 75, 105, 44, 41.7, 30.2, 39.5, 24.2, 320, 352, 120.9, 108.3,
53.2, 240, 274, 122, 85, 21, 52, 43, 38, 37, 219, 215, 216, 175,
124, 133, 119, 39.2, 63, 94.9, 47.1, 126.6, 86.9, 94.7, 106.2,
85.9, 49.7, 97.1, 55, 40.8, 79.3, 62.4, 62.4, 70, 115.9, 111.1,
88.9, 80.3, 90.8, 36, 31, 37.5, 42.3, 73, 54, 75, 43, 50.3, 28.7,
31.9, 159, 181.5, 149.7, 122, 143.6, 148, 145, 99, 47, 76.4,
62.7, 49, 57.9, 54.8, 53.5, 88.8, 71.3, 101.9, 28, 32, 54, 54,
169, 152, 160, 138, 90.8, 87.9, 77.4, 81.2, 91.7, 62.7, 50, 72.9,
23.7, 58, 80.7, 73.7), Urbanisation_index = c(2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L,
4L, 4L, 2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L,
2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 3L, 2L, 2L, 2L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 4L,
4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 1L, 1L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 4L, 4L, 4L, 4L, 3L, 2L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L), Stand_density_index = c(3L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 4L, 1L,
1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 3L, 3L,
2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 4L, 4L,
3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 4L,
4L, 3L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 2L, 2L, 2L,
4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
3L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 4L, 4L, 4L, 4L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 2L, 4L, 4L, 4L, 4L, 4L, 2L, 1L, 1L, 4L, 4L,
2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 1L, 1L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L), Canopy_Index = c(85L, 85L, 85L, 75L, 45L, 25L, 75L, 65L,
65L, 75L, 65L, 15L, 75L, 85L, 85L, 45L, 45L, 65L, 75L, 75L, 95L,
95L, 95L, 95L, 95L, 55L, 85L, 65L, 85L, 65L, 95L, 85L, 85L, 85L,
75L, 75L, 65L, 85L, 85L, 85L, 85L, 65L, 35L, 75L, 75L, 85L, 65L,
55L, 65L, 45L, 45L, 95L, 85L, 85L, 85L, 65L, 95L, 85L, 95L, 95L,
75L, 75L, 85L, 85L, 85L, 85L, 85L, 75L, 85L, 85L, 85L, 85L, 45L,
75L, 75L, 65L, 75L, 35L, 35L, 75L, 85L, 85L, 65L, 75L, 85L, 75L,
95L, 95L, 95L, 95L, 75L, 75L, 65L, 65L, 85L, 95L, 95L, 35L, 75L,
65L, 85L, 95L, 95L, 55L, 75L, 75L, 75L, 85L, 65L, 95L, 75L, 75L,
65L, 75L, 65L, 85L, 95L, 95L, 75L, 95L, 75L, 95L, 65L, 75L, 75L,
85L, 85L, 65L, 95L, 65L, 65L, 75L, 75L, 65L, 65L, 65L, 65L, 65L,
35L, 65L, 75L, 35L, 85L, 85L, 75L, 95L, 85L, 85L, 75L, 45L, 55L,
35L, 35L, 25L, 25L, 75L, 65L, 95L, 85L, 75L, 85L, 85L, 75L, 75L,
65L, 95L, 95L, 95L, 75L, 85L, 65L, 45L, 75L, 35L, 65L, 95L, 95L,
95L, 95L, 95L, 65L, 75L, 45L, 35L, 75L, 95L, 95L, 85L, 75L, 65L,
85L, 95L, 75L, 85L, 85L, 95L, 95L, 95L, 55L, 65L, 65L, 45L, 65L,
85L, 35L, 95L, 85L, 85L, 75L, 85L, 95L, 85L, 95L, 75L, 65L, 65L,
65L, 65L, 55L, 75L, 85L, 85L, 85L, 85L, 55L, 25L, 55L, 65L, 35L,
75L, 25L, 35L, 85L, 95L, 85L, 55L, 75L, 75L, 75L, 75L, 65L, 85L,
75L, 65L, 85L, 55L, 95L, 95L, 95L, 95L, 45L, 55L, 35L, 65L, 45L,
75L, 75L, 55L, 65L, 65L, 75L, 65L, 95L, 95L, 95L, 45L, 15L, 85L,
65L, 95L, 95L, 45L, 65L, 45L, 55L, 85L, 65L, 75L, 75L, 75L, 65L,
75L, 35L, 75L, 75L, 75L, 75L, 25L, 45L, 45L, 35L, 85L, 95L, 85L,
95L), Phenological_Index = c(2L, 4L, 2L, 2L, 4L, 4L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 2L, 3L, 3L, 4L, 3L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 4L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 2L, 2L, 2L, 2L, 3L,
1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 2L, 1L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
4L, 4L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 3L, 3L,
3L, 3L, 4L, 3L, 2L, 3L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 4L,
2L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 2L,
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 1L, 3L, 4L, 3L, 3L,
2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L,
1L, 1L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 2L, 3L, 2L, 3L, 2L, 2L, 2L,
2L, 3L, 3L, 4L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L)), class = "data.frame", row.names = c(NA,
-295L))
Answer 1: Using dplyr, we can first filter on Species. Then, for each Urbanisation_index, we use n() and mean() to compute the number of observations and the mean of Canopy_Index.
library(dplyr)
subset_leaf_1 %>%
filter(Species == "Quercus petraea") %>%
group_by(Urbanisation_index) %>%
summarise(Species = "Quercus petraea",
Obs_no = n(),
Canopy_Index = mean(Canopy_Index))
# Urbanisation_index Species Obs_no Canopy_Index
# <int> <chr> <int> <dbl>
#1 1 Quercus petraea 6 61.7
#2 2 Quercus petraea 17 75
#3 3 Quercus petraea 14 76.4
#4 4 Quercus petraea 17 72.1
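If you specifically want dplyr's count(), note that it returns only the grouping columns plus a count column. The Canopy_Index statistics therefore have to be computed from the original rows and joined back. The sketch below assumes dplyr 1.0 or later, for count()'s `name` argument, and uses hypothetical toy data.

```r
library(dplyr)

# Hypothetical toy data using the question's column names
leaf <- data.frame(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L),
  Canopy_Index = c(55L, 65L, 75L, 85L, 95L, 45L)
)
petraea <- filter(leaf, Species == "Quercus petraea")

# count(): grouping columns plus one count column (renamed via `name`)
counts <- count(petraea, Species, Urbanisation_index, name = "Counts")

# Canopy_Index is not in `counts`, so compute the statistics separately
stats <- petraea %>%
  group_by(Species, Urbanisation_index) %>%
  summarise(Mean_Canopy_Index = mean(Canopy_Index),
            SD_Canopy_Index = sd(Canopy_Index)) %>%
  ungroup()

# Join the counts and statistics on the grouping columns
combined <- left_join(counts, stats, by = c("Species", "Urbanisation_index"))
print(combined)
```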
We can also do this in base R:
df1 <- do.call(data.frame, aggregate(Canopy_Index~Urbanisation_index,
subset(subset_leaf_1, Species == "Quercus petraea"),
function(x) c(Canopy_Index = mean(x), Obs_no = length(x))))
colnames(df1) <- c("Urbanisation_index", "Canopy_Index", "Obs_no")
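This explains why do.call() is needed here. When FUN returns a vector, aggregate() stores the statistics as a single matrix column, so the result has 3 columns rather than 5. That mismatch is what the question's "'names' attribute [5] must be the same length as the vector [3]" error reports. The sketch below uses hypothetical toy data and shows the flattening step that makes the renaming work.

```r
# Hypothetical toy data using the question's column names
leaf <- data.frame(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L),
  Canopy_Index = c(55L, 65L, 75L, 85L, 95L, 45L)
)
petraea <- subset(leaf, Species == "Quercus petraea")

# FUN returns a named vector, so the three statistics end up in ONE
# matrix column called Canopy_Index: ncol(agg) is 3, not 5
agg <- aggregate(Canopy_Index ~ Species + Urbanisation_index, data = petraea,
                 FUN = function(x) c(Counts = length(x), Mean = mean(x), Sd = sd(x)))

# Calling data.frame() on the list of columns splits the matrix column
# into ordinary columns, after which five names fit
flat <- do.call(data.frame, agg)
colnames(flat) <- c("Species", "Urbanisation_Index", "Counts",
                    "Mean_Canopy_Index", "SD_Canopy_Index")
print(flat)
```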
Comments:
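Hi Ronak Shah, thank you very much for your help. One question about the base R approach, specifically aggregate(): how can I rename the columns with colnames() afterwards? When I try, I keep getting this error message: Error in names(x)
@AliceHobbs Thanks.. I think we need do.call. I have updated the answer. Please check whether it works now.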
Hey Ronak, I tried the new code and it returns this error message: "Error in do.call(Subset_leaf_ob_1, aggregate(Canopy_Index ~ Species + : 'what' must be a function or string". Do you know what this means? Thanks for your help :)
@AliceHobbs It works on the sample data frame provided, so I'm not sure what could have gone wrong. Did you check the dplyr answer? Does that work?
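The error in the comment above comes from the argument order. In do.call(what, args), `what` must be the function to call (data.frame here) and `args` the list of its arguments (the aggregate() result). Passing the data frame Subset_leaf_ob_1 as `what` triggers the error. The sketch below uses hypothetical toy data and catches the error instead of stopping.

```r
# Hypothetical toy data using the question's column names
leaf <- data.frame(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L),
  Canopy_Index = c(55L, 65L, 75L, 85L, 95L, 45L)
)
petraea <- subset(leaf, Species == "Quercus petraea")
agg <- aggregate(Canopy_Index ~ Urbanisation_index, data = petraea,
                 FUN = function(x) c(Canopy_Index = mean(x), Obs_no = length(x)))

# Correct order: the function first, then the list of its arguments
flat <- do.call(data.frame, agg)
print(flat)

# Wrong order (a data frame as `what`), as in the comment; the error is caught
msg <- tryCatch(do.call(petraea, agg), error = function(e) conditionMessage(e))
print(msg)
```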
Hi Ronak, I have been using dplyr() while I look into this error message. However, I just looked at the structure of the object df1. These are the results:
str(leaf.aggregate.1)
'data.frame': 4 obs. of 3 variables:
 $ Species           : Factor w/ 6 levels "other deciduous tree",..: 5 5 5 5
 $ Urbanisation_index: int 1 2 3 4
 $ Canopy_Index      : num [1:4, 1:3] 6 17 14 17 61.7 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Obs_no" "Mean" "SD"
Answer 2:
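Using data.table, we convert the 'data.frame' to a 'data.table' (setDT), specify the logical condition in i to subset the rows, and group by 'Urbanisation_index'. For each group we get the number of rows (.N), the mean of 'Canopy_Index' and the first value of 'Species'.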
library(data.table)
out <- setDT(subset_leaf_1)[Species == "Quercus petraea",
.(Species = first(Species),
Obs_no = .N,
Canopy_Index = mean(Canopy_Index)), by = Urbanisation_index]
setcolorder(out, c(2, 1, 3, 4))
out
# Species Urbanisation_index Obs_no Canopy_Index
#1: Quercus petraea 2 17 75.00000
#2: Quercus petraea 4 17 72.05882
#3: Quercus petraea 3 14 76.42857
#4: Quercus petraea 1 6 61.66667
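The rows above come out in the order 2, 4, 3, 1 because `by =` keeps groups in order of first appearance. Using `keyby =` instead sorts the result by the grouping column and also sets it as the key. The sketch below uses hypothetical toy data, arranged so that the difference is visible.

```r
library(data.table)

# Hypothetical toy data; Urbanisation_index 2 appears before 1
leaf <- data.table(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(2L, 2L, 1L, 2L, 1L, 1L),
  Canopy_Index = c(75L, 85L, 55L, 95L, 65L, 45L)
)

# by = : groups appear in order of first appearance (2, then 1)
by_order <- leaf[Species == "Quercus petraea",
                 .(Obs_no = .N, Canopy_Index = mean(Canopy_Index)),
                 by = Urbanisation_index]

# keyby = : result sorted by Urbanisation_index (1, then 2)
keyby_order <- leaf[Species == "Quercus petraea",
                    .(Obs_no = .N, Canopy_Index = mean(Canopy_Index)),
                    keyby = Urbanisation_index]
print(by_order)
print(keyby_order)
```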
This can also be done in base R:
tmp1 <- subset(subset_leaf_1, Species == "Quercus petraea")
by(tmp1, tmp1$Urbanisation_index, FUN = function(x)
data.frame(Obs_no = nrow(x), Canopy_Index = mean(x$Canopy_Index)))
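by() returns a list-like "by" object with one small data frame per sub-level. To get a single data frame, the pieces can be stacked with do.call(rbind, ...). The sketch below uses hypothetical toy data. It also adds an Urbanisation_index column inside FUN so that each piece records its own group.

```r
# Hypothetical toy data using the question's column names
leaf <- data.frame(
  Species = c(rep("Quercus petraea", 5), "Quercus robur"),
  Urbanisation_index = c(1L, 1L, 2L, 2L, 2L, 1L),
  Canopy_Index = c(55L, 65L, 75L, 85L, 95L, 45L)
)
tmp1 <- subset(leaf, Species == "Quercus petraea")

# One data frame per Urbanisation_index sub-level
res <- by(tmp1, tmp1$Urbanisation_index, FUN = function(x)
  data.frame(Urbanisation_index = x$Urbanisation_index[1],
             Obs_no = nrow(x),
             Canopy_Index = mean(x$Canopy_Index)))

# Stack the per-group data frames into one and reset the row names
out <- do.call(rbind, res)
rownames(out) <- NULL
print(out)
```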