Aggregating using function diff with non-sequential rows
【Posted】2020-07-18 23:06:19
【Question】I am relatively new to R and am teaching myself how to use it, so I hope I explain my problem well.
My data has 4 columns:
1. Code = location of a plot
2. Event = Pre or Post; whether the year of sampling was before or after a disturbance
3. Season = the season the sampling was done in
4. Total = number of individuals found in the plot
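I want to aggregate the data so that there is one row per location and season, holding the change in Total between before and after the fire.
I want the change to always be calculated as Pre minus Post, but the rows in my data are not always in that order.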
What I have:
Code  Event  Season  Total
A     Post   AUTUMN  2
A     Pre    AUTUMN  5
A     Pre    SUMMER  15
A     Post   SUMMER  40
B     Pre    AUTUMN  5
B     Post   AUTUMN  8
What I want:
Code  Season  Change
A     AUTUMN  3
A     SUMMER  -25
B     AUTUMN  -3
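【Answer 1】We can use diff on 'Total' after grouping by 'Code' and 'Season'. Because diff subtracts in row order, sort by 'Event' first so that "Post" comes before "Pre" in every group and the result is Pre minus Post:
# order() puts "Post" ahead of "Pre", so diff() returns Pre - Post
aggregate(cbind(Change = Total) ~ Code + Season, df1[order(df1$Event), ], diff)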
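To see why row order matters here: for a two-element vector, diff(x) returns x[2] - x[1], so within each Code/Season group the sign depends on which event happens to come first. The following is a minimal sketch on the question's data; the names `unsorted` and `sorted` are only illustrative.

```r
# The question's data, rebuilt so the block is self-contained
df1 <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L),
  stringsAsFactors = FALSE
)

# diff() on the rows as given: second row minus first row in each group,
# whichever events those rows happen to be
unsorted <- aggregate(cbind(Change = Total) ~ Code + Season, df1, diff)

# Sorting by Event puts "Post" before "Pre" in every group,
# so diff() consistently returns Pre - Post
sorted <- aggregate(cbind(Change = Total) ~ Code + Season,
                    df1[order(df1$Event), ], diff)

print(unsorted)
print(sorted)
```

On this data the unsorted call gives 25 for A/SUMMER and 3 for B/AUTUMN (Post minus Pre), while the sorted call gives -25 and -3, which matches the desired output.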
Or with dplyr, selecting the values by their 'Event' label so that row order does not matter:
library(dplyr)
df1 %>%
group_by(Code, Season) %>%
summarise(Change = Total[Event == "Pre"] - Total[Event == "Post"])
# A tibble: 3 x 3
# Groups: Code [2]
# Code Season Change
# <chr> <chr> <int>
#1 A AUTUMN 3
#2 A SUMMER -25
#3 B AUTUMN -3
Or using data.table:
library(data.table)
setDT(df1)[, .(Change = Total[Event == 'Pre'] - Total[Event == 'Post']), .(Code, Season)]
Data
df1 <- structure(list(Code = c("A", "A", "A", "A", "B", "B"), Event = c("Post",
"Pre", "Pre", "Post", "Pre", "Post"), Season = c("AUTUMN", "AUTUMN",
"SUMMER", "SUMMER", "AUTUMN", "AUTUMN"), Total = c(2L, 5L, 15L,
40L, 5L, 8L)), class = "data.frame", row.names = c(NA, -6L))
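The label-based subtraction can also be sketched in base R by reshaping to one column per event. This is an illustrative alternative rather than part of the answer as posted; `wide` and `result` are made-up names, and the sketch assumes each Code/Season pair has at most one Pre row and one Post row.

```r
# The question's data, rebuilt so the block is self-contained
df1 <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L),
  stringsAsFactors = FALSE
)

# One row per Code/Season, with Total split into Total.Post and Total.Pre
wide <- reshape(df1, idvar = c("Code", "Season"),
                timevar = "Event", direction = "wide")

# Subtract by column name, so the original row order is irrelevant;
# a pair missing one of the events ends up with NA instead of a wrong number
wide$Change <- wide$Total.Pre - wide$Total.Post
result <- wide[, c("Code", "Season", "Change")]
print(result)
```

One reason to consider this form is that a group lacking its Pre or Post row shows up as NA in the result rather than raising an error or silently producing a value.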
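【Answer 2】Here is a base R option. It multiplies the "Post" totals by -1 and then sums within each 'Code' and 'Season' group, so the sum equals Pre minus Post regardless of row order: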
dfout <- aggregate(Change~Code + Season,
transform(df,Change = Total*ifelse(Event=="Post",-1,1)),
sum)
which gives
> dfout
Code Season Change
1 A AUTUMN 3
2 B AUTUMN -3
3 A SUMMER -25
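One caveat with the signed-sum approach is that sum still returns a number when a group is missing its Pre or Post row. A small sanity check before aggregating might look like the sketch below; it assumes every group should contain exactly one "Pre" and one "Post" row, and the names `events_by_group`, `complete` and `incomplete` are illustrative only.

```r
# The question's data, rebuilt so the block is self-contained
df <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L),
  stringsAsFactors = FALSE
)

# Collect the Event labels of each Code/Season group that actually occurs
events_by_group <- split(df$Event, list(df$Code, df$Season), drop = TRUE)

# A group is complete when it holds exactly one "Post" and one "Pre"
complete <- vapply(events_by_group,
                   function(e) identical(sort(e), c("Post", "Pre")),
                   logical(1))
incomplete <- names(complete)[!complete]

# Only aggregate when every group passed the check
stopifnot(length(incomplete) == 0)
dfout <- aggregate(Change ~ Code + Season,
                   transform(df, Change = Total * ifelse(Event == "Post", -1, 1)),
                   sum)
print(dfout)
```

If the check fails, `incomplete` lists the offending groups by name (for example "A.AUTUMN"), which can then be inspected before deciding how to handle them.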
Data
df <- structure(list(Code = c("A", "A", "A", "A", "B", "B"), Event = c("Post",
"Pre", "Pre", "Post", "Pre", "Post"), Season = c("AUTUMN", "AUTUMN",
"SUMMER", "SUMMER", "AUTUMN", "AUTUMN"), Total = c(2L, 5L, 15L,
40L, 5L, 8L)), class = "data.frame", row.names = c(NA, -6L))