Aggregating using function diff with non-sequential rows

【Title】: Aggregating using function diff with non-sequential rows 【Posted】: 2020-07-18 23:06:19 【Question】:

I am relatively new to R and am teaching myself how to use it, so I hope I can explain my problem well.

My data has 4 columns:

1. Code = location of a plot
2. Event = Pre or Post; whether the year of sampling was before or after a disturbance
3. Season = the season the sampling was done in
4. Total = number of individuals found in the plot

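I want to aggregate the data so that there is one row per location and season, holding the change in Total between the pre-fire and post-fire samples.

I want the change always calculated as Pre minus Post, but the rows in my data are not always in that order.

What I have: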

Code   Event Season Total
A      Post  AUTUMN     2
A      Pre   AUTUMN     5
A      Pre   SUMMER    15
A      Post  SUMMER    40
B      Pre   AUTUMN     5
B      Post  AUTUMN     8

What I want:

Code   Season   Change
A      AUTUMN        3
A      SUMMER      -25
B      AUTUMN       -3

【Answer 1】:

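We can use diff on 'Total' after grouping by 'Code' and 'Season'. Because diff subtracts each row from the one after it, the result depends on row order, so the data is first sorted by 'Event'. That places "Post" before "Pre" in every group, and diff then returns Pre minus Post.

aggregate(cbind(Change = Total) ~ Code + Season, df1[order(df1$Event), ], diff)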
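
A quick check of the row-order point, as a minimal sketch in base R. It rebuilds the sample data from the question with data.frame() and compares diff on the rows as given with diff after sorting by Event. The names unsorted and sorted are only illustrative.

```r
# Sample data from the question, rebuilt with data.frame()
df1 <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L)
)

# diff() on the rows as they come: second row minus first row in each group,
# whichever event happens to be listed first
unsorted <- aggregate(cbind(Change = Total) ~ Code + Season, df1, diff)

# Sort so that "Post" precedes "Pre" in every group; diff() is then Pre - Post
sorted <- aggregate(cbind(Change = Total) ~ Code + Season,
                    df1[order(df1$Event), ], diff)

print(unsorted)  # Change: 3, 3, 25 (does not match the desired output)
print(sorted)    # Change: 3, -3, -25 (matches, in aggregate's group order)
```

Only the group whose rows were already in Post-then-Pre order (A, AUTUMN) agrees between the two results.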

Or with dplyr, selecting the values by 'Event' so that row order does not matter:

library(dplyr)
df1 %>%
   group_by(Code, Season) %>%
   summarise(Change = Total[Event == "Pre"] - Total[Event == "Post"])
# A tibble: 3 x 3
# Groups:   Code [2]
#  Code  Season Change
#  <chr> <chr>   <int>
#1 A     AUTUMN      3
#2 A     SUMMER    -25
#3 B     AUTUMN     -3

Or using data.table:

library(data.table)
setDT(df1)[, .(Change = Total[Event == 'Pre'] - Total[Event == 'Post']), .(Code, Season)]

Data

df1 <- structure(list(Code = c("A", "A", "A", "A", "B", "B"), Event = c("Post", 
"Pre", "Pre", "Post", "Pre", "Post"), Season = c("AUTUMN", "AUTUMN", 
"SUMMER", "SUMMER", "AUTUMN", "AUTUMN"), Total = c(2L, 5L, 15L, 
40L, 5L, 8L)), class = "data.frame", row.names = c(NA, -6L))
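
If a Code/Season group lacks either its Pre or its Post row, an expression such as Total[Event == "Pre"] - Total[Event == "Post"] yields a zero-length result for that group. One possible alternative, not part of the answer above, is to reshape the data to wide form with base R's reshape() so that a missing event shows up as NA. This is a sketch only; the extra row for plot C is made up for illustration.

```r
# Sample data from the question plus one hypothetical plot "C"
# that has a Pre row but no Post row
df1 <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B", "C"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post", "Pre"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN", "SUMMER"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L, 7L)
)

# One row per Code/Season, with columns Total.Post and Total.Pre
wide <- reshape(df1, idvar = c("Code", "Season"), timevar = "Event",
                direction = "wide")

# Pre minus Post; NA where one of the two events is missing
wide$Change <- wide$Total.Pre - wide$Total.Post
result <- wide[, c("Code", "Season", "Change")]
print(result)
```

The NA makes incomplete groups visible instead of silently dropping them, which may or may not be what the analysis needs.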

【Answer 2】:

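Here is a base R option. Each Post total is multiplied by -1, so summing within a group gives Pre minus Post regardless of row order: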

dfout <- aggregate(Change~Code + Season,
                   transform(df,Change = Total*ifelse(Event=="Post",-1,1)),
                   sum)

which gives

> dfout
  Code Season Change
1    A AUTUMN      3
2    B AUTUMN     -3
3    A SUMMER    -25
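
The sign trick can be broken into steps to show the intermediate values. This is a minimal sketch assuming exactly one Pre and one Post row per group; the column name Signed is an illustrative choice, not from the answer.

```r
# Sample data from the question
df <- data.frame(
  Code   = c("A", "A", "A", "A", "B", "B"),
  Event  = c("Post", "Pre", "Pre", "Post", "Pre", "Post"),
  Season = c("AUTUMN", "AUTUMN", "SUMMER", "SUMMER", "AUTUMN", "AUTUMN"),
  Total  = c(2L, 5L, 15L, 40L, 5L, 8L)
)

# Step 1: flip the sign of every Post total, keep Pre totals as they are
df$Signed <- df$Total * ifelse(df$Event == "Post", -1, 1)
print(df$Signed)  # -2 5 15 -40 5 -8

# Step 2: summing within each Code/Season group yields Pre - Post
dfout <- aggregate(Signed ~ Code + Season, df, sum)
names(dfout)[names(dfout) == "Signed"] <- "Change"
print(dfout)
```

With more than one Pre or Post row in a group, the sum would combine all of them, so the one-of-each assumption is worth checking first.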

Data

df <- structure(list(Code = c("A", "A", "A", "A", "B", "B"), Event = c("Post", 
"Pre", "Pre", "Post", "Pre", "Post"), Season = c("AUTUMN", "AUTUMN", 
"SUMMER", "SUMMER", "AUTUMN", "AUTUMN"), Total = c(2L, 5L, 15L, 
40L, 5L, 8L)), class = "data.frame", row.names = c(NA, -6L))
