R - Calculating mean over rolling window of 4 weeks for multiple variables, over multiple years, and over multiple brands in R - panel data
Posted: 2021-08-31 03:59:55
Question:
I have a panel dataset covering multiple brands over multiple years. Each brand has 52 weekly observations per year and several numeric columns. For every week I want the mean over a 4-week window starting at that week: week 1 gets the mean of weeks 1-4, week 2 gets the mean of weeks 2-5, and so on. This means the last 3 of the 52 weeks for each brand in each year get no value, which is fine.
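To make the window definition concrete, here is a minimal base-R sketch for a single brand-year (no packages needed). It is an illustration, not code from the thread; `forward_mean4` is a hypothetical helper name, and the vector reuses the question's `col1` values (5, 10, ..., 260).

```r
# Forward-looking 4-week mean: position i gets mean(x[i:(i + 3)]).
# The last 3 positions have no complete window and get NA.
# forward_mean4 is a hypothetical helper used only for illustration.
forward_mean4 <- function(x, k = 4) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    if (i + k - 1 <= n) mean(x[i:(i + k - 1)]) else NA_real_
  }, numeric(1))
}

col1 <- seq(from = 5, to = 5 * 52, by = 5)  # one brand-year, as in the question
res <- forward_mean4(col1)
head(res, 2)  # 12.5 17.5  (mean of 5, 10, 15, 20 and of 10, 15, 20, 25)
tail(res, 4)  # 252.5 NA NA NA
```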
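I have put some code at the bottom that reproduces the data in a simple way.
I have already tried writing some functions, but I'm nowhere near. The problem is that the function has to distinguish between brands and years and take the week number into account, and I can't figure out how to do that...
Many thanks in advance!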
[Image: what I would like - simplified]
#rm(list = ls())
# One brand-year: 52 weekly observations with two numeric columns
week <- seq(from = 1, to = 52, by = 1)
col1 <- seq(from = 5, to = 5 * 52, by = 5)
col2 <- seq(from = 10, to = 10 * 52, by = 10)
df <- data.frame(week, col1, col2)
# Copy it for a second brand
df2 <- df
df$brand <- "brand a"
df2$brand <- "brand b"
df <- rbind(df, df2)
rm(week, col1, col2, df2)
# Copy both brands for a second year
df2 <- df
df$year <- "2019"
df2$year <- "2020"
df <- rbind(df, df2)
rm(df2)
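Since the sticking point is keeping brands and years apart, here is a dependency-free sketch using base R's `ave()`, which applies a function within every combination of the grouping vectors and returns the results in the original row order. It assumes rows are already sorted by week within each brand-year (true for this example data). `roll_mean_fwd` is a hypothetical helper built on `stats::embed()`, and the data are rebuilt with `expand.grid()` in the same row order as above so the block runs on its own.

```r
# Forward k-week mean by row position: embed(x, k) puts x[i], ..., x[i + k - 1]
# in row i (in reverse order), so rowMeans() gives the window starting at i.
# The last k - 1 positions have no full window and are padded with NA.
# roll_mean_fwd is a hypothetical helper, not a package function.
roll_mean_fwd <- function(x, k = 4) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  c(rowMeans(embed(x, k)), rep(NA_real_, k - 1))
}

# Same data as the question (52 weeks x 2 brands x 2 years), same row order
df <- expand.grid(week = 1:52, brand = c("brand a", "brand b"),
                  year = c("2019", "2020"), stringsAsFactors = FALSE)
df$col1 <- rep(seq(from = 5, to = 5 * 52, by = 5), times = 4)
df$col2 <- rep(seq(from = 10, to = 10 * 52, by = 10), times = 4)

# ave() splits x by brand x year, applies FUN to each piece (rows assumed
# sorted by week within a piece) and writes the results back row by row
df$roll_col1 <- ave(df$col1, df$brand, df$year, FUN = roll_mean_fwd)
df$roll_col2 <- ave(df$col2, df$brand, df$year, FUN = roll_mean_fwd)

head(df[df$brand == "brand b" & df$year == "2020", ], 3)
```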
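Comments:
You should never include the line rm(list = ls()) in example code. People trying to help you could lose their own data... I commented it out in your code. Also, it is a literal example from the 9th circle of The R Inferno (burns-stat.com/pages/Tutor/R_inferno.pdf).
OK, thanks for the tip. I won't include it again. As for the book, I'll check it first next time; I didn't know it existed. Have a nice day!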
Answer 1:
Here is a solution using tidyverse functions and the rollmean function from the zoo package:
# Load libraries
library(tidyverse)
library(zoo)
# Create data
df <- data.frame(week = rep(seq(from = 1, to = 52, by = 1), times = 4),
col1 = seq(from = 5, to = 5 * 52 * 4, by = 5),
col2 = seq(from = 10, to = 10 * 52 * 4, by = 10),
brand = rep(c("brand a", "brand b",
"brand a", "brand b"),
each = 52),
year = rep(2018:2019, each = 104))
# Group, calculate rolling means, ungroup
df2 <- df %>%
group_by(year, brand) %>%
mutate(rolling_col1 = rollmean(x = col1, k = 4, fill = NA, align = "left"),
rolling_col2 = rollmean(x = col2, k = 4, fill = NA, align = "left")) %>%
ungroup()
df2
# A tibble: 208 x 7
# week col1 col2 brand year rolling_col1 rolling_col2
# <dbl> <dbl> <dbl> <chr> <int> <dbl> <dbl>
# 1 1 5 10 brand a 2018 12.5 25
# 2 2 10 20 brand a 2018 17.5 35
# 3 3 15 30 brand a 2018 22.5 45
# 4 4 20 40 brand a 2018 27.5 55
# 5 5 25 50 brand a 2018 32.5 65
# 6 6 30 60 brand a 2018 37.5 75
# 7 7 35 70 brand a 2018 42.5 85
# 8 8 40 80 brand a 2018 47.5 95
# 9 9 45 90 brand a 2018 52.5 105
#10 10 50 100 brand a 2018 57.5 115
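`rollmean()` works by row position: with `align = "left"`, each value is the mean of the current row and the next three rows in the group. The result is therefore only correct if rows are ordered by week within each group, which holds for the example data but is worth guaranteeing. A small base-R illustration, using a hypothetical position-based helper `roll_mean_fwd` that behaves like `rollmean(..., align = "left", fill = NA)` on complete data:

```r
# Position-based forward mean (same idea as rollmean(align = "left")).
# roll_mean_fwd is a hypothetical helper written for this illustration.
roll_mean_fwd <- function(x, k = 4) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  c(rowMeans(embed(x, k)), rep(NA_real_, k - 1))
}

d <- data.frame(week = 1:8, col1 = seq(from = 5, to = 40, by = 5))

# Rows in reverse week order: windows now run backwards in time,
# so week 8 gets a value and week 1 gets NA
reversed <- d[rev(seq_len(nrow(d))), ]
reversed$roll <- roll_mean_fwd(reversed$col1)

# Sorting by week first restores the intended windows
sorted <- reversed[order(reversed$week), ]
sorted$roll <- roll_mean_fwd(sorted$col1)
sorted$roll  # 12.5 17.5 22.5 27.5 32.5 NA NA NA
```

In the dplyr pipeline above, one way to guarantee the order would be to add `arrange(week, .by_group = TRUE)` after `group_by(year, brand)`.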
Edit
To replace the values in col1/col2 with the rolling means instead of adding extra columns, you can use:
# Group and calculate rolling means
df2 <- df %>%
group_by(year, brand) %>%
mutate(across(.cols = starts_with("col"),
.fns = ~ rollmean(x = .x, k = 4, fill = NA, align = "left"))) %>%
ungroup()
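For readers not using dplyr, a rough base-R counterpart of the `across()` version overwrites every column whose name starts with "col", again grouping by brand and year with `ave()`. This is a sketch on the answer's example data; `roll_mean_fwd` is a hypothetical helper, and `grep("^col", ...)` stands in for `starts_with("col")`.

```r
# roll_mean_fwd is a hypothetical helper: forward k-row mean, NA-padded at the end
roll_mean_fwd <- function(x, k = 4) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  c(rowMeans(embed(x, k)), rep(NA_real_, k - 1))
}

# The answer's example data (2 brands x 2 years x 52 weeks)
df <- data.frame(week  = rep(1:52, times = 4),
                 col1  = seq(from = 5, to = 5 * 52 * 4, by = 5),
                 col2  = seq(from = 10, to = 10 * 52 * 4, by = 10),
                 brand = rep(c("brand a", "brand b", "brand a", "brand b"), each = 52),
                 year  = rep(2018:2019, each = 104))

# Overwrite each "col*" column in place, computing within brand x year
cols <- grep("^col", names(df), value = TRUE)  # like starts_with("col")
df[cols] <- lapply(df[cols], function(v) ave(v, df$brand, df$year, FUN = roll_mean_fwd))
head(df, 3)
```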
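Comments:
Thanks a lot! This works great; apparently I was close, haha. I tried to upvote you, but I'm still too new to be able to... Have a nice day!
No worries. If this solved your problem, please accept this answer.
Answer 2:
You can also use the slider library. Using the data shared by Dr. @jared_mamrot: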
df <- data.frame(week = rep(seq(from = 1, to = 52, by = 1), times = 4),
col1 = seq(from = 5, to = 5 * 52 * 4, by = 5),
col2 = seq(from = 10, to = 10 * 52 * 4, by = 10),
brand = rep(c("brand a", "brand b",
"brand a", "brand b"),
each = 52),
year = rep(2018:2019, each = 104))
library(dplyr, warn.conflicts = FALSE)
library(slider)
df %>% group_by(brand, year) %>%
mutate(roll_col1 = slide_index_mean(col1, week, after = 3, complete = TRUE),
roll_col2 = slide_index_mean(col2, week, after = 3, complete = TRUE))
#> # A tibble: 208 x 7
#> # Groups: brand, year [4]
#> week col1 col2 brand year roll_col1 roll_col2
#> <dbl> <dbl> <dbl> <chr> <int> <dbl> <dbl>
#> 1 1 5 10 brand a 2018 12.5 25
#> 2 2 10 20 brand a 2018 17.5 35
#> 3 3 15 30 brand a 2018 22.5 45
#> 4 4 20 40 brand a 2018 27.5 55
#> 5 5 25 50 brand a 2018 32.5 65
#> 6 6 30 60 brand a 2018 37.5 75
#> 7 7 35 70 brand a 2018 42.5 85
#> 8 8 40 80 brand a 2018 47.5 95
#> 9 9 45 90 brand a 2018 52.5 105
#> 10 10 50 100 brand a 2018 57.5 115
#> # ... with 198 more rows
Created on 2021-07-10 by the reprex package (v2.0.0)
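The difference between the two answers is that `slide_index_mean(col1, week, after = 3, ...)` defines each window by the values of `week` (week w covers weeks w to w + 3), while `rollmean()` uses the next four rows. On gap-free data like this they agree, as the two outputs show; they diverge when a week is missing. The base-R sketch below mimics the index-based idea on a toy series with week 3 missing. It assumes `complete = TRUE` drops windows that run past the last observed week; the helper names are hypothetical, and this is not slider's implementation.

```r
# Hypothetical helper: window by week VALUES (w .. w + after), not by rows.
# Windows extending past the last observed week return NA (assumed to
# mirror complete = TRUE in slider).
index_mean_fwd <- function(x, week, after = 3) {
  vapply(seq_along(x), function(j) {
    w <- week[j]
    if (w + after > max(week)) return(NA_real_)
    mean(x[week >= w & week <= w + after])
  }, numeric(1))
}

# Hypothetical helper: window by row position (next k rows)
position_mean_fwd <- function(x, k = 4) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  c(rowMeans(embed(x, k)), rep(NA_real_, k - 1))
}

week <- c(1, 2, 4, 5, 6, 7)  # week 3 is missing
col1 <- c(10, 20, 40, 50, 60, 70)
by_index    <- index_mean_fwd(col1, week)  # week 1 averages weeks 1, 2, 4 only
by_position <- position_mean_fwd(col1)     # week 1 averages rows 1-4 (weeks 1, 2, 4, 5)
rbind(week, by_index, by_position)
```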