使用 dplyr 和 RcppRoll 计算所有固定窗口平均值
Posted
技术标签:
【中文标题】使用 dplyr 和 RcppRoll 计算所有固定窗口平均值【英文标题】:Compute all fixed window averages with dplyr and RcppRoll 【发布时间】:2022-01-10 09:56:28 【问题描述】:我想使用 dplyr 和 RcppRoll 计算所有(或至少许多)固定窗口平均值。例如,如果我想根据storms
数据计算前 4、5 和 6 个时间步的平均风速,我可以使用以下方法:
library(dplyr)
library(RcppRoll)
set.seed(1)
storms <- storms[storms$name %in% sample(storms$name, size = 4),]
storms %>%
select(name, year, month, day, hour, wind) %>%
group_by(name) %>%
arrange(name, year, month, day, hour) %>%
mutate_at("wind", .funs = funs(
"avg_4" = roll_meanr(., n = 4, fill = NA),
"avg_5" = roll_meanr(., n = 5, fill = NA),
"avg_6" = roll_meanr(., n = 6, fill = NA)
))
这可行,但是如果我想计算 2 到 20 窗口的所有固定窗口平均值,我会厌倦复制和粘贴 funs()
内的行。
似乎我应该能够以某种方式对其进行参数化,但我还没有弄清楚如何。
【问题讨论】:
【参考方案1】:只需使用引用和取消引用的力量! 这就是你所拥有的:
library(dplyr)
library(RcppRoll)
set.seed(1)
storms <- storms[storms$name %in% sample(storms$name, size = 4),]
storms_subset <- storms %>%
select(name, year, month, day, hour, wind) %>%
group_by(name) %>%
arrange(name, year, month, day, hour) %>%
mutate_at("wind", .funs = funs(
"avg_4" = roll_meanr(., n = 4, fill = NA),
"avg_5" = roll_meanr(., n = 5, fill = NA),
"avg_6" = roll_meanr(., n = 6, fill = NA)
))
现在让我们创建一个函数,为不同的x
s 和n
s 构建一组表达式,例如roll_meanr(x, n)
。
make_rollmeans <- function(..., .n = 3)
# this line captures vars you typed in
.dots <- rlang::exprs(...)
# now you iterate over captured variables...
q <- purrr::map(.dots, function(.var)
# ... and over window sizes
purrr::map(.n, function(.nn)
# for each (variable, window) pair make an expression
rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn)))
) %>%
# set proper names by combining variable name, "avg", and window size
purrr::set_names(paste0(as.character(.var), "_avg_", .n))
) %>%
# and finally remove inner structure of list of expressions
# after that you'll have a list of expressions with depth 1
purrr::flatten()
q
所有的魔法都来自rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn)))
。
使用!!.var
,您可以将.var
替换为输入变量名称,即wind
。
使用!!.nn
,您将.nn
替换为number。
接下来,用rlang::expr(...)
引用表达式。
此函数获取不带""
的变量名和窗口大小向量。输出如下所示:
make_rollmeans(wind, pressure, .n = c(3, 5))
#> $wind_avg_3
#> RcppRoll::roll_meanr(wind, 3)
#>
#> $wind_avg_5
#> RcppRoll::roll_meanr(wind, 5)
#>
#> $pressure_avg_3
#> RcppRoll::roll_meanr(pressure, 3)
#>
#> $pressure_avg_5
#> RcppRoll::roll_meanr(pressure, 5)
你可以看到你正在寻找的表达式。
接下来,您可以将 make_rollmeans
放入 mutate()
调用中,使用 !!!
(bang-bang-bang) 运算符取消引用由它构建的表达式。
select(storms_subset, wind) %>% mutate(!!!make_rollmeans(wind, .n = 3:20))
#> Adding missing grouping variables: `name`
#> # A tibble: 261 x 20
#> # Groups: name [4]
#> name wind wind_avg_3 wind_avg_4 wind_avg_5 wind_avg_6 wind_avg_7
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Ernesto 30 NA NA NA NA NA
#> 2 Ernesto 30 NA NA NA NA NA
#> 3 Ernesto 30 30.0 NA NA NA NA
#> 4 Ernesto 35 31.7 31.2 NA NA NA
#> 5 Ernesto 40 35.0 33.8 33. NA NA
#> 6 Ernesto 50 41.7 38.8 37. 35.8 NA
#> 7 Ernesto 60 50.0 46.2 43. 40.8 39.3
#> 8 Ernesto 55 55.0 51.2 48. 45.0 42.9
#> 9 Ernesto 50 55.0 53.8 51. 48.3 45.7
#> 10 Ernesto 45 50.0 52.5 52. 50.0 47.9
#> # ... with 251 more rows, and 13 more variables: wind_avg_8 <dbl>,
#> # wind_avg_9 <dbl>, wind_avg_10 <dbl>, wind_avg_11 <dbl>,
#> # wind_avg_12 <dbl>, wind_avg_13 <dbl>, wind_avg_14 <dbl>,
#> # wind_avg_15 <dbl>, wind_avg_16 <dbl>, wind_avg_17 <dbl>,
#> # wind_avg_18 <dbl>, wind_avg_19 <dbl>, wind_avg_20 <dbl>
我希望结果与您所要求的相同。 :)
【讨论】:
【参考方案2】:使用 Base R,希望对您有所帮助:
storms_wind <- storms %>%
select(name, year, month, day, hour, wind) %>%
group_by(name) %>%
arrange(name, year, month, day, hour)
multi_avg <- function(df, start, end)
for(i in (strat:end))
varname <- paste("avg", i , sep="_")
df[[varname]] <- with(df, roll_meanr(wind, n = i, fill = NA))
df
multi_avg(df=storms_wind, start=4,end=20)
【讨论】:
以上是关于使用 dplyr 和 RcppRoll 计算所有固定窗口平均值的主要内容,如果未能解决你的问题,请参考以下文章