在 R 中使用 Fable 进行时间序列预测;确定混合模型的最佳模型组合
Posted
技术标签:
【中文标题】在 R 中使用 Fable 进行时间序列预测;确定混合模型的最佳模型组合【英文标题】:Time series forecasting using Fable in R; determining most optimum combination of models for mixed model 【发布时间】:2022-01-07 23:26:23 【问题描述】:我正在使用 fable
和 fabletools
包进行一些时间序列预测分析,我有兴趣比较单个模型和混合模型(由我正在使用的单个模型组成)的准确性。
这是一些带有模拟数据框的示例代码:-
library(fable)
library(fabletools)
library(distributional)
library(tidyverse)
library(imputeTS)
#creating mock dataframe
set.seed(1)
Date<-seq(as.Date("2018-01-01"), as.Date("2021-03-19"), by = "1 day")
Count<-rnorm(length(Date),mean = 2086, sd= 728)
Count<-round(Count)
df<-data.frame(Date,Count)
df
#===================redoing with new model================
df$Count<-abs(df$Count)#in case there is any negative values, force them to be absolute
count_data<-as_tsibble(df)
count_data<-imputeTS::na.mean(count_data)
testfrac<-count_data%>%arrange(Date)%>%sample_frac(0.8)
lastdate<-last(testfrac$Date)
#train data
train <- count_data %>%
#sample_frac(0.8)
filter(Date<=as.Date(lastdate))
set.seed(1)
fit <- train %>%
model(
ets = ETS(Count),
arima = ARIMA(Count),
snaive = SNAIVE(Count),
croston= CROSTON(Count),
ave=MEAN(Count),
naive=NAIVE(Count),
neural=NNETAR(Count),
lm=TSLM(Count ~ trend()+season())
) %>%
mutate(mixed = (ets + arima + snaive + croston + ave + naive + neural + lm) /8)# creates a combined model using the averages of all individual models
fc <- fit %>% forecast(h = 7)
accuracy(fc,count_data)
fc_accuracy <- accuracy(fc, count_data,
measures = list(
point_accuracy_measures,
interval_accuracy_measures,
distribution_accuracy_measures
)
)
fc_accuracy
# A tibble: 9 x 13
# .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1 winkler percentile CRPS
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 arima Test -191. 983. 744. -38.1 51.8 0.939 0.967 -0.308 5769. 567. 561.
#2 ave Test -191. 983. 744. -38.1 51.8 0.939 0.967 -0.308 5765. 566. 561.
#3 croston Test -191. 983. 745. -38.2 51.9 0.940 0.968 -0.308 29788. 745. 745.
#4 ets Test -189. 983. 743. -38.0 51.7 0.938 0.967 -0.308 5759. 566. 560.
#5 lm Test -154. 1017. 742. -36.5 51.1 0.937 1.00 -0.307 6417. 583. 577.
#6 mixed Test -173. 997. 747. -36.8 51.1 0.944 0.981 -0.328 29897. 747. 747.
#7 naive Test 99.9 970. 612. -19.0 38.7 0.772 0.954 -0.308 7856. 692. 685.
#8 neural Test -322. 1139. 934. -49.6 66.3 1.18 1.12 -0.404 26361. 852. 848.
#9 snaive Test -244 1192. 896. -37.1 55.5 1.13 1.17 -0.244 4663. 690. 683.
我演示了如何创建混合模型。但是,可能有一些单独的模型在添加到混合模型时会妨碍其性能;换句话说,如果混合模型不包括以有害方式扭曲准确性的单个模型,则混合模型可能会得到改进。
期望的结果
我想要实现的是能够测试单个模型的所有可能组合,并返回在其中一个准确度指标上具有最佳性能的混合模型,例如平均绝对误差 (MAE)。但我不确定如何以自动化方式执行此操作,因为有很多潜在的组合。
有人可以建议或分享一些关于我如何做到这一点的代码吗?
【问题讨论】:
【参考方案1】:需要考虑的几点:
虽然快速评估许多组合模型的性能肯定是可取的,但它非常不切实际。最好的选择是单独评估您的模型,然后使用例如创建更简单的组合最好的 2 或 3 个 例如,考虑您实际上可以有 加权 组合 - 例如0.75 * ets + 0.25 * arima
。现在的可能性实际上是无穷无尽的,因此您开始看到蛮力方法的局限性(注意,我不认为 fable
实际上支持这种组合)。
也就是说,这是一种可以用来生成所有可能组合的方法。请注意,这可能需要非常长的时间才能运行 - 但应该会给你你所追求的。
# Get a table of models to get combinations from
fit <- train %>%
model(
ets = ETS(Count),
arima = ARIMA(Count),
snaive = SNAIVE(Count),
croston= CROSTON(Count),
ave=MEAN(Count),
naive=NAIVE(Count),
neural=NNETAR(Count),
lm=TSLM(Count ~ trend()+season())
)
# Start with a vector containing all the models we want to combine
models <- c("ets", "arima", "snaive", "croston", "ave", "naive", "neural", "lm")
# Generate a table of combinations - if a value is 1, that indicates that
# the model should be included in the combinations
combinations <- models %>%
purrr::set_names() %>%
map(~0:1) %>%
tidyr::crossing(!!!.)
combinations
#> # A tibble: 256 x 8
#> ets arima snaive croston ave naive neural lm
#> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 0 0 0 0 0 0 0 0
#> 2 0 0 0 0 0 0 0 1
#> 3 0 0 0 0 0 0 1 0
#> 4 0 0 0 0 0 0 1 1
#> 5 0 0 0 0 0 1 0 0
#> 6 0 0 0 0 0 1 0 1
#> 7 0 0 0 0 0 1 1 0
#> 8 0 0 0 0 0 1 1 1
#> 9 0 0 0 0 1 0 0 0
#> 10 0 0 0 0 1 0 0 1
#> # ... with 246 more rows
# This just filters for combinations with at least 2 models
relevant_combinations <- combinations %>%
filter(rowSums(across()) > 1)
# We can use this table to generate the code we would put in a call to `mutate()`
# to generate the combination. fable does something funny with code
# evaluation here, meaning that more elegant approaches are more trouble
# than they're worth
specs <- relevant_combinations %>%
mutate(id = row_number()) %>%
pivot_longer(-id, names_to = "model", values_to = "flag_present") %>%
filter(flag_present == 1) %>%
group_by(id) %>%
summarise(
desc = glue::glue_collapse(model, "_"),
model = glue::glue(
"(model_sums) / n_models",
model_sums = glue::glue_collapse(model, " + "),
n_models = n()
)
) %>%
select(-id) %>%
pivot_wider(names_from = desc, values_from = model)
# This is what the `specs` table looks like:
specs
#> # A tibble: 1 x 247
#> neural_lm naive_lm naive_neural naive_neural_lm ave_lm ave_neural
#> <glue> <glue> <glue> <glue> <glue> <glue>
#> 1 (neural + lm) / 2 (naive +~ (naive + neu~ (naive + neural ~ (ave +~ (ave + ne~
#> # ... with 241 more variables: ave_neural_lm <glue>, ave_naive <glue>,
#> # ave_naive_lm <glue>, ave_naive_neural <glue>, ave_naive_neural_lm <glue>,
#> # croston_lm <glue>, croston_neural <glue>, croston_neural_lm <glue>,
#> # croston_naive <glue>, croston_naive_lm <glue>, croston_naive_neural <glue>,
#> # croston_naive_neural_lm <glue>, croston_ave <glue>, croston_ave_lm <glue>,
#> # croston_ave_neural <glue>, croston_ave_neural_lm <glue>,
#> # croston_ave_naive <glue>, croston_ave_naive_lm <glue>, ...
# We can combine our two tables and evaluate the generated code to produce
# combination models as follows:
combinations <- fit %>%
bind_cols(rename_with(specs, ~paste0("spec_", .))) %>%
mutate(across(starts_with("spec"), ~eval(parse(text = .))))
# Compute the accuracy for 2 random combinations to demonstrate:
combinations %>%
select(sample(seq_len(ncol(.)), 2)) %>%
forecast(h = 7) %>%
accuracy(count_data, measures = list(
point_accuracy_measures,
interval_accuracy_measures,
distribution_accuracy_measures
))
#> # A tibble: 2 x 13
#> .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1 winkler
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 spec_ets_arima~ Test -209. 1014. 771. -40.1 54.0 0.973 0.998 -0.327 30825.
#> 2 spec_ets_snaiv~ Test -145. 983. 726. -34.5 48.9 0.917 0.967 -0.316 29052.
#> # ... with 2 more variables: percentile <dbl>, CRPS <dbl>
【讨论】:
出色、非常详细的答案,正是我想要的。谢谢!以上是关于在 R 中使用 Fable 进行时间序列预测;确定混合模型的最佳模型组合的主要内容,如果未能解决你的问题,请参考以下文章
R语言实战应用精讲50篇(二十五)-时空数据统计模型:确定性预测模型
R语言实战应用精讲50篇(二十五)-时空数据统计模型:确定性预测模型