Paper Close Reading: Two Birds with One Stone: Series Saliency for Accurate and Interpretable Multivariate Time Series Forecasting

Posted 程序媛小哨

Two Birds with One Stone: Series Saliency for Accurate and Interpretable Multivariate Time Series Forecasting

Abstract

It is important yet challenging to perform accurate and interpretable time series forecasting. Though deep learning methods can boost forecasting accuracy, they often sacrifice interpretability. In this paper, we present a new scheme of series saliency to boost both accuracy and interpretability. By extracting series images from sliding windows of the time series, we design series saliency as a mixup strategy with a learnable mask between the series images and their perturbed versions. Series saliency is model agnostic and acts as an adaptive data augmentation method for training deep models. Moreover, by slightly changing the objective, we optimize series saliency to find a mask for interpretable forecasting in both the feature and time dimensions. Experimental results on several real datasets demonstrate that series saliency is effective in producing accurate time series forecasts as well as generating temporal interpretations.

1 Introduction

Time series forecasting is an important task with wide applications. Traditional parametric models often have a shallow architecture (e.g., [Box and Jenkins, 1976; Harvey, 1990]). Because they adopt explicit assumptions, such methods are easy to interpret, but their predictive capabilities are often limited. Deep architectures have become increasingly popular for time series [Gamboa, 2017], including recurrent neural networks (RNN), the nonlinear autoregressive exogenous neural network (NARX) [Chen et al, 1990], long short-term memory (LSTM) [Hochreiter and Schmidhuber, 1997], the gated recurrent unit (GRU) [Chung et al, 2014] and neural attention methods. Though deep models are effective in improving forecasting accuracy, their outputs are hard to interpret [Castelvecchi, 2016], which may hinder their use in high-stakes applications (e.g., healthcare) where reliable interpretation is crucial. Though much progress has been made on interpreting deep visual or language models [Samek et al, 2019], developing methods that are both accurate and interpretable for multivariate time series forecasting remains relatively unexplored, and the 2D time-feature format imposes new challenges.

Existing work often considers either the time or the feature domain, or treats them separately via a two-stage method.
For example, some attempts have been made to apply interpretation methods for general neural networks, such as LIME [Ribeiro et al, 2016], DeepLIFT [Shrikumar et al, 2017] and SHAP [Lundberg and Lee, 2017]. They use gradient information to extract feature information for single-time forecasts after back-propagation training; they therefore ignore the crucial temporal information and are insufficient for interpreting forecasts [Mitrea et al, 2009].
Another type of solution transfers attention methods from the fields of language or vision [Bahdanau et al, 2014; Vaswani et al, 2017; Assaf et al, 2019; Shih et al, 2019]. However, the attention values used to explain RNNs or CNNs are calculated via the relative importance of the different time steps, and there are concerns that they reflect intermediate feature importance rather than model interpretations [Serrano and Smith, 2019]. More recently, Ismail et al [Ismail et al, 2020] developed a two-stage saliency approach that decouples the time dimension and the feature dimension, which may lead to sub-optimal solutions.

In this work, we present a new strategy of series saliency to boost both the forecasting accuracy and the interpretability of deep time series models, by considering the time and feature dimensions in a coherent manner. As shown in Fig. 1, we consider multivariate time series as a set of window × feature series images, and design series saliency as a masked mixup between series images and their perturbed versions, where the mask is a learnable matrix. Series saliency is model agnostic and can be used as an effective data augmentation method to boost the accuracy of deep forecasting models. Because the augmentation strategy is learnable and adaptive, it differs from common augmentation methods (e.g., [Iwana and Uchida, 2020]) that typically apply some pre-fixed operations to a given training set. Furthermore, by simply changing the objective function, we can optimize the series saliency module to find a mask (i.e., heatmap) that identifies important regions for forecasting, thereby boosting interpretability.
We present both quantitative and qualitative results on several typical time series datasets, which show that our method achieves better (or comparable) forecasting results and meanwhile provides temporal interpretations for the forecasts.

Figure 1: (Left): multivariate time series and the corresponding temporal saliency map, where we extract series images from the multivariate time series, and each series image is associated with a temporal saliency map to identify the informative features for forecasting; (Right): the proposed series saliency module, where for each series image we perturb the original series to obtain the reference series image, and the series image and its perturbed version are mixed up by the series saliency map.
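
The mixing step described in the caption can be illustrated with a short NumPy sketch. The text above only states that the series image and its perturbed reference are mixed by a learnable mask; the convex-combination form `M * X + (1 - M) * X_ref`, the sigmoid parameterisation of the mask, and the names `masked_mixup` and `mask_logits` are assumptions made here for illustration, not the authors' stated formulation.

```python
import numpy as np

def sigmoid(z):
    """Map unconstrained values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def masked_mixup(X, X_ref, mask_logits):
    """Mix a series image with its perturbed reference, element by element.

    X, X_ref, mask_logits: arrays of the same shape (features x window).
    Assumption: the mask M = sigmoid(mask_logits) lies in (0, 1), so entries
    of M close to 1 keep the original value and entries close to 0 replace
    it with the perturbed reference.
    """
    M = sigmoid(mask_logits)
    return M * X + (1.0 - M) * X_ref

# Toy example: a 2-feature, 3-step series image and an all-zero reference.
X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
X_ref = np.zeros_like(X)
mixed = masked_mixup(X, X_ref, np.zeros(X.shape))  # M = 0.5 everywhere
```

In a training setting the logits would be parameters updated by gradient descent; this sketch shows only the forward mixing.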

2 Methods

We now present our method in detail, which consists of a series saliency module and its use in both training and interpretation phases.

2.1 Setup and Notations

As shown in Fig. 1, multivariate time series data are spatiotemporal with two dimensions: time and feature. Formally, we use $S$ to denote a sequence of observed time series signals:

$$S = [s_1, s_2, \cdots, s_t, \cdots], \tag{1}$$

where $s_t \in \mathbb{R}^D$ denotes the feature vector with dimension $D$ at time $t$. For every time step $t$, we aim to predict the future value $s_{t+\tau}$ after a given horizon $\tau$. The horizon is chosen according to the forecasting task. For example, for traffic usage, the horizon $\tau$ of interest ranges from an hour to a day, while for stock market data, even a second-ahead or minute-ahead forecast can be meaningful for generating returns. Besides forecasting accuracy, we are also interested in interpretation: which features contribute most and which contribute least to the final forecast? As stated above, traditional statistical methods (e.g., [Box and Jenkins, 1976] and [Harvey, 1990]) are easy to interpret because they make explicit model assumptions, so interpretations can be extracted directly from the learned model parameters, but they are often limited in forecasting accuracy.

In contrast, recent progress on deep learning methods leads to superior prediction capabilities [Goodfellow et al, 2016]. However, these models are hard to interpret, since their assumptions are buried in stacks of multiple non-linear activations or blocks.
A key part of an effective model for multivariate time series forecasting is the capability to handle the information from both the time and feature dimensions in a coherent manner. Recent work develops an attention-based scheme [Assaf et al, 2019; Shih et al, 2019], which introduces an attention map to “selectively” combine time-feature information (see Fig. 2 (left)), with the primary focus on interpretation. However, as pointed out in [Ismail et al, 2020], attention-based methods can be insufficient for interpreting multivariate time series data. We develop a new scheme of series saliency, which is model-agnostic and can boost both forecasting accuracy and interpretation, as detailed below.
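
The forecasting setup at the start of this subsection (predict $s_{t+\tau}$ from the observations up to time $t$) can be sketched in NumPy as follows. The function name `make_forecast_pairs`, the argument names, and the choice to store each window as a feature-by-time matrix are illustrative assumptions; the window length is the quantity Section 2.2 calls $T$.

```python
import numpy as np

def make_forecast_pairs(S, window, tau):
    """Pair each sliding window of S with the value tau steps after its last step.

    S: array of shape (N, D), one feature vector s_t per row.
    Returns X of shape (n, D, window) and y of shape (n, D),
    where n = N - window - tau + 1.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0] - window - tau + 1
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    # Window k covers time steps k .. k+window-1, stored as features x time.
    X = np.stack([S[k:k + window].T for k in range(n)])
    # Target is s_{t+tau}, with t = k+window-1 the last step of window k.
    y = np.stack([S[k + window - 1 + tau] for k in range(n)])
    return X, y

# Toy series: 10 time steps, 2 features.
S = np.arange(20).reshape(10, 2)
X, y = make_forecast_pairs(S, window=4, tau=2)
print(X.shape, y.shape)  # (5, 2, 4) (5, 2)
```

Any forecasting model can then be fitted on the pairs `(X[k], y[k])`; the horizon `tau` is simply an index offset, so changing the task from hour-ahead to day-ahead forecasting changes only that argument.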

2.2 Series Saliency

We develop series saliency by drawing inspiration from saliency maps [Dabkowski and Gal, 2017] in computer vision. However, unlike previous work on saliency maps, which mainly focuses on the interpretability of deep models, series saliency is beneficial for improving both forecasting accuracy and interpretation for time series data.

Specifically, to consider the time-feature information jointly, we first represent the multivariate time series as a set of 2D series images. As shown in Fig. 1, each series image corresponds to a part of the multivariate time series within a given time window. Formally, let $T$ be the window size. We simply set the value of $T$ for various datasets with 2 periodic patterns (e.g., $p = 48$ for hourly electricity consumption). A series image is represented as a matrix $X \in \mathbb{R}^{D \times T}$, of which each row corresponds to one feature dimension in the multivariate time series. Then, we follow the perturbation strategy in the smallest destroying region (SDR) principle [Dabkowski and Gal, 2017] to design the series saliency scheme. We define a reference series image $\hat{X}$ by adding noise or Gaussian blur on each element of the original series image $X$:

$$\hat{x}_{t,i} = \begin{cases} x_{t,i} + \epsilon_{\sigma_1}, & \text{noise} \\ g_{\sigma_2}(x_{t,i}), & \text{blur} \end{cases} \tag{2}$$

where $\epsilon_{\sigma_1} \sim \mathcal{N}(\mu, \sigma_1^2)$ is Gaussian noise and $g_{\sigma_2}$ is a Gaussian blur kernel on element $x_{t,i}$ with the maximum isotropic standard deviation $\sigma_2$.
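
The two perturbations in Eq. (2) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the blur is applied as a 1D Gaussian filter along the time axis of each feature row (the text describes an isotropic kernel, which may instead be applied in 2D), the kernel is truncated at three standard deviations, borders use edge padding, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel truncated at about 3 standard deviations."""
    radius = max(1, int(np.ceil(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()

def perturb_noise(X, sigma1, mu=0.0, rng=None):
    """Noise branch of Eq. (2): add N(mu, sigma1^2) noise to every element."""
    rng = np.random.default_rng() if rng is None else rng
    return X + rng.normal(mu, sigma1, size=X.shape)

def perturb_blur(X, sigma2):
    """Blur branch of Eq. (2), assumed here to act along time for each feature."""
    kernel = gaussian_kernel(sigma2)
    radius = len(kernel) // 2
    # Edge padding keeps the output the same length as the input window.
    padded = np.pad(X, ((0, 0), (radius, radius)), mode="edge")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])

# A series image with D = 2 features and T = 8 time steps.
X = np.array([[0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
              [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]])
X_ref_noise = perturb_noise(X, sigma1=0.1, rng=np.random.default_rng(0))
X_ref_blur = perturb_blur(X, sigma2=1.0)
```

Either output plays the role of the reference series image $\hat{X}$: it has the same shape as $X$ but carries less of the original information, which is what the SDR-style perturbation requires.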

