[Paper Close Reading] Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

Posted by 程序媛小哨




The task is multi-step forecasting of future values with support for interpretability; the model is an attention-based architecture built as an improvement on the Transformer.

Abstract

Multi-horizon forecasting often contains a complex mix of inputs – including static (i.e. time-invariant) covariates, known future inputs, and other exogenous time series that are only observed in the past – without any prior information on how they interact with the target. Several deep learning methods have been proposed, but they are typically ‘black-box’ models that do not shed light on how they use the full range of inputs present in practical scenarios. In this paper, we introduce the Temporal Fusion Transformer (TFT) – a novel attention-based architecture that combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics. To learn temporal relationships at different scales, TFT uses recurrent layers for local processing and interpretable self-attention layers for long-term dependencies.
TFT utilizes specialized components to select relevant features and a series of gating layers to suppress unnecessary components, enabling high performance in a wide range of scenarios. On a variety of real-world datasets, we demonstrate significant performance improvements over existing benchmarks, and highlight three practical interpretability use cases of TFT.

In plain terms: the task is multi-horizon (multi-step) forecasting, i.e. predicting several future points rather than one, and the inputs are a complicated mix of sources: static (time-invariant) covariates, inputs whose future values are already known, and exogenous series observed only in the past, with no prior information about how any of them interact with the target. Existing deep learning models are black boxes, so one cannot see how each data source actually feeds into the prediction. TFT is an attention-based architecture that performs well on multi-step forecasting while remaining interpretable: recurrent layers do local processing at short time scales, interpretable self-attention layers capture long-term dependencies, specialized components select the relevant features, and a series of gating layers suppresses unnecessary components, which is what lets it work well across many scenarios. On several real-world datasets the paper reports significant improvements over existing benchmarks and highlights three practical interpretability use cases.

1. Introduction

Multi-horizon forecasting, i.e. the prediction of variables-of-interest at multiple future time steps, is a crucial problem within time series machine learning. In contrast to one-step-ahead predictions, multi-horizon forecasts provide users with access to estimates across the entire path, allowing them to optimize their actions at multiple steps in the future (e.g. retailers optimizing the inventory for the entire upcoming season, or clinicians optimizing a treatment plan for a patient). Multi-horizon forecasting has many impactful real-world applications in retail (Böse et al, 2017; Courty & Li, 1999), healthcare (Lim, Alaa, & van der Schaar, 2018; Zhang & Nawata, 2018) and economics (Capistran, Constandse, & Ramos-Francia, 2010) – performance improvements to existing methods in such applications are highly valuable.

In other words, multi-horizon forecasting predicts the target at multiple future time steps, so users get an estimate over the whole future path and can optimize actions several steps ahead (a retailer stocking for the entire coming season, a clinician planning a course of treatment); performance improvements are therefore highly valuable in retail, healthcare, and economics applications.

Practical multi-horizon forecasting applications commonly have access to a variety of data sources, as shown in Fig. 1, including known information about the future (e.g. upcoming holiday dates), other exogenous time series (e.g. historical customer foot traffic), and static metadata (e.g. location of the store) – without any prior knowledge on how they interact. This heterogeneity of data sources together with little information about their interactions makes multi-horizon time series forecasting particularly challenging.

In practice these applications draw on several data sources, as in Fig. 1: information known about the future (upcoming holiday dates), other exogenous series observed only in the past (historical customer foot traffic), and static metadata (store location), with no prior knowledge of how they interact. It is this heterogeneity, combined with the lack of information about the interactions, that makes multi-horizon forecasting particularly challenging.

Deep neural networks (DNNs) have increasingly been used in multi-horizon forecasting, demonstrating strong performance improvements over traditional time series models (Alaa & van der Schaar, 2019; Makridakis, Spiliotis, & Assimakopoulos, 2020; Rangapuram et al, 2018). While many architectures have focused on variants of recurrent neural network (RNN) architectures (Rangapuram et al, 2018; Salinas, Flunkert, Gasthaus, & Januschowski, 2019; Wen et al, 2017), recent improvements have also used attention-based methods to enhance the selection of relevant time steps in the past (Fan et al, 2019) – including transformer-based models (Li et al, 2019). However, these often fail to consider the different types of inputs commonly present in multi-horizon forecasting, and either assume that all exogenous inputs are known into the future (Li et al, 2019; Rangapuram et al, 2018; Salinas et al, 2019) – a common problem with autoregressive models – or neglect important static covariates (Wen et al, 2017) – which are simply concatenated with other time-dependent features at each step. Many recent improvements in time series models have resulted from the alignment of architectures with unique data characteristics (Koutník, Greff, Gomez, & Schmidhuber, 2014; Neil et al, 2016). We argue and demonstrate that similar performance gains can also be reaped by designing networks with suitable inductive biases for multi-horizon forecasting.

In addition to not considering the heterogeneity of common multi-horizon forecasting inputs, most current architectures are ‘black-box’ models where forecasts are controlled by complex nonlinear interactions between many parameters. This makes it difficult to explain how models arrive at their predictions, and in turn, makes it challenging for users to trust a model’s outputs and model builders to debug it. Unfortunately, commonly used explainability methods for DNNs are not well suited for applying to time series. In their conventional form, post hoc methods (e.g. LIME (Ribeiro et al, 2016) and SHAP (Lundberg & Lee, 2017)) do not consider the time ordering of input features. For example, for LIME, surrogate models are independently constructed for each data point, and for SHAP, features are considered independently for neighboring time steps. Such post hoc approaches would lead to poor explanation quality as dependencies between timesteps are typically significant in time series. On the other hand, some attention-based architectures are proposed with inherent interpretability for sequential data, primarily language or speech – such as the Transformer architecture (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, & Polosukhin, 2017). The fundamental caveat to apply them is that multi-horizon forecasting includes many different types of input features, as opposed to language or speech. In their conventional form, these architectures can provide insights into relevant time steps for multi-horizon forecasting, but they cannot distinguish the importance of different features at a given timestep.

DNNs have increasingly been used for multi-horizon forecasting and show strong gains over traditional time series models (Alaa & van der Schaar, 2019; Makridakis, Spiliotis, & Assimakopoulos, 2020; Rangapuram et al., 2018). Many architectures are RNN variants (Rangapuram et al., 2018; Salinas, Flunkert, Gasthaus, & Januschowski, 2019; Wen et al., 2017), and recent work also uses attention to better select relevant past time steps (Fan et al., 2019), including Transformer-based models (Li et al., 2019).

Bottlenecks of prior work:
① The heterogeneity of multi-horizon inputs is not handled.
These models typically ignore the different input types common in multi-horizon forecasting: they either assume all exogenous inputs are known into the future (Li et al., 2019; Rangapuram et al., 2018; Salinas et al., 2019), a common problem with autoregressive models, or they neglect important static covariates (Wen et al., 2017), which are simply concatenated with the other time-dependent features at each step. Many recent gains in time series modeling have come from aligning the architecture with the particular characteristics of the data (Koutník, Greff, Gomez, & Schmidhuber, 2014; Neil et al., 2016), and the authors argue that similar gains can be reaped by designing networks with inductive biases suited to multi-horizon forecasting.

② Lack of interpretability.
Besides ignoring input heterogeneity, most current architectures are black boxes whose forecasts are governed by complex nonlinear interactions among many parameters. That makes it hard to explain how a prediction was reached, hard for users to trust the output, and hard for model builders to debug. Common post-hoc explanation methods are also a poor fit for time series: LIME (Ribeiro et al., 2016) builds an independent surrogate model for each data point, and SHAP (Lundberg & Lee, 2017) treats features at neighboring time steps independently, so neither respects the time ordering of the inputs, and explanation quality suffers because dependencies across time steps are usually significant. Attention-based architectures such as the Transformer (Vaswani et al., 2017) are inherently interpretable for sequential data such as language or speech, but multi-horizon forecasting involves many different input types, so in their conventional form these models can only show which time steps were important; they cannot say which feature was important at a given time step.

Overall, in addition to the need for new methods to tackle the heterogeneity of data in multi-horizon forecasting for high performance, new methods are also needed to render these forecasts interpretable, given the needs of the use cases.

In short: new methods are needed both to exploit heterogeneous inputs for high performance and to make the resulting forecasts interpretable for the use cases at hand.

Where DNNs have fallen short:

  1. Poor use of multiple data sources.
    Real business scenarios involve many data sources. Time series tasks can be split by the variables involved: univariate (uni-var) forecasting uses only the history of the target itself, while multivariate (multi-var) forecasting has much richer inputs: metadata about the series being predicted (e.g. product category), past-observed variables, and future-known variables (e.g. holidays).
    Earlier models rarely designed the architecture around these three kinds of data; most simply embedded everything, concatenated it, and fed it straight into the model.
  2. No explanation of the predictions.
    For time series work, the business side needs to be told which of the input features had the largest influence on the resulting decision, so interpretability matters a great deal for business adoption.

In this paper, we propose the Temporal Fusion Transformer (TFT) – an attention-based DNN architecture for multi-horizon forecasting that achieves high performance while enabling new forms of interpretability. To obtain significant performance improvements over state-of-the-art benchmarks, we introduce multiple novel ideas to align the architecture with the full range of potential inputs and temporal relationships common to multi-horizon forecasting – specifically incorporating (1) static covariate encoders which encode context vectors for use in other parts of the network, (2) gating mechanisms throughout and sample-dependent variable selection to minimize the contributions of irrelevant inputs, (3) a sequence-to-sequence layer to locally process known and observed inputs, and (4) a temporal self-attention decoder to learn any long-term dependencies present within the dataset.

The use of these specialized components also facilitates interpretability; in particular, we show that TFT enables three valuable interpretability use cases: helping users identify (i) globally-important variables for the prediction problem, (ii) persistent temporal patterns, and (iii) significant events. On a variety of real-world datasets, we demonstrate how TFT can be practically applied, as well as the insights and benefits it provides.

Main contributions of the paper
The paper proposes the Temporal Fusion Transformer (TFT), an attention-based DNN architecture for multi-horizon forecasting that delivers high performance while enabling new forms of interpretability. To obtain significant improvements over state-of-the-art benchmarks, several novel ideas align the architecture with the full range of inputs and temporal relationships common to multi-horizon forecasting, specifically:
(1) static covariate encoders, which encode context vectors used in other parts of the network; this is how static information is injected to guide what the model learns;
(2) gating mechanisms throughout the network plus sample-dependent variable selection to minimize the contribution of irrelevant inputs; this is the feature-selection innovation (a rough gating sketch follows after this list);
(3) a sequence-to-sequence layer to locally process known and observed inputs, i.e. a seq2seq encoder that handles the time-varying data locally;
(4) a temporal self-attention decoder to learn any long-term dependencies present in the dataset, letting the model look further back in time.

The use of these specialized components also aids interpretability; in particular, TFT enables three valuable interpretability use cases, helping users identify
(i) globally important variables for the prediction problem,
(ii) persistent temporal patterns, and
(iii) significant events, i.e. it can point back to the important time points.
On a variety of real-world datasets the paper demonstrates how TFT can be practically applied, as well as the insights and benefits it provides.
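As a rough illustration of contribution (2), gating built from a Gated Linear Unit (GLU) scales a transformed branch by a learned sigmoid gate, so the network can fall back to the residual path when that branch is not useful. The sketch below is a minimal PyTorch version of such a gated residual block; the class name, layer sizes, and the exact residual-plus-LayerNorm arrangement are assumptions made for illustration, not the precise TFT implementation.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Minimal gated skip connection: LayerNorm(x + GLU(transform(x))).

    The sigmoid gate can drive the transformed branch toward zero,
    which is the role the paper assigns to its gating layers:
    suppressing components that are unnecessary for a given dataset.
    """

    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(d_model, d_model)
        self.elu = nn.ELU()
        self.dropout = nn.Dropout(dropout)
        # GLU: one half of the projection is the value, the other half the gate.
        self.glu_proj = nn.Linear(d_model, 2 * d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.dropout(self.elu(self.dense(x)))
        value, gate = self.glu_proj(h).chunk(2, dim=-1)
        gated = value * torch.sigmoid(gate)   # GLU gating
        return self.norm(x + gated)           # residual connection + layer norm

# usage: a batch of 32 positions with hidden size 16
block = GatedResidualBlock(d_model=16)
out = block(torch.randn(32, 16))
```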

2. Related work

DNNs for Multi-horizon Forecasting: Similarly to traditional multi-horizon forecasting methods (Marcellino, Stock, & Watson, 2006; Taieb, Sorjamaa, & Bontempi, 2010), recent deep learning methods can be categorized into iterated approaches using autoregressive models (Li et al, 2019; Rangapuram et al, 2018; Salinas et al, 2019) or direct methods based on sequence-to-sequence models (Fan et al, 2019; Wen et al, 2017).

Iterated approaches utilize one-step-ahead prediction models, with multi-step predictions obtained by recursively feeding predictions into future inputs. Approaches with Long Short-term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) networks have been considered, such as Deep AR (Salinas et al, 2019) which uses stacked LSTM layers to generate parameters of one-step-ahead Gaussian predictive distributions. Deep State-Space Models (DSSM) (Rangapuram et al, 2018) adopt a similar approach, utilizing LSTMs to generate parameters of a predefined linear state-space model with predictive distributions produced via Kalman filtering – with extensions for multivariate time series data in Wang et al. (2019). More recently, Transformer-based architectures have been explored in Li et al (2019), which proposes the use of convolutional layers for local processing and a sparse attention mechanism to increase the size of the receptive field during forecasting. Despite their simplicity, iterative methods rely on the assumption that the values of all variables excluding the target are known at forecast time – such that only the target needs to be recursively fed into future inputs. However, in many practical scenarios, numerous useful time-varying inputs exist, with many unknown in advance. Their straightforward use is hence limited for iterative approaches. TFT, on the other hand, explicitly accounts for the diversity of inputs – naturally handling static covariates and (past-observed and future-known) time-varying inputs.

In contrast, direct methods are trained to explicitly generate forecasts for multiple predefined horizons at each time step. Their architectures typically rely on sequence-to-sequence models, e.g. LSTM encoders to summarize past inputs, and a variety of methods to generate future predictions. The Multi-horizon Quantile Recurrent Forecaster (MQRNN) (Wen et al, 2017) uses LSTM or convolutional encoders to generate context vectors which are fed into multi-layer perceptrons (MLPs) for each horizon.

In Fan et al (2019) a multi-modal attention mechanism is used with LSTM encoders to construct context vectors for a bi-directional LSTM decoder. Despite performing better than LSTM-based iterative methods, interpretability remains challenging for such standard direct methods. In contrast, we show that by interpreting attention patterns, TFT can provide insightful explanations about temporal dynamics, and do so while maintaining state-of-the-art performance on a variety of datasets.

DNNs for multi-horizon forecasting: as with traditional multi-horizon methods, recent deep learning approaches fall into iterated approaches built on autoregressive models and direct methods built on sequence-to-sequence models.

Iterated approaches train a one-step-ahead model and obtain multi-step forecasts by recursively feeding each prediction back in as an input: DeepAR uses stacked LSTMs to produce the parameters of a one-step-ahead Gaussian predictive distribution; DSSM uses LSTMs to parameterize a predefined linear state-space model whose predictive distribution comes from Kalman filtering; and Li et al. (2019) explore a Transformer with convolutional local processing and sparse attention to widen the receptive field. The catch is the assumption that, at forecast time, every variable except the target is already known, so that only the target needs to be fed back recursively; in practice many useful time-varying inputs are unknown in advance, which limits these methods, whereas TFT explicitly handles static covariates along with past-observed and future-known time-varying inputs.

Direct methods, by contrast, are trained to explicitly output forecasts for several predefined horizons at each time step, typically with sequence-to-sequence architectures: MQRNN, for example, uses LSTM or convolutional encoders to build context vectors that are fed into a separate multi-layer perceptron (MLP) per horizon.
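The iterated vs. direct distinction can be summarized in a few lines of code. The sketch below uses stand-in callables (`model_step`, `model_direct`) and toy data, so every name and shape here is hypothetical; it only illustrates the recursion in iterated forecasting versus the one-shot multi-horizon output of direct methods such as TFT.

```python
import numpy as np

def iterated_forecast(model_step, y_hist, horizon):
    """Iterated strategy: a one-step-ahead model is applied recursively,
    feeding each prediction back in as if it were an observed value."""
    history = list(y_hist)
    preds = []
    for _ in range(horizon):
        y_next = model_step(np.asarray(history))  # predict one step ahead
        preds.append(y_next)
        history.append(y_next)                    # recursion: prediction becomes input
    return np.asarray(preds)

def direct_forecast(model_direct, y_hist, horizon):
    """Direct strategy: a single model emits all `horizon` steps at once,
    which is how TFT and other seq2seq-style methods are trained."""
    return model_direct(np.asarray(y_hist), horizon)

# toy stand-ins: a persistence one-step model vs. a flat multi-step model
preds_iter = iterated_forecast(lambda h: h[-1], y_hist=[1.0, 2.0, 3.0], horizon=4)
preds_direct = direct_forecast(lambda h, tau: np.full(tau, h.mean()), [1.0, 2.0, 3.0], 4)
```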

Fan et al. (2019) combine LSTM encoders with a multi-modal attention mechanism to build context vectors for a bi-directional LSTM decoder. These direct methods outperform LSTM-based iterated ones but remain hard to interpret, whereas TFT's attention patterns can be read off to explain the temporal dynamics while keeping state-of-the-art accuracy across datasets.

Time Series Interpretability with Attention: Attention mechanisms are used in translation (Vaswani et al, 2017), image classification (Wang, Jiang, Qian, Yang, Li, Zhang, Wang, & Tang, 2017) or tabular learning (Arik & Pfister, 2019) to identify salient portions of input for each instance using the magnitude of attention weights.
Recently, they have been adapted for time series with interpretability motivations (Alaa & van der Schaar, 2019; Choi et al, 2016; Li et al, 2019), using LSTM-based (Song et al, 2018) and transformer-based (Li et al, 2019) architectures. However, this was done without considering the importance of static covariates (as the above methods blend variables at each input). TFT alleviates this by using separate encoder–decoder attention for static features at each time step on top of the self-attention to determine the contribution of time-varying inputs.

Time series interpretability with attention: attention mechanisms are used in translation, image classification, and tabular learning to identify the salient parts of the input for each instance from the magnitude of the attention weights.

They have also been adapted to time series with interpretability in mind, using LSTM-based and Transformer-based architectures, but without accounting for static covariates, since those methods blend all variables together at each input. TFT addresses this by applying separate encoder-decoder attention to the static features at each time step, on top of the self-attention that determines the contribution of the time-varying inputs.

Instance-wise Variable Importance with DNNs: Instance (i.e. sample)-wise variable importance can be obtained with post-hoc explanation methods (Lundberg & Lee, 2017; Ribeiro et al, 2016; Yoon, Arik, & Pfister, 2019) and inherently interpretable models (Choi et al, 2016; Guo, Lin, & Antulov-Fantulin, 2019). Post-hoc explanation methods, e.g. LIME (Ribeiro et al, 2016), SHAP (Lundberg & Lee, 2017) and RL-LIM (Yoon et al, 2019), are applied on pre-trained black-box models and often based on distilling into a surrogate interpretable model, or decomposing into feature attributions. They are not designed to take into account the time ordering of inputs, limiting their use for complex time series data. Inherently interpretable modeling approaches build components for feature selection directly into the architecture. For time series forecasting specifically, they are based on explicitly quantifying time-dependent variable contributions. For example, Interpretable Multi-Variable LSTMs (Guo et al, 2019) partitions the hidden state such that each variable contributes uniquely to its own memory segment, and weights memory segments to determine variable contributions. Methods combining temporal importance and variable selection have also been considered in Choi et al (2016), which computes a single contribution coefficient based on attention weights from each. However, in addition to the shortcoming of modeling only one-step-ahead forecasts, existing methods also focus on instance-specific (i.e. sample-specific) interpretations of attention weights – without providing insights into global temporal dynamics. In contrast, the use cases in Section 7 demonstrate that TFT is able to analyze global temporal relationships and allows users to interpret global behaviors of the model on the whole dataset – specifically in the identification of any persistent patterns (e.g. seasonality or lag effects) and regimes present.

Instance-wise variable importance with DNNs: per-sample variable importance can come either from post-hoc explanation methods (LIME, SHAP, RL-LIM), which distill a pre-trained black box into a surrogate model or decompose it into feature attributions but ignore the time ordering of the inputs, or from inherently interpretable models that build feature selection into the architecture, such as Interpretable Multi-Variable LSTMs, which partition the hidden state so that each variable contributes only to its own memory segment and then weight the segments, or Choi et al. (2016), which combines temporal and variable attention into a single contribution coefficient. Besides being limited to one-step-ahead forecasts, these methods give instance-specific interpretations of attention weights without any view of global temporal dynamics; the use cases in Section 7 show that TFT can analyze global temporal relationships over the whole dataset, identifying persistent patterns such as seasonality or lag effects and any regimes present.
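The contrast drawn here, per-sample versus global importance, amounts to whether importance weights are read off one instance at a time or aggregated across the whole dataset. A minimal sketch, assuming the model exposes per-sample, per-feature importance weights (e.g. variable-selection or attention weights) as an array; the shapes and the Dirichlet toy data are invented for illustration.

```python
import numpy as np

# hypothetical importance weights exported by a model:
# shape (n_samples, n_features), each row sums to 1
weights = np.random.dirichlet(alpha=np.ones(4), size=1000)

# instance-wise interpretation: explain a single prediction
sample_importance = weights[0]

# global interpretation (what TFT's use cases aim at):
# aggregate over the dataset, e.g. mean and percentiles per feature
global_mean = weights.mean(axis=0)
global_p10, global_p50, global_p90 = np.percentile(weights, [10, 50, 90], axis=0)
```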

3. Multi-horizon forecasting

Let there be $I$ unique entities in a given time series dataset, such as different stores in retail or patients in healthcare. Each entity $i$ is associated with a set of static covariates $s_i \in \mathbb{R}^{m_s}$, as well as inputs $\chi_{i,t} \in \mathbb{R}^{m_\chi}$ and scalar targets $y_{i,t} \in \mathbb{R}$ at each time step $t \in [0, T_i]$.
Time-dependent input features are subdivided into two categories, $\chi_{i,t} = [z_{i,t}^\top, x_{i,t}^\top]^\top$: observed inputs $z_{i,t} \in \mathbb{R}^{m_z}$, which can only be measured at each step and are unknown beforehand, and known inputs $x_{i,t} \in \mathbb{R}^{m_x}$, which can be predetermined (e.g. the day-of-week at time $t$).
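To make this input taxonomy concrete, the sketch below lays out one entity's data for a hypothetical retail store; the field names, shapes, and example values are all invented for illustration and simply mirror the $s_i$, $z_{i,t}$, $x_{i,t}$, $y_{i,t}$ split defined above.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class EntitySeries:
    """One entity i with the three kinds of inputs from the problem setup."""
    static: dict          # s_i: time-invariant covariates (e.g. store location)
    observed: np.ndarray  # z_{i,t}: past-observed exogenous inputs, shape (T, m_z)
    known: np.ndarray     # x_{i,t}: inputs known into the future, shape (T + tau_max, m_x)
    target: np.ndarray    # y_{i,t}: scalar target per past time step, shape (T,)

# a hypothetical store with 30 days of history and a 7-day forecast horizon
store = EntitySeries(
    static={"store_id": 42, "region": "north"},
    observed=np.random.rand(30, 2),   # e.g. foot traffic and local weather
    known=np.random.rand(30 + 7, 1),  # e.g. a holiday indicator, known ahead of time
    target=np.random.rand(30),        # e.g. daily sales
)
```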

In many scenarios, prediction intervals can be useful for optimizing decisions and managing risk, since they indicate the likely best- and worst-case values the target can take. As such, we adopt quantile regression for our multi-horizon forecasting setting (e.g. outputting the 10th, 50th and 90th percentiles at each time step). Each quantile forecast takes the form
$$\hat{y}_i(q, t, \tau) = f_q\left(\tau, y_{i,t-k:t}, z_{i,t-k:t}, x_{i,t-k:t+\tau}, s_i\right), \tag{1}$$
where $\hat{y}_i(q, t, \tau)$ is the predicted $q$th sample quantile of the $\tau$-step-ahead forecast at time $t$, and $f_q(\cdot)$ is a prediction model. In line with other direct methods, we simultaneously output forecasts for $\tau_{\max}$ time steps, i.e. $\tau \in \{1, \dots, \tau_{\max}\}$. We incorporate all past information within a finite look-back window $k$, using target and observed inputs only up until and including the forecast start time $t$ (i.e. $y_{i,t-k:t} = \{y_{i,t-k}, \dots, y_{i,t}\}$) and known inputs across the entire range (i.e. $x_{i,t-k:t+\tau} = \{x_{i,t-k}, \dots, x_{i,t}, \dots, x_{i,t+\tau}\}$).

In other words: each entity $i$ (a store in retail, a patient in healthcare) carries static covariates $s_i$, time-dependent inputs $\chi_{i,t} = [z_{i,t}^\top, x_{i,t}^\top]^\top$ split into past-observed $z_{i,t}$ (only measurable as they happen) and future-known $x_{i,t}$ (e.g. day-of-week), and a scalar target $y_{i,t}$. Because prediction intervals help with decision optimization and risk management, the model performs quantile regression, outputting e.g. the 10th, 50th and 90th percentiles at each future step, with each quantile forecast given by Eq. (1).

Symbols in Eq. (1):
$\hat{y}_i(q, t, \tau)$: the predicted target quantile
$q$: the quantile level (e.g. 0.1, 0.5, 0.9)
$t$: the forecast start time
$f_q(\cdot)$: the prediction model
$\tau$: the number of steps ahead of $t$ being predicted
$y_{i,t-k:t}$: the target history within the look-back window $k$, i.e. $y_{i,t-k}, \dots, y_{i,t}$
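Quantile forecasts like Eq. (1) are typically trained by minimizing the quantile (pinball) loss; the paper's training details are not part of this section, so the NumPy sketch below only shows the standard loss for a single quantile level, with an array-based interface assumed for illustration.

```python
import numpy as np

def quantile_loss(y_true: np.ndarray, y_pred: np.ndarray, q: float) -> float:
    """Pinball loss for a single quantile level q in (0, 1).

    Under-prediction is penalized with weight q and over-prediction
    with weight (1 - q), so minimizing it drives y_pred toward the
    q-th conditional quantile of y_true.
    """
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# e.g. the 10th/50th/90th percentile outputs mentioned above
y_true = np.array([10.0, 12.0, 9.0, 11.0])
y_pred = np.array([9.0, 13.0, 9.5, 10.0])
losses = {q: quantile_loss(y_true, y_pred, q) for q in (0.1, 0.5, 0.9)}
```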
