Keras RNN with LSTM cells for predicting multiple output time series based on multiple input time series
Posted: 2017-06-16 06:25:07

【Problem description】: I want to model an RNN with LSTM cells in order to predict multiple output time series based on multiple input time series. Specifically, I have 4 output time series, y1[t], y2[t], y3[t], y4[t], each of length 3,000 (t=0,...,2999). I also have 3 input time series, x1[t], x2[t], x3[t], each of length 3,000 (t=0,...,2999). The goal is to predict y1[t],...,y4[t] using all of the input time series up to the current time point, i.e.:
y1[t] = f1(x1[k],x2[k],x3[k], k = 0,...,t)
y2[t] = f2(x1[k],x2[k],x3[k], k = 0,...,t)
y3[t] = f3(x1[k],x2[k],x3[k], k = 0,...,t)
y4[t] = f4(x1[k],x2[k],x3[k], k = 0,...,t)
To give the model long-term memory, I created a stateful RNN model, following keras-stateful-lstme. The main differences between my case and keras-stateful-lstme are that I have:

- more than 1 output time series
- more than 1 input time series
- the goal of predicting continuous time series

My code is running. However, the model's prediction results are bad even with simple data, so I would like to know whether I am missing something.
Here are my code and a toy example.

In the toy example, our input time series are simple cosine and sine waves:
import numpy as np

def random_sample(len_timeseries=3000):
    Nchoice = 600
    x1 = np.cos(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    x2 = np.cos(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    x3 = np.sin(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    x4 = np.sin(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    y1 = np.random.random(len_timeseries)
    y2 = np.random.random(len_timeseries)
    y3 = np.random.random(len_timeseries)
    for t in range(3,len_timeseries):
        ## the output time series depend on the input as follows:
        y1[t] = x1[t-2]
        y2[t] = x2[t-1]*x3[t-2]
        y3[t] = x4[t-3]
    y = np.array([y1,y2,y3]).T
    X = np.array([x1,x2,x3,x4]).T
    return y, X

def generate_data(Nsequence = 1000):
    X_train = []
    y_train = []
    for isequence in range(Nsequence):
        y, X = random_sample()
        X_train.append(X)
        y_train.append(y)
    return np.array(X_train),np.array(y_train)
Note that y1 at time point t is simply the value of x1 at t - 2. Also note that y3 at time point t is simply the value of x4 three steps earlier.

Using these functions, I generated 100 sets of the time series y1, y2, y3, x1, x2, x3, x4. Half of them are used as training data and the remaining half as testing data.
Nsequence = 100
prop = 0.5
Ntrain = int(Nsequence*prop)
X, y = generate_data(Nsequence)
X_train = X[:Ntrain,:,:]
X_test = X[Ntrain:,:,:]
y_train = y[:Ntrain,:,:]
y_test = y[Ntrain:,:,:]
X and y are both 3-dimensional, and each contains:
#X.shape = (N sequence, length of time series, N input features)
#y.shape = (N sequence, length of time series, N targets)
print X.shape, y.shape
> (100, 3000, 4) (100, 3000, 3)
Examples of the time series y1,...,y4 and x1,...,x3 look as follows:

I standardized these data as:
def standardize(X_train,stat=None):
    ## X_train is 3 dimensional, e.g. (Nsample, len_timeseries, Nfeature)
    ## standardization is done with respect to the 3rd dimension
    if stat is None:
        featmean = np.array([np.nanmean(X_train[:,:,itrain]) for itrain in range(X_train.shape[2])]).reshape(1,1,X_train.shape[2])
        featstd = np.array([np.nanstd(X_train[:,:,itrain]) for itrain in range(X_train.shape[2])]).reshape(1,1,X_train.shape[2])
        stat = {"featmean":featmean,"featstd":featstd}
    else:
        featmean = stat["featmean"]
        featstd = stat["featstd"]
    X_train_s = (X_train - featmean)/featstd
    return X_train_s, stat
X_train_s, X_stat = standardize(X_train,stat=None)
X_test_s, _ = standardize(X_test,stat=X_stat)
y_train_s, y_stat = standardize(y_train,stat=None)
y_test_s, _ = standardize(y_test,stat=y_stat)
Create a stateful RNN model with 10 LSTM hidden neurons:
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
def create_stateful_model(hidden_neurons):
    # create and fit the LSTM network
    model = Sequential()
    model.add(LSTM(hidden_neurons,
                   batch_input_shape=(1, 1, X_train.shape[2]),
                   return_sequences=False,
                   stateful=True))
    model.add(Dropout(0.5))
    model.add(Dense(y_train.shape[2]))
    model.add(Activation("linear"))
    model.compile(loss='mean_squared_error', optimizer="rmsprop", metrics=['mean_squared_error'])
    return model
model = create_stateful_model(10)
Now the following code is used to train and validate the RNN model:
def get_R2(y_pred,y_test):
    ## y_pred, y_test: (Nsample, len_timeseries, Noutput)
    ## R2 is computed for each output
    overall_mean = np.nanmean(y_test)
    SSres = np.nanmean( (y_pred - y_test)**2 ,axis=0).mean(axis=0)
    SStot = np.nanmean( (y_test - overall_mean)**2 ,axis=0).mean(axis=0)
    R2 = 1 - SSres / SStot
    print "<R2 testing> target 1:",R2[0],"target 2:",R2[1],"target 3:",R2[2]
    return R2

def reshape_batch_input(X_t,y_t=None):
    X_t = np.array(X_t).reshape(1,1,len(X_t)) ## (1,1,4) dimension
    if y_t is not None:
        y_t = np.array([y_t]) ## (1,3)
    return X_t,y_t

def fit_stateful(model,X_train,y_train,X_test,y_test,nb_epoch=8):
    '''
    reference: http://philipperemy.github.io/keras-stateful-lstm/
    X_train: (N_time_series, len_time_series, N_features) = (10,000, 3,600 (max), 2),
    y_train: (N_time_series, len_time_series, N_output)   = (10,000, 3,600 (max), 4)
    '''
    max_len = X_train.shape[1]

    print "X_train.shape(Nsequence =",X_train.shape[0],"len_timeseries =",X_train.shape[1],"Nfeats =",X_train.shape[2],")"
    print "y_train.shape(Nsequence =",y_train.shape[0],"len_timeseries =",y_train.shape[1],"Ntargets =",y_train.shape[2],")"
    print('Train...')
    for epoch in range(nb_epoch):
        print('___________________________________')
        print "epoch", epoch+1, "out of ",nb_epoch
        ## ---------- ##
        ##  training  ##
        ## ---------- ##
        mean_tr_acc = []
        mean_tr_loss = []
        for s in range(X_train.shape[0]):
            for t in range(max_len):
                X_st = X_train[s][t]
                y_st = y_train[s][t]
                if np.any(np.isnan(y_st)):
                    break
                X_st, y_st = reshape_batch_input(X_st,y_st)
                tr_loss, tr_acc = model.train_on_batch(X_st,y_st)
                mean_tr_acc.append(tr_acc)
                mean_tr_loss.append(tr_loss)
            model.reset_states()

        ##print('accuracy training = {}'.format(np.mean(mean_tr_acc)))
        print('<loss (mse) training> {}'.format(np.mean(mean_tr_loss)))

        ## ---------- ##
        ##  testing   ##
        ## ---------- ##
        y_pred = predict_stateful(model,X_test)
        eva = get_R2(y_pred,y_test)
    return model, eva, y_pred

def predict_stateful(model,X_test):
    y_pred = []
    max_len = X_test.shape[1]
    for s in range(X_test.shape[0]):
        y_s_pred = []
        for t in range(max_len):
            X_st = X_test[s][t]
            if np.any(np.isnan(X_st)):
                ## the rest of y is NA
                y_s_pred.extend([np.NaN]*(max_len-len(y_s_pred)))
                break
            X_st, _ = reshape_batch_input(X_st)
            y_st_pred = model.predict_on_batch(X_st)
            y_s_pred.append(y_st_pred[0].tolist())

        y_pred.append(y_s_pred)
        model.reset_states()

    y_pred = np.array(y_pred)
    return y_pred
model, train_metric, y_pred = fit_stateful(model,
X_train_s,y_train_s,
X_test_s,y_test_s,nb_epoch=15)
The output is as follows:
X_train.shape(Nsequence = 15 len_timeseries = 3000 Nfeats = 4 )
y_train.shape(Nsequence = 15 len_timeseries = 3000 Ntargets = 3 )
Train...
___________________________________
epoch 1 out of 15
<loss (mse) training> 0.414115458727
<R2 testing> target 1: 0.664464304688 target 2: -0.574523052322 target 3: 0.526447813052
___________________________________
epoch 2 out of 15
<loss (mse) training> 0.394549429417
<R2 testing> target 1: 0.361516087033 target 2: -0.724583671831 target 3: 0.795566178787
___________________________________
epoch 3 out of 15
<loss (mse) training> 0.403199136257
<R2 testing> target 1: 0.09610702779 target 2: -0.468219774909 target 3: 0.69419269042
___________________________________
epoch 4 out of 15
<loss (mse) training> 0.406423777342
<R2 testing> target 1: 0.469149270848 target 2: -0.725592048946 target 3: 0.732963522766
___________________________________
epoch 5 out of 15
<loss (mse) training> 0.408153116703
<R2 testing> target 1: 0.400821776652 target 2: -0.329415365214 target 3: 0.2578432553
___________________________________
epoch 6 out of 15
<loss (mse) training> 0.421062678099
<R2 testing> target 1: -0.100464591586 target 2: -0.232403824523 target 3: 0.570606489959
___________________________________
epoch 7 out of 15
<loss (mse) training> 0.417774856091
<R2 testing> target 1: 0.320094445321 target 2: -0.606375769083 target 3: 0.349876223119
___________________________________
epoch 8 out of 15
<loss (mse) training> 0.427440851927
<R2 testing> target 1: 0.489543715713 target 2: -0.445328806611 target 3: 0.236463139804
___________________________________
epoch 9 out of 15
<loss (mse) training> 0.422931671143
<R2 testing> target 1: -0.31006468223 target 2: -0.322621276474 target 3: 0.122573123871
___________________________________
epoch 10 out of 15
<loss (mse) training> 0.43609803915
<R2 testing> target 1: 0.459111316554 target 2: -0.313382405804 target 3: 0.636854743292
___________________________________
epoch 11 out of 15
<loss (mse) training> 0.433844655752
<R2 testing> target 1: -0.0161015052703 target 2: -0.237462995323 target 3: 0.271788109459
___________________________________
epoch 12 out of 15
<loss (mse) training> 0.437297314405
<R2 testing> target 1: -0.493665758658 target 2: -0.234236263092 target 3: 0.047264439493
___________________________________
epoch 13 out of 15
<loss (mse) training> 0.470605045557
<R2 testing> target 1: 0.144443089961 target 2: -0.333210874982 target 3: -0.00432615142135
___________________________________
epoch 14 out of 15
<loss (mse) training> 0.444566756487
<R2 testing> target 1: -0.053982119103 target 2: -0.0676577449316 target 3: -0.12678037186
___________________________________
epoch 15 out of 15
<loss (mse) training> 0.482106208801
<R2 testing> target 1: 0.208482181828 target 2: -0.402982670798 target 3: 0.366757778713
As you can see, the training loss is not decreasing!!

Since target time series 1 and 3 have very simple relationships with the input time series (y1[t] = x1[t-2], y3[t] = x4[t-3]), I would expect perfect prediction performance. However, the testing R2 at each epoch shows that this is not the case. The R2 at the final epoch is only about 0.2 and 0.36. Clearly, the algorithm is not converging. I am very puzzled by this result. Please let me know what I am missing and why the algorithm does not converge.
【Question comments】:
Usually when this happens, something is wrong with the hyperparameters. Have you considered doing some hyperparameter optimization with the hyperopt package, or with the hyperas wrapper?
【Answer 1】:
Initial remark. If the time series were short (e.g. T = 30), we would not need a stateful LSTM, and a classical LSTM would work well. In the OP's question, the time series have length T = 3000, so learning with a classical LSTM can be very slow. Learning improves by cutting the time series into pieces and using a stateful LSTM.
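For contrast, here is a minimal sketch (my own illustration, not part of the original answer) of the "classical", stateless setup referred to above: the same architecture as the stateful model built later, but without stateful=True and without a fixed batch_input_shape, fitted directly on whole (N, T, m) sequences, which is what becomes slow for T = 3000.

from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed

model_classical = Sequential()
model_classical.add(LSTM(input_shape=(None, 4), return_sequences=True, units=10))  # 4 input features
model_classical.add(TimeDistributed(Dense(units=3, activation='linear')))          # 3 output targets
model_classical.compile(loss='mse', optimizer='rmsprop')
# model_classical.fit(X_train, y_train, epochs=..., batch_size=...)  # direct fit on the full sequences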
Stateful mode with N = batch_size. Stateful models are tricky with Keras, because you need to be careful about how you cut the time series and select the batch size. In the OP's question, the sample size is N = 100. Since we can accept training the model with batches of 100 (this is not a large number), we select batch_size = 100.

Selecting batch_size = N simplifies training, because states do not need to be reset within an epoch (so there is no need to write a callback with on_batch_begin); for reference, a sketch of such a callback is shown just below.
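If batch_size < N had been chosen instead, states would have to be reset every nb_cuts batches via a callback. A minimal sketch, assuming a Keras 2 setup (the class name ResetStatesCallback and the nb_cuts argument are illustrative, not from the original answer):

from keras.callbacks import Callback

class ResetStatesCallback(Callback):
    # Reset the LSTM states every nb_cuts batches, i.e. each time training
    # moves on to a new group of cut sub-sequences.
    def __init__(self, nb_cuts):
        self.counter = 0
        self.nb_cuts = nb_cuts
    def on_batch_begin(self, batch, logs={}):
        if self.counter % self.nb_cuts == 0:
            self.model.reset_states()
        self.counter += 1

# hypothetical usage when batch_size < N:
# model.fit(inputs, outputs, epochs=epochs, batch_size=batch_size,
#           shuffle=False, callbacks=[ResetStatesCallback(nb_cuts)])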
There remains the question of how to cut the time series. Cutting is a bit technical, so I wrote a wrapper function that works in all cases.
def stateful_cut(arr, batch_size, T_after_cut):
    if len(arr.shape) != 3:
        # N: Independent sample size,
        # T: Time length,
        # m: Dimension
        print("ERROR: please format arr as a (N, T, m) array.")

    N = arr.shape[0]
    T = arr.shape[1]

    # We need T_after_cut * nb_cuts = T
    nb_cuts = int(T / T_after_cut)
    if nb_cuts * T_after_cut != T:
        print("ERROR: T_after_cut must divide T")

    # We need batch_size * nb_reset = N
    # If nb_reset = 1, we only reset after the whole epoch, so no need to reset
    nb_reset = int(N / batch_size)
    if nb_reset * batch_size != N:
        print("ERROR: batch_size must divide N")

    # Cutting (technical)
    cut1 = np.split(arr, nb_reset, axis=0)
    cut2 = [np.split(x, nb_cuts, axis=1) for x in cut1]
    cut3 = [np.concatenate(x) for x in cut2]
    cut4 = np.concatenate(cut3)
    return(cut4)
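As a quick sanity check (my own illustration, using the OP's shapes): with N = 100, T = 3000, batch_size = 100 and T_after_cut = 100, the 30 cuts are stacked along the batch axis, so a (100, 3000, 4) array becomes (3000, 100, 4).

arr = np.zeros((100, 3000, 4))                               # same (N, T, m) layout as X_train
arr_cut = stateful_cut(arr, batch_size=100, T_after_cut=100)
print(arr_cut.shape)                                         # (3000, 100, 4): 30 cuts of 100 series each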
From now on, training the model is easy. Since the OP's example is very simple, we do not need additional preprocessing or regularization. I describe how to proceed step by step (for the impatient, the full self-contained code is available at the very end of this post).

First we load the data and reshape it with the wrapper function.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
##
# Data
##
N = X_train.shape[0] # size of samples
T = X_train.shape[1] # length of each time series
batch_size = N # number of time series considered together: batch_size | N
T_after_cut = 100 # length of each cut part of the time series: T_after_cut | T
dim_in = X_train.shape[2] # dimension of input time series
dim_out = y_train.shape[2] # dimension of output time series
inputs, outputs, inputs_test, outputs_test = \
[stateful_cut(arr, batch_size, T_after_cut) for arr in \
[X_train, y_train, X_test, y_test]]
Then we compile a model with 4 inputs, 3 outputs, and 1 hidden layer containing 10 nodes.
##
# Model
##
nb_units = 10
model = Sequential()
model.add(LSTM(batch_input_shape=(batch_size, None, dim_in),
return_sequences=True, units=nb_units, stateful=True))
model.add(TimeDistributed(Dense(activation='linear', units=dim_out)))
model.compile(loss = 'mse', optimizer = 'rmsprop')
We train the model without resetting states. We can only do this because we selected batch_size = N.
##
# Training
##
epochs = 100
nb_reset = int(N / batch_size)
if nb_reset > 1:
    print("ERROR: We need to reset states when batch_size < N")
# When nb_reset = 1, we do not need to reinitialize states
history = model.fit(inputs, outputs, epochs = epochs,
batch_size = batch_size, shuffle=False,
validation_data=(inputs_test, outputs_test))
We obtain the following evolution of the training/test loss:

Now we define a "mime model" which is stateless but contains our stateful weights. [Why do it this way? Predicting with the stateful model through model.predict requires a complete batch in Keras, but we may not have a complete batch to predict...]
## Mime model which is stateless but containing stateful weights
model_stateless = Sequential()
model_stateless.add(LSTM(input_shape=(None, dim_in),
return_sequences=True, units=nb_units))
model_stateless.add(TimeDistributed(Dense(activation='linear', units=dim_out)))
model_stateless.compile(loss = 'mse', optimizer = 'rmsprop')
model_stateless.set_weights(model.get_weights())
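Predicting a single series with the mime model then looks as follows (this snippet is the same usage as in the full code at the end of the answer):

i = 0                                               # index of the time series to inspect (between 0 and N-1)
x = X_train[i]
y = y_train[i]
y_hat = model_stateless.predict(np.array([x]))[0]   # predicted outputs, shape (T, dim_out)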
Finally, we can show our incredible predictions for the long time series y1, y2 and y3 (blue for the true output; orange for the predicted output).

For y1:

For y2:

For y3:
Conclusion: it works almost perfectly, except for the first 2-3 time steps, where the series is by definition unpredictable. We do not observe any burst when passing from one batch to the next.

More. When N is large, we may want to select batch_size | N with batch_size < N. I wrote the full code in https://github.com/ahstat/deep-learning/blob/master/rnn/4_lagging_and_stateful.py (parts C and D). This github path also shows the efficiency of a classical LSTM for short time series (part A), and its inefficiency for long time series (part B). I also wrote a blog post detailing how to use Keras for time series prediction: https://ahstat.github.io/RNN-Keras-time-series/.
Full self-contained code:
################
# Code from OP #
################
import numpy as np

def random_sample(len_timeseries=3000):
    Nchoice = 600
    x1 = np.cos(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    x2 = np.cos(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    x3 = np.sin(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    x4 = np.sin(np.arange(0,len_timeseries)/float(1.0 + np.random.choice(Nchoice)))
    y1 = np.random.random(len_timeseries)
    y2 = np.random.random(len_timeseries)
    y3 = np.random.random(len_timeseries)
    for t in range(3,len_timeseries):
        ## the output time series depend on the input as follows:
        y1[t] = x1[t-2]
        y2[t] = x2[t-1]*x3[t-2]
        y3[t] = x4[t-3]
    y = np.array([y1,y2,y3]).T
    X = np.array([x1,x2,x3,x4]).T
    return y, X

def generate_data(Nsequence = 1000):
    X_train = []
    y_train = []
    for isequence in range(Nsequence):
        y, X = random_sample()
        X_train.append(X)
        y_train.append(y)
    return np.array(X_train),np.array(y_train)
Nsequence = 100
prop = 0.5
Ntrain = int(Nsequence*prop)
X, y = generate_data(Nsequence)
X_train = X[:Ntrain,:,:]
X_test = X[Ntrain:,:,:]
y_train = y[:Ntrain,:,:]
y_test = y[Ntrain:,:,:]
#X.shape = (N sequence, length of time series, N input features)
#y.shape = (N sequence, length of time series, N targets)
print(X.shape, y.shape)
# (100, 3000, 4) (100, 3000, 3)
####################
# Cutting function #
####################
def stateful_cut(arr, batch_size, T_after_cut):
    if len(arr.shape) != 3:
        # N: Independent sample size,
        # T: Time length,
        # m: Dimension
        print("ERROR: please format arr as a (N, T, m) array.")

    N = arr.shape[0]
    T = arr.shape[1]

    # We need T_after_cut * nb_cuts = T
    nb_cuts = int(T / T_after_cut)
    if nb_cuts * T_after_cut != T:
        print("ERROR: T_after_cut must divide T")

    # We need batch_size * nb_reset = N
    # If nb_reset = 1, we only reset after the whole epoch, so no need to reset
    nb_reset = int(N / batch_size)
    if nb_reset * batch_size != N:
        print("ERROR: batch_size must divide N")

    # Cutting (technical)
    cut1 = np.split(arr, nb_reset, axis=0)
    cut2 = [np.split(x, nb_cuts, axis=1) for x in cut1]
    cut3 = [np.concatenate(x) for x in cut2]
    cut4 = np.concatenate(cut3)
    return(cut4)
#############
# Main code #
#############
from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
##
# Data
##
N = X_train.shape[0] # size of samples
T = X_train.shape[1] # length of each time series
batch_size = N # number of time series considered together: batch_size | N
T_after_cut = 100 # length of each cut part of the time series: T_after_cut | T
dim_in = X_train.shape[2] # dimension of input time series
dim_out = y_train.shape[2] # dimension of output time series
inputs, outputs, inputs_test, outputs_test = \
[stateful_cut(arr, batch_size, T_after_cut) for arr in \
[X_train, y_train, X_test, y_test]]
##
# Model
##
nb_units = 10
model = Sequential()
model.add(LSTM(batch_input_shape=(batch_size, None, dim_in),
return_sequences=True, units=nb_units, stateful=True))
model.add(TimeDistributed(Dense(activation='linear', units=dim_out)))
model.compile(loss = 'mse', optimizer = 'rmsprop')
##
# Training
##
epochs = 100
nb_reset = int(N / batch_size)
if nb_reset > 1:
    print("ERROR: We need to reset states when batch_size < N")
# When nb_reset = 1, we do not need to reinitialize states
history = model.fit(inputs, outputs, epochs = epochs,
batch_size = batch_size, shuffle=False,
validation_data=(inputs_test, outputs_test))
def plotting(history):
    plt.plot(history.history['loss'], color = "red")
    plt.plot(history.history['val_loss'], color = "blue")

    red_patch = mpatches.Patch(color='red', label='Training')
    blue_patch = mpatches.Patch(color='blue', label='Test')
    plt.legend(handles=[red_patch, blue_patch])

    plt.xlabel('Epochs')
    plt.ylabel('MSE loss')
    plt.show()

plt.figure(figsize=(10,8))
plotting(history) # Evolution of training/test loss
##
# Visual checking for a time series
##
## Mime model which is stateless but containing stateful weights
model_stateless = Sequential()
model_stateless.add(LSTM(input_shape=(None, dim_in),
return_sequences=True, units=nb_units))
model_stateless.add(TimeDistributed(Dense(activation='linear', units=dim_out)))
model_stateless.compile(loss = 'mse', optimizer = 'rmsprop')
model_stateless.set_weights(model.get_weights())
## Prediction of a new set
i = 0 # time series selected (between 0 and N-1)
x = X_train[i]
y = y_train[i]
y_hat = model_stateless.predict(np.array([x]))[0]
for dim in range(3): # dim = 0 for y1 ; dim = 1 for y2 ; dim = 2 for y3.
    plt.figure(figsize=(10,8))
    plt.plot(range(T), y[:,dim])
    plt.plot(range(T), y_hat[:,dim])
    plt.show()
## Conclusion: works almost perfectly.
【Answer comments】:
- I am working on a similar problem. I have sales data for a large set of items (more than 5,000 items) and would like to forecast future sales. In my case the time series do not all have the same length (an item only starts selling from a certain date). Could this solution work in that situation?
- This solution can work by feeding in time series of unequal length; however, I would start with simpler models first, and then try to understand whether an RNN model can capture more information from the data.
- @ahstat Thank you very much for the very detailed code. This is exactly what I was looking for (data preparation and prediction). Can I train the stateful model with many time series of different sequence lengths? (I am thinking that for shorter sequences I could pad and mask to make the lengths equal, right?) 2.) Also, when we use the stateless model for prediction, could you tell me how to carry over the states from the previous batch?
- @ahstat Thanks for your work! Could you elaborate on how to use this for time series of unequal length? My X.shape is (100, different-lengths, 39) .. basically my input sequences all have different lengths .. (and so do my outputs)