How to train an RNN with LSTM cells for time series prediction

Posted 2023-02-16

Asked: 2016-06-27 22:30:09

Question:

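I'm currently trying to build a simple model for predicting time series. The goal is to train the model on a sequence so that it can predict future values.

I'm using TensorFlow and LSTM cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.

For example, let's assume we want to learn the given sequence: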

[1,2,3,4,5,6,7,8,9,10,11,...]

We unroll the network for num_steps=4.

Option 1

input data               label     
1,2,3,4                  2,3,4,5
5,6,7,8                  6,7,8,9
9,10,11,12               10,11,12,13
...

Option 2

input data               label     
1,2,3,4                  2,3,4,5
2,3,4,5                  3,4,5,6
3,4,5,6                  4,5,6,7
...

Option 3

input data               label     
1,2,3,4                  5
2,3,4,5                  6
3,4,5,6                  7
...

Option 4

input data               label     
1,2,3,4                  5
5,6,7,8                  9
9,10,11,12               13
...

Any help would be appreciated.
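For reference, the four layouts above differ only in the window stride and in whether every time step or only the whole window gets a label. A minimal sketch in plain Python; `make_windows` and its parameters are hypothetical names chosen for illustration and do not come from the question:

```python
def make_windows(series, num_steps, stride, per_step_labels):
    """Build (input, label) pairs from a 1-D series.

    stride=num_steps gives non-overlapping windows (Options 1 and 4);
    stride=1 gives sliding windows (Options 2 and 3).
    per_step_labels=True labels every time step with the value that
    follows it (Options 1 and 2); False labels the whole window with
    the single value that follows it (Options 3 and 4).
    """
    pairs = []
    # Each window needs num_steps inputs plus one extra value for the label.
    for start in range(0, len(series) - num_steps, stride):
        window = series[start:start + num_steps]
        if per_step_labels:
            label = series[start + 1:start + num_steps + 1]
        else:
            label = series[start + num_steps]
        pairs.append((window, label))
    return pairs


series = list(range(1, 14))  # 1, 2, ..., 13
option_1 = make_windows(series, 4, stride=4, per_step_labels=True)
option_2 = make_windows(series, 4, stride=1, per_step_labels=True)
option_3 = make_windows(series, 4, stride=1, per_step_labels=False)
option_4 = make_windows(series, 4, stride=4, per_step_labels=False)
```

With stride 1 every value except those at the ends appears in several windows, whereas stride `num_steps` uses each input value once.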

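Comments:

Of the options listed, Option 3 looks the most reasonable to me, provided you can really assume that 4 past values are enough to approximate the present value well (so it is more about the data than about the particular method you use for prediction).

Of course I use more than the 4 past values; this is only a small example for demonstration purposes. Feel free to suggest options other than the 4 above as well.

Answer 1: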

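I am just starting to learn LSTMs in TensorFlow and am trying to implement an example which (luckily) tries to predict a time series / number series generated by a simple math function.

But, motivated by Unsupervised Learning of Video Representations using LSTMs, I structure the training data in a different way:

[Figure: LSTM Future Predictor Model]

Option 5: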

input data               label     
1,2,3,4                  5,6,7,8
2,3,4,5                  6,7,8,9
3,4,5,6                  7,8,9,10
...
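The Option 5 layout pairs each input window with the `num_steps` values that follow it. A minimal sketch in plain Python; `make_future_windows` is a hypothetical helper name, not part of the answer's code:

```python
def make_future_windows(series, num_steps):
    """Pair each window of num_steps values with the next num_steps values."""
    pairs = []
    # Each pair consumes 2 * num_steps consecutive values of the series.
    for start in range(len(series) - 2 * num_steps + 1):
        inputs = series[start:start + num_steps]
        labels = series[start + num_steps:start + 2 * num_steps]
        pairs.append((inputs, labels))
    return pairs


pairs = make_future_windows(list(range(1, 11)), num_steps=4)
for inputs, labels in pairs:
    print(inputs, "->", labels)
```

Unlike Options 1 and 2, the labels here do not overlap the inputs, so the model has to predict several steps ahead instead of one.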

Besides this paper, I (tried to) take inspiration from the given TensorFlow RNN examples. My current complete solution looks like this:

import math
import random
import numpy as np
import tensorflow as tf

LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4


def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j


def get_batch(batch_size):
    # seq holds the first NUM_T_STEPS values of each series (encoder input),
    # tgt holds the following NUM_T_STEPS values (values to predict).
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)

    for b in range(batch_size):
        # Random coefficients of the quadratic i * t^2 + j
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in range(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value

        for t in range(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt


# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])

fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))

# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in range(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)

learned_representation = state

# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
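    for t_step in range(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # NOTE: this feeds the encoder's input sequence to the decoder again.
        # The author points out in the discussion below that this is a mistake:
        # the intended input at step t is the decoder output from step t-1.
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # output can be used to make next number prediction
        logits = tf.matmul(output, fc1_weight) + fc1_bias

        if logits_stacked is None:
            logits_stacked = logits
        else:
            # tf.concat(dim, values) is the pre-1.0 argument order;
            # TensorFlow 1.0+ expects tf.concat(values, axis).
            logits_stacked = tf.concat(1, [logits_stacked, logits])

        # Slice the target as [BATCH_SIZE, 1] so that it matches logits;
        # target[:, t_step] has shape [BATCH_SIZE] and would broadcast the
        # difference to [BATCH_SIZE, BATCH_SIZE].
        loss += tf.reduce_sum(tf.square(logits - target[:, t_step:t_step + 1])) / BATCH_SIZE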

reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))

train = tf.train.AdamOptimizer().minimize(reg_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    total_loss = 0.0
    for step in range(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)

        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("@{}: {}".format(step, current_loss))
        total_loss += current_loss

    print('Total loss:', total_loss)

    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in range(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))

Sample output looks like this:

### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
#  `-> Prediction: [model prediction]  
[  33.   53.  113.  213.] -> [  353.   533.   753.  1013.])
 `-> Prediction: [ 19.74548721  28.3149128   33.11489105  35.06603241]
[ -17.  -32.  -77. -152.] -> [-257. -392. -557. -752.])
 `-> Prediction: [-16.38951683 -24.3657589  -29.49801064 -31.58583832]
[ -7.  -4.   5.  20.] -> [  41.   68.  101.  140.])
 `-> Prediction: [ 14.14126873  22.74848557  31.29668617  36.73633194]
...

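The model is an LSTM autoencoder whose encoder and decoder have 2 layers each.

Unfortunately, as you can see in the results, this model does not learn the sequence properly. It might be that I'm simply making a serious mistake somewhere, or that 1000-10000 training steps are just far too few for an LSTM. As I said, I'm also just starting to understand and use LSTMs properly. But hopefully this gives you some inspiration for the implementation.

Comments:

I'm currently using Option 2 with some success. What makes me question your approach is that the model does not "see" the data in order. As I understand it, the internal state of the network is influenced by all the values the model has "seen" so far, so if you start a new sequence you have to reset the internal state. In the form in which you feed the data, the model sees a lot of repetition in the data. But I might be wrong; I'm not sure yet.

Thanks for the hint. I never thought about resetting the state for every new sequence to learn. I'll check that later today. Also, I see that I made a mistake in the decoder LSTM: I accidentally used the same input sequence as in the encoder LSTM, which is wrong. What I wanted to do here is use the output of the last LSTM cell (t-1) as the input of the current cell (t).

I just checked. In the code posted above, the initial state is a zero tensor in every iteration, so that should be fine. Still, I don't know why it does not learn anything useful...

@bsautermeister, did you get anywhere? I'm looking at doing almost the same thing, but there is so much content out there that I'm getting lost.

@GLaDER Yes, I did! I used such an encoder-decoder architecture in my master's thesis project for video frame prediction: bsautermeister.de/research/frame-prediction There you can also find a link to the source code.

Answer 2: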

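After reading several LSTM introduction blogs, e.g. Jakob Aungiers', Option 3 seems to be the right one for a stateless LSTM.

If your LSTM needs to remember data from further back than num_steps, you can train in a stateful way. For a Keras example, see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example with a batch size greater than one, however. I guess that in your case a batch size of four with a stateful LSTM could be used with the following data (written as input -> label):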

batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8

batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12

batch #2:
9,10,11,12 -> 13
...

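This way, the state of, for example, the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1.

This is somewhat similar to your Option 4, except that Option 4 does not use all the available labels.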
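The batch layout above can be generated mechanically: sample i of batch k starts at offset k * num_steps + i, so the same slot in the next batch picks up exactly where it left off. A minimal sketch in plain Python, assuming batch_size equals num_steps as in the example; `make_stateful_batches` is a hypothetical helper name:

```python
def make_stateful_batches(series, num_steps, batch_size):
    """Build batches in which slot i of batch k+1 continues slot i of batch k."""
    batches = []
    k = 0
    while True:
        last_start = k * num_steps + batch_size - 1
        # Stop once the last sample of the batch would have no label left.
        if last_start + num_steps >= len(series):
            break
        batch = []
        for i in range(batch_size):
            start = k * num_steps + i
            window = series[start:start + num_steps]
            label = series[start + num_steps]
            batch.append((window, label))
        batches.append(batch)
        k += 1
    return batches


batches = make_stateful_batches(list(range(1, 14)), num_steps=4, batch_size=4)
for number, batch in enumerate(batches):
    print("batch #%d:" % number)
    for window, label in batch:
        print(" ", window, "->", label)
```

This sketch drops a trailing incomplete batch, because a stateful LSTM generally expects the same batch size on every step.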

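Update:

My suggestion uses a batch_size equal to num_steps. As an extension, Alexis Huet gives an answer for the case where batch_size is a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.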

Comments:

The answer at ***.com/a/48588730/1389680 supports my suggestion of stateful training with multi-sample batches.

Answer 3:

I believe Option 1 is closest to the reference implementation in /tensorflow/models/rnn/ptb/reader.py

def ptb_iterator(raw_data, batch_size, num_steps):
  """Iterate on the raw PTB data.

  This generates batch_size pointers into the raw PTB data, and allows
  minibatch iteration along these pointers.

  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.

  Yields:
    Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
    The second element of the tuple is the same data time-shifted to the
    right by one.

  Raises:
    ValueError: if batch_size or num_steps are too high.
  """
  raw_data = np.array(raw_data, dtype=np.int32)

  data_len = len(raw_data)
  batch_len = data_len // batch_size
  data = np.zeros([batch_size, batch_len], dtype=np.int32)
  for i in range(batch_size):
    data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

  epoch_size = (batch_len - 1) // num_steps

  if epoch_size == 0:
    raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

  for i in range(epoch_size):
    x = data[:, i*num_steps:(i+1)*num_steps]
    y = data[:, i*num_steps+1:(i+1)*num_steps+1]
    yield (x, y)

However, another option is to select a pointer into your data array randomly for each training sequence.
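A minimal sketch of that random-pointer alternative in plain Python. `random_window_batch` and its `rng` parameter are hypothetical names, not part of reader.py; the labels are the inputs shifted by one step, as in ptb_iterator:

```python
import random


def random_window_batch(raw_data, batch_size, num_steps, rng=random):
    """Draw batch_size windows at random offsets, with one-step-shifted labels."""
    # The last valid start still leaves room for num_steps inputs plus one label.
    max_start = len(raw_data) - num_steps - 1
    if max_start < 0:
        raise ValueError("raw_data is too short for the given num_steps")
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randint(0, max_start)  # randint is inclusive on both ends
        xs.append(raw_data[start:start + num_steps])
        ys.append(raw_data[start + 1:start + num_steps + 1])
    return xs, ys


xs, ys = random_window_batch(list(range(1, 14)), batch_size=3, num_steps=4)
for x, y in zip(xs, ys):
    print(x, "->", y)
```

Because consecutive batches drawn this way are unrelated, the LSTM state would presumably be reset to zero for each batch instead of being carried over as with ptb_iterator.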
