What's the difference between "hidden" and "output" in PyTorch LSTM?
Posted: 2018-06-26 10:25:39
Question: I'm having trouble understanding the documentation for PyTorch's LSTM module (and likewise RNN and GRU). Regarding the outputs, it says:
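Outputs: output, (h_n, c_n)
- output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
- h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
- c_n (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len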
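The variables output and h_n both seem to give the values of the hidden state. Does h_n just redundantly provide the last time step that is already included in output, or is there more to it?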
Answer 1: I made a diagram. The names follow the PyTorch docs, although I renamed num_layers to w.
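output comprises all the hidden states in the last layer ("last" depth-wise, not time-wise). (h_n, c_n) comprises the hidden and cell states of every layer after the last time step, t = n, so you could, for example, feed them into another LSTM.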
The batch dimension is not included in the diagram.
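A minimal check of the diagram's description, assuming an arbitrary 3-layer, unidirectional nn.LSTM (all sizes are illustrative): output covers every time step of the last layer, (h_n, c_n) covers every layer at the last time step, and the two overlap only at the last layer's final step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only.
seq_len, batch, input_size, hidden_size, num_layers = 7, 2, 4, 5, 3
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers)
x = torch.randn(seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

# output: hidden states of the LAST layer at EVERY time step.
print(output.shape)           # torch.Size([7, 2, 5])
# h_n, c_n: states of EVERY layer at the LAST time step.
print(h_n.shape, c_n.shape)   # torch.Size([3, 2, 5]) torch.Size([3, 2, 5])

# The only overlap: last layer, last time step.
assert torch.allclose(output[-1], h_n[-1])
```

So h_n is not simply redundant: its lower-layer entries h_n[:-1], and all of c_n, appear nowhere in output.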
Comments:
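Great, thanks, that makes a lot of sense and is really helpful. So this means that, for example, there is no way to get the hidden values of all layers at any time step other than the last one?
Right, unless you use separate LSTMs with num_layers = 1 that each take the output of the previous one as input.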
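A minimal sketch of that workaround, assuming default settings (dropout=0, unidirectional): a 2-layer nn.LSTM is split into two single-layer nn.LSTM modules that reuse its weights, which exposes layer 0's hidden state at every time step while reproducing the stacked model's results. Copying the weights by renaming state_dict keys is an illustration choice, not something from the comment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size = 3, 4
deep = nn.LSTM(input_size, hidden_size, num_layers=2)

# Two single-layer LSTMs that will reuse the deep LSTM's per-layer weights.
layer0 = nn.LSTM(input_size, hidden_size, num_layers=1)
layer1 = nn.LSTM(hidden_size, hidden_size, num_layers=1)

sd = deep.state_dict()
# Parameter names end in _l0 / _l1; a single-layer module only has _l0 names.
layer0.load_state_dict({k: v for k, v in sd.items() if k.endswith("_l0")})
layer1.load_state_dict(
    {k.replace("_l1", "_l0"): v for k, v in sd.items() if k.endswith("_l1")}
)

x = torch.randn(6, 1, input_size)  # (seq_len, batch, input_size)

deep_out, (deep_h, deep_c) = deep(x)
out0, (h0, c0) = layer0(x)      # layer-0 hidden states at EVERY time step
out1, (h1, c1) = layer1(out0)   # layer-1 hidden states at EVERY time step

# The chained single-layer LSTMs reproduce the 2-layer LSTM.
assert torch.allclose(out1, deep_out, atol=1e-5)
assert torch.allclose(h0, deep_h[0:1], atol=1e-5)
assert torch.allclose(h1, deep_h[1:2], atol=1e-5)
```

With dropout > 0 between layers (in training mode) the two setups would no longer match exactly.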
@nnnmmm So each (blue) box is an LSTM/RNN/GRU cell, right? h_i and c_i are the hidden and cell states respectively, and w is the depth of our network, right?
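@kmario23: Yes, each blue box is an LSTM cell. As far as I understand, vanilla RNNs and GRUs have no cell state, only a hidden state, so they would look somewhat different. You're right about h_i, c_i and w.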
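The difference shows up directly in the return values. A small sketch with arbitrary sizes: nn.LSTM returns (h_n, c_n) as its second element, while nn.GRU and nn.RNN return only h_n, though output[-1] still matches the last layer's h_n for all three.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)

lstm_out, (lstm_h, lstm_c) = nn.LSTM(3, 4)(x)  # LSTM: hidden AND cell state
gru_out, gru_h = nn.GRU(3, 4)(x)               # GRU: hidden state only
rnn_out, rnn_h = nn.RNN(3, 4)(x)               # vanilla RNN: hidden state only

# In every case the last output row equals the last layer's final hidden state.
assert torch.allclose(lstm_out[-1], lstm_h[-1])
assert torch.allclose(gru_out[-1], gru_h[-1])
assert torch.allclose(rnn_out[-1], rnn_h[-1])
```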
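This is much clearer than the official PyTorch documentation. They should include this picture. Thanks a lot, amazing. Now I fully understand what the outputs mean.
Answer 2: The output state is a tensor of the hidden states of the RNN (LSTM) at every time step, taken from its last layer, while the hidden state returned by the RNN (LSTM) is the final hidden state after the last time step of the input sequence (one per layer and direction). You can check this by collecting the hidden state at each step and comparing them with the output state, provided you are not using pack_padded_sequence.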
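A sketch of that check, assuming a small single-layer nn.LSTM with made-up sizes. The first part feeds the sequence one step at a time and stacks the per-step outputs; the second part uses pack_padded_sequence with two sequences of hypothetical lengths to show why the caveat matters: h_n holds each sequence's state at its own last valid step, while the last row of the padded output is only zero padding for the shorter sequence.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=4)  # single layer, unidirectional
x = torch.randn(6, 2, 3)                     # (seq_len=6, batch=2, input_size=3)

# Part 1: full-sequence call vs. collecting the hidden state at every step.
output, (h_n, c_n) = lstm(x)

state = None  # None -> zero initial (h_0, c_0)
steps = []
for t in range(x.size(0)):
    out_t, state = lstm(x[t:t + 1], state)  # one time step: (1, 2, 3)
    steps.append(out_t)
collected = torch.cat(steps)                # (6, 2, 4)

assert torch.allclose(collected, output, atol=1e-5)
assert torch.allclose(state[0], h_n, atol=1e-5)

# Part 2: with packed input, h_n is taken at each sequence's last valid step.
lengths = torch.tensor([6, 3])  # hypothetical; 2nd sequence is padding after t=2
packed = pack_padded_sequence(x, lengths)  # lengths sorted in decreasing order
packed_out, (h_p, c_p) = lstm(packed)
padded_out, _ = pad_packed_sequence(packed_out)  # back to (6, 2, 4), zero-padded

for b, n in enumerate(lengths.tolist()):
    assert torch.allclose(padded_out[n - 1, b], h_p[-1, b], atol=1e-5)

# The last row is padding for the shorter sequence, not its final state.
assert torch.all(padded_out[-1, 1] == 0)
```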
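Answer 3: It really depends on the model you use and how you interpret it. The output may be:
- a single LSTM cell's hidden state
- several LSTM cells' hidden states
- all the hidden states
The output is almost never interpreted directly. If the input was encoded, there should be a softmax layer to decode the results.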
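Note: in language modeling, the hidden states are used to define the probability of the next word, $p(w_{t+1} \mid w_1, \dots, w_t) = \mathrm{softmax}(W h_t + b)$.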
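A minimal sketch of that formula, assuming a toy vocabulary and an nn.Embedding front end (both illustrative, not from the answer): an nn.Linear layer plays the role of W and b, and it is applied to output (every h_t) rather than h_n, because a prediction is needed after every word.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, hidden_size = 10, 8, 16  # toy sizes (assumed)

embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_size)
proj = nn.Linear(hidden_size, vocab_size)  # W and b in softmax(W h_t + b)

tokens = torch.randint(0, vocab_size, (5, 1))  # (seq_len, batch) of word ids
output, _ = lstm(embed(tokens))                # h_t for every t: (5, 1, 16)
probs = torch.softmax(proj(output), dim=-1)    # p(w_{t+1} | w_1..w_t): (5, 1, 10)

# One probability distribution over the vocabulary per time step.
assert probs.shape == (5, 1, vocab_size)
assert torch.allclose(probs.sum(-1), torch.ones(5, 1))
```

For training one would usually pass the raw logits proj(output) to nn.CrossEntropyLoss instead of applying softmax explicitly.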
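Answer 4: I just verified some of this with code. For an LSTM of depth 1, h_n is indeed the same as the last value of output. (For an LSTM of depth > 1 this is not true, as @nnnmmm explained above: h_n then holds one final state per layer, and only its last entry matches output[-1].)
So basically, the "output" we get after applying the LSTM is not o_t as defined in the documentation, but h_t.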
import torch
import torch.nn as nn

torch.manual_seed(0)
# One-layer LSTM: 1 input feature, 50 hidden units.
model = nn.LSTM(input_size=1, hidden_size=50, num_layers=1)
# x has shape (seq_len=50, batch=1, input_size=1).
x = torch.rand(50, 1, 1)
output, (hn, cn) = model(x)
Now you can check whether output[-1] and hn have the same values, as shown below:
tensor([[ 0.1140, -0.0600, -0.0540, 0.1492, -0.0339, -0.0150, -0.0486, 0.0188,
0.0504, 0.0595, -0.0176, -0.0035, 0.0384, -0.0274, 0.1076, 0.0843,
-0.0443, 0.0218, -0.0093, 0.0002, 0.1335, 0.0926, 0.0101, -0.1300,
-0.1141, 0.0072, -0.0142, 0.0018, 0.0071, 0.0247, 0.0262, 0.0109,
0.0374, 0.0366, 0.0017, 0.0466, 0.0063, 0.0295, 0.0536, 0.0339,
0.0528, -0.0305, 0.0243, -0.0324, 0.0045, -0.1108, -0.0041, -0.1043,
-0.0141, -0.1222]], grad_fn=<SelectBackward>)
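To make the h_t vs. o_t distinction concrete, here is a sketch that rebuilds the answer's model and recomputes the final step by hand from the gate equations in the nn.LSTM documentation, where weight_ih_l0 and weight_hh_l0 stack the input, forget, cell and output gates in the order (i, f, g, o), and h_t = o_t * tanh(c_t). It assumes the state before the last step can be obtained by running the model on x[:-1].

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.LSTM(input_size=1, hidden_size=50, num_layers=1)
x = torch.rand(50, 1, 1)
output, (hn, cn) = model(x)

# Single-layer LSTM: the last output row and h_n coincide.
assert torch.allclose(output[-1], hn[0])

with torch.no_grad():
    # State (h_{t-1}, c_{t-1}) just before the last time step.
    _, (h_prev, c_prev) = model(x[:-1])
    gates = (x[-1] @ model.weight_ih_l0.T + model.bias_ih_l0
             + h_prev[0] @ model.weight_hh_l0.T + model.bias_hh_l0)
    i, f, g, o = gates.chunk(4, dim=-1)  # gate order: input, forget, cell, output
    i, f, g, o = i.sigmoid(), f.sigmoid(), g.tanh(), o.sigmoid()
    c_t = f * c_prev[0] + i * g          # new cell state
    h_t = o * torch.tanh(c_t)            # new hidden state

assert torch.allclose(c_t, cn[0], atol=1e-5)
assert torch.allclose(h_t, output[-1], atol=1e-5)  # output holds h_t ...
assert not torch.allclose(o, output[-1])           # ... not the output gate o_t
```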
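Answer 5: In PyTorch, output gives the output of each individual LSTM cell in the last layer of the LSTM stack (one per time step), while the hidden state and cell state give the final hidden and cell state of every layer in the LSTM stack.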
import torch
import torch.nn as nn

torch.manual_seed(1)
inputs = [torch.randn(1, 3) for _ in range(5)]  # a sequence of 5 time steps, each a (1, 3) tensor: batch of 1, 3 features
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))  # initial (h_0, c_0), each (num_layers * num_directions = 1, batch = 1, hidden_size = 3)
# Since the input batch size is 1, h_0 and c_0 must also have a batch size of 1.
lstm = nn.LSTM(3, 3)  # input_size = 3, hidden_size = 3
for i in inputs:
    # Feed one time step at a time, viewed as (seq_len=1, batch=1, input_size=3).
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print('out:', out)
    print('hidden:', hidden)
Output:
out: tensor([[[-0.1124, -0.0653, 0.2808]]], grad_fn=<StackBackward>)
hidden: (tensor([[[-0.1124, -0.0653, 0.2808]]], grad_fn=<StackBackward>), tensor([[[-0.2883, -0.2846, 2.0720]]], grad_fn=<StackBackward>))
out: tensor([[[ 0.1675, -0.0376, 0.4402]]], grad_fn=<StackBackward>)
hidden: (tensor([[[ 0.1675, -0.0376, 0.4402]]], grad_fn=<StackBackward>), tensor([[[ 0.4394, -0.1226, 1.5611]]], grad_fn=<StackBackward>))
out: tensor([[[0.3699, 0.0150, 0.1429]]], grad_fn=<StackBackward>)
hidden: (tensor([[[0.3699, 0.0150, 0.1429]]], grad_fn=<StackBackward>), tensor([[[0.8432, 0.0618, 0.9413]]], grad_fn=<StackBackward>))
out: tensor([[[0.1795, 0.0296, 0.2957]]], grad_fn=<StackBackward>)
hidden: (tensor([[[0.1795, 0.0296, 0.2957]]], grad_fn=<StackBackward>), tensor([[[0.4541, 0.1121, 0.9320]]], grad_fn=<StackBackward>))
out: tensor([[[0.1365, 0.0596, 0.3931]]], grad_fn=<StackBackward>)
hidden: (tensor([[[0.1365, 0.0596, 0.3931]]], grad_fn=<StackBackward>), tensor([[[0.3430, 0.1948, 1.0255]]], grad_fn=<StackBackward>))
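Stepping through the sequence one element at a time, as in the loop above, gives the same result as passing the whole sequence in a single call. A sketch reusing the same setup (same seed and creation order) that checks this, and that the last row of the full output equals h_n:

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
inputs = [torch.randn(1, 3) for _ in range(5)]
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))
lstm = nn.LSTM(3, 3)

# Whole sequence at once: (seq_len=5, batch=1, input_size=3).
seq = torch.cat(inputs).view(len(inputs), 1, -1)
out_all, (h_all, c_all) = lstm(seq, hidden)

# One step at a time, carrying the state forward.
state = hidden
step_outs = []
for i in inputs:
    out, state = lstm(i.view(1, 1, -1), state)
    step_outs.append(out)

assert torch.allclose(torch.cat(step_outs), out_all, atol=1e-5)
assert torch.allclose(state[0], h_all, atol=1e-5)
assert torch.allclose(out_all[-1], h_all[0], atol=1e-5)
```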
Multi-layer LSTM
import torch
import torch.nn as nn

torch.manual_seed(1)
num_layers = 2
inputs = [torch.randn(1, 3) for _ in range(5)]
hidden = (torch.randn(num_layers, 1, 3),
          torch.randn(num_layers, 1, 3))  # (num_layers * num_directions = 2, batch = 1, hidden_size = 3)
lstm = nn.LSTM(input_size=3, hidden_size=3, num_layers=num_layers)
for i in inputs:
    # Step through the sequence one element at a time.
    # After each step, hidden contains the hidden and cell states of both layers.
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print('out:', out)
    print('hidden:', hidden)
Output:
out: tensor([[[-0.0819, 0.1214, -0.2586]]], grad_fn=<StackBackward>)
hidden: (tensor([[[-0.2625, 0.4415, -0.4917]],
[[-0.0819, 0.1214, -0.2586]]], grad_fn=<StackBackward>), tensor([[[-2.5740, 0.7832, -0.9211]],
[[-0.2803, 0.5175, -0.5330]]], grad_fn=<StackBackward>))
out: tensor([[[-0.1298, 0.2797, -0.0882]]], grad_fn=<StackBackward>)
hidden: (tensor([[[-0.3818, 0.3306, -0.3020]],
[[-0.1298, 0.2797, -0.0882]]], grad_fn=<StackBackward>), tensor([[[-2.3980, 0.6347, -0.6592]],
[[-0.3643, 0.9301, -0.1326]]], grad_fn=<StackBackward>))
out: tensor([[[-0.1630, 0.3187, 0.0728]]], grad_fn=<StackBackward>)
hidden: (tensor([[[-0.5612, 0.3134, -0.0782]],
[[-0.1630, 0.3187, 0.0728]]], grad_fn=<StackBackward>), tensor([[[-1.7555, 0.6882, -0.3575]],
[[-0.4571, 1.2094, 0.1061]]], grad_fn=<StackBackward>))
out: tensor([[[-0.1723, 0.3274, 0.1546]]], grad_fn=<StackBackward>)
hidden: (tensor([[[-0.5112, 0.1597, -0.0901]],
[[-0.1723, 0.3274, 0.1546]]], grad_fn=<StackBackward>), tensor([[[-1.4417, 0.5892, -0.2489]],
[[-0.4940, 1.3620, 0.2255]]], grad_fn=<StackBackward>))
out: tensor([[[-0.1847, 0.2968, 0.1333]]], grad_fn=<StackBackward>)
hidden: (tensor([[[-0.3256, 0.3217, -0.1899]],
[[-0.1847, 0.2968, 0.1333]]], grad_fn=<StackBackward>), tensor([[[-1.7925, 0.6096, -0.4432]],
[[-0.5147, 1.4031, 0.2014]]], grad_fn=<StackBackward>))
Bidirectional multi-layer LSTM
import torch
import torch.nn as nn

torch.manual_seed(1)
num_layers = 2
is_bidirectional = True
inputs = [torch.randn(1, 3) for _ in range(5)]
hidden = (torch.randn(4, 1, 3),
          torch.randn(4, 1, 3))  # 4 = 2 * 2 = num_layers * num_directions
lstm = nn.LSTM(input_size=3, hidden_size=3, num_layers=num_layers, bidirectional=is_bidirectional)
for i in inputs:
    # Step through the sequence one element at a time.
    # Note: with one element per call, the backward direction only ever sees that single element.
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print('out:', out)
    print('hidden:', hidden)
# Each call above has seq_len = 1, so:
# output dim -> (seq_len, batch, num_directions * hidden_size) -> (1, 1, 2 * 3)
#   (passing all 5 steps in one call would give (5, 1, 2 * 3))
# hidden dim -> (num_layers * num_directions, batch, hidden_size) -> (2 * 2, 1, 3)
# cell state dim -> (num_layers * num_directions, batch, hidden_size) -> (2 * 2, 1, 3)
Output:
out: tensor([[[-0.4620, 0.1115, -0.1087, 0.1646, 0.0173, -0.2196]]],
grad_fn=<CatBackward>)
hidden: (tensor([[[ 0.5187, 0.2656, -0.2543]],
[[ 0.4175, 0.0539, 0.0633]],
[[-0.4620, 0.1115, -0.1087]],
[[ 0.1646, 0.0173, -0.2196]]], grad_fn=<StackBackward>), tensor([[[ 1.1546, 0.4012, -0.4119]],
[[ 0.7999, 0.2632, 0.2587]],
[[-1.4196, 0.2075, -0.3148]],
[[ 0.6605, 0.0243, -0.5783]]], grad_fn=<StackBackward>))
out: tensor([[[-0.1860, 0.1359, -0.2719, 0.0815, 0.0061, -0.0980]]],
grad_fn=<CatBackward>)
hidden: (tensor([[[ 0.2945, 0.0842, -0.1580]],
[[ 0.2766, -0.1873, 0.2416]],
[[-0.1860, 0.1359, -0.2719]],
[[ 0.0815, 0.0061, -0.0980]]], grad_fn=<StackBackward>), tensor([[[ 0.5453, 0.1281, -0.2497]],
[[ 0.9706, -0.3592, 0.4834]],
[[-0.3706, 0.2681, -0.6189]],
[[ 0.2029, 0.0121, -0.3028]]], grad_fn=<StackBackward>))
out: tensor([[[ 0.1095, 0.1520, -0.3238, 0.0283, 0.0387, -0.0820]]],
grad_fn=<CatBackward>)
hidden: (tensor([[[ 0.1427, 0.0859, -0.2926]],
[[ 0.1536, -0.2343, 0.0727]],
[[ 0.1095, 0.1520, -0.3238]],
[[ 0.0283, 0.0387, -0.0820]]], grad_fn=<StackBackward>), tensor([[[ 0.2386, 0.1646, -0.4102]],
[[ 0.2636, -0.4828, 0.1889]],
[[ 0.1967, 0.2848, -0.7155]],
[[ 0.0735, 0.0702, -0.2859]]], grad_fn=<StackBackward>))
out: tensor([[[ 0.2346, 0.1576, -0.4006, -0.0053, 0.0256, -0.0653]]],
grad_fn=<CatBackward>)
hidden: (tensor([[[ 0.1706, 0.0147, -0.0341]],
[[ 0.1835, -0.3951, 0.2506]],
[[ 0.2346, 0.1576, -0.4006]],
[[-0.0053, 0.0256, -0.0653]]], grad_fn=<StackBackward>), tensor([[[ 0.3422, 0.0269, -0.0475]],
[[ 0.4235, -0.9144, 0.5655]],
[[ 0.4589, 0.2807, -0.8332]],
[[-0.0133, 0.0507, -0.1996]]], grad_fn=<StackBackward>))
out: tensor([[[ 0.2774, 0.1639, -0.4460, -0.0228, 0.0086, -0.0369]]],
grad_fn=<CatBackward>)
hidden: (tensor([[[ 0.2147, -0.0191, 0.0677]],
[[ 0.2516, -0.4591, 0.3327]],
[[ 0.2774, 0.1639, -0.4460]],
[[-0.0228, 0.0086, -0.0369]]], grad_fn=<StackBackward>), tensor([[[ 0.4414, -0.0299, 0.0889]],
[[ 0.6360, -1.2360, 0.7229]],
[[ 0.5692, 0.2843, -0.9375]],
[[-0.0569, 0.0177, -0.1039]]], grad_fn=<StackBackward>))
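The per-step loop above hides one detail of bidirectional LSTMs. A sketch, assuming the whole sequence is passed in a single call with the same sizes: the last dimension of output is [forward | backward], h_n can be viewed as (num_layers, num_directions, batch, hidden_size), and the backward direction's final state corresponds to the first time step, so only the forward half of output[-1] matches h_n.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
seq_len, batch, input_size, hidden_size, num_layers = 5, 1, 3, 3, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([5, 1, 6]) -> (seq_len, batch, 2 * hidden_size)
print(h_n.shape)     # torch.Size([4, 1, 3]) -> (num_layers * 2, batch, hidden_size)

# Split output's last dimension into the two directions.
fwd = output[..., :hidden_size]
bwd = output[..., hidden_size:]

# h_n is ordered (layer0_fwd, layer0_bwd, layer1_fwd, layer1_bwd).
h = h_n.view(num_layers, 2, batch, hidden_size)

# The forward direction finishes at the LAST time step ...
assert torch.allclose(fwd[-1], h[-1, 0], atol=1e-6)
# ... but the backward direction finishes at the FIRST time step.
assert torch.allclose(bwd[0], h[-1, 1], atol=1e-6)
```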