Training accuracy decreases and loss increases when using pack_padded_sequence - pad_packed_sequence
Posted: 2021-06-20 19:49:05

Question: I am trying to train a bidirectional LSTM using `pack_padded_sequence` and `pad_packed_sequence`, but the accuracy keeps decreasing while the loss keeps increasing.
Here is one batch from my data loader:
```
X1 (X[0]): tensor([[1408, 1413, 43, ..., 0, 0, 0],
                   [1452, 1415, 2443, ..., 0, 0, 0],
                   [1434, 1432, 2012, ..., 0, 0, 0],
                   ...,
                   [1408, 3593, 1431, ..., 0, 0, 0],
                   [1408, 1413, 1402, ..., 0, 0, 0],
                   [1420, 1474, 2645, ..., 0, 0, 0]]), shape: torch.Size([64, 31])
len_X1 (X[3]): [9, 19, 12, 7, 7, 15, 4, 13, 9, 8, 14, 19, 7, 23, 7, 13, 7, 12, 10, 12, 13, 11, 31, 8, 20, 17, 8, 9, 9, 29, 8, 5, 5, 13, 9, 8, 10, 17, 13, 8, 8, 11, 7, 29, 15, 10, 6, 7, 10, 9, 10, 10, 4, 16, 11, 10, 16, 8, 13, 8, 8, 20, 7, 12]
X2 (X[1]): tensor([[1420, 1415, 51, ..., 0, 0, 0],
                   [1452, 1415, 2376, ..., 1523, 2770, 35],
                   [1420, 1415, 51, ..., 0, 0, 0],
                   ...,
                   [1408, 3593, 1474, ..., 0, 0, 0],
                   [1408, 1428, 2950, ..., 0, 0, 0],
                   [1474, 1402, 3464, ..., 0, 0, 0]]), shape: torch.Size([64, 42])
len_X2 (X[4]): [14, 42, 13, 18, 12, 31, 8, 19, 5, 7, 15, 19, 7, 17, 6, 11, 12, 16, 8, 8, 19, 8, 12, 10, 11, 9, 9, 9, 9, 21, 7, 5, 8, 13, 14, 8, 15, 8, 8, 8, 12, 13, 7, 14, 4, 10, 6, 11, 12, 7, 8, 11, 9, 13, 30, 10, 15, 9, 9, 7, 9, 8, 7, 20]
t (X[2]): tensor([0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1,
                  0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0,
                  0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]), shape: torch.Size([64])
```
Here is my model class:
```python
import torch
import torch.nn as nn

# `tokenizer` and `device` are globals defined elsewhere in the script.

class BiLSTM(nn.Module):
    def __init__(self, n_vocabs, embed_dims, n_lstm_units, n_lstm_layers, n_output_classes):
        super(BiLSTM, self).__init__()
        self.v = n_vocabs
        self.e = embed_dims
        self.u = n_lstm_units
        self.l = n_lstm_layers
        self.o = n_output_classes
        self.padd_idx = tokenizer.get_vocab()['[PAD]']
        self.embed = nn.Embedding(
            self.v,
            self.e,
            self.padd_idx
        )
        self.bilstm = nn.LSTM(
            self.e,
            self.u,
            self.l,
            batch_first = True,
            bidirectional = True,
            dropout = 0.5
        )
        self.linear = nn.Linear(
            self.u * 4,
            self.o
        )

    def forward(self, X):
        # initial_hidden
        h0 = torch.zeros(self.l * 2, X[0].size(0), self.u).to(device)
        c0 = torch.zeros(self.l * 2, X[0].size(0), self.u).to(device)
        # embedding
        out1 = self.embed(X[0].to(device))
        out2 = self.embed(X[1].to(device))
        # pack_padded_sequence
        out1 = nn.utils.rnn.pack_padded_sequence(out1, X[3], batch_first=True, enforce_sorted=False)
        out2 = nn.utils.rnn.pack_padded_sequence(out2, X[4], batch_first=True, enforce_sorted=False)
        # NxTxh, lxNxh
        out1, _ = self.bilstm(out1, (h0, c0))
        out2, _ = self.bilstm(out2, (h0, c0))
        # pad_packed_sequence
        out1, _ = nn.utils.rnn.pad_packed_sequence(out1, batch_first=True)
        out2, _ = nn.utils.rnn.pad_packed_sequence(out2, batch_first=True)
        # take only the final time step
        out1 = out1[:, -1, :]
        out2 = out2[:, -1, :]
        # concatenate out1&2
        out = torch.cat((out1, out2), 1)
        # linear layer
        out = self.linear(out)
        iout = torch.max(out, 1)[1]
        return iout, out
```
If I remove `pack_padded_sequence` / `pad_packed_sequence`, the model trains fine:
```python
class BiLSTM(nn.Module):
    def __init__(self, n_vocabs, embed_dims, n_lstm_units, n_lstm_layers, n_output_classes):
        super(BiLSTM, self).__init__()
        self.v = n_vocabs
        self.e = embed_dims
        self.u = n_lstm_units
        self.l = n_lstm_layers
        self.o = n_output_classes
        self.padd_idx = tokenizer.get_vocab()['[PAD]']
        self.embed = nn.Embedding(
            self.v,
            self.e,
            self.padd_idx
        )
        self.bilstm = nn.LSTM(
            self.e,
            self.u,
            self.l,
            batch_first = True,
            bidirectional = True,
            dropout = 0.5
        )
        self.linear = nn.Linear(
            self.u * 4,
            self.o
        )

    def forward(self, X):
        # initial_hidden
        h0 = torch.zeros(self.l * 2, X[0].size(0), self.u).to(device)
        c0 = torch.zeros(self.l * 2, X[0].size(0), self.u).to(device)
        # embedding
        out1 = self.embed(X[0].to(device))
        out2 = self.embed(X[1].to(device))
        # pack_padded_sequence
        # out1 = nn.utils.rnn.pack_padded_sequence(out1, X[3], batch_first=True, enforce_sorted=False)
        # out2 = nn.utils.rnn.pack_padded_sequence(out2, X[4], batch_first=True, enforce_sorted=False)
        # NxTxh, lxNxh
        out1, _ = self.bilstm(out1, (h0, c0))
        out2, _ = self.bilstm(out2, (h0, c0))
        # pad_packed_sequence
        # out1, _ = nn.utils.rnn.pad_packed_sequence(out1, batch_first=True)
        # out2, _ = nn.utils.rnn.pad_packed_sequence(out2, batch_first=True)
        # take only the final time step
        out1 = out1[:, -1, :]
        out2 = out2[:, -1, :]
        # concatenate out1&2
        out = torch.cat((out1, out2), 1)
        # linear layer
        out = self.linear(out)
        iout = torch.max(out, 1)[1]
        return iout, out
```
Answer 1: These lines of your code are wrong:
```python
# take only the final time step
out1 = out1[:, -1, :]
out2 = out2[:, -1, :]
```
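You say you are taking the final time step, but you forgot that the sequences have different lengths. `nn.utils.rnn.pad_packed_sequence` pads the output of each sequence with zeros until it is as long as the longest sequence in the batch, so that they all have the same length. In other words, for most sequences you are slicing out zero vectors (the padding).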
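To see this concretely, here is a small sketch with toy sizes (3 sequences of lengths 5, 2 and 3, random inputs; none of these values come from the question). After the packed round trip, every position past a sequence's length is exactly zero, so `out[:, -1, :]` is a zero vector for both shorter sequences.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Toy batch: 3 sequences padded to length 5 (sizes chosen only for illustration).
batch, max_len, emb, hidden = 3, 5, 4, 6
lengths = [5, 2, 3]
x = torch.randn(batch, max_len, emb)

lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

print(out.shape)    # torch.Size([3, 5, 12]): width is max(lengths), features are 2 * hidden
print(out_lengths)  # tensor([5, 2, 3]): lengths come back in the original batch order

# Steps past each length were filled with the default padding_value (0.0),
# so only the length-5 sequence has a non-zero vector at the last position.
print(out[:, -1, :].abs().sum(dim=1))
```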
This should do what you want:
```python
# take only the final time step
out1 = out1[range(out1.shape[0]), X3 - 1, :]
out2 = out2[range(out2.shape[0]), X4 - 1, :]
```
This assumes that `X3` and `X4` (the lengths `X[3]` and `X[4]`) are tensors.
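Below is a sketch of the same indexing wrapped in a helper. The name `last_valid_step` and the toy sizes are hypothetical; the helper converts a plain Python length list (as the question's loader produces) to a tensor on the output's device, so the `X3 - 1` arithmetic works. The block also compares the result with the final hidden state `h_n` returned by `nn.LSTM`, which is a check added here rather than part of the answer: for the forward direction the gathered step equals `h_n[-2]`, but for the backward direction the state that has read the whole sequence sits at time step 0 (it equals `h_n[-1]`), not at `lengths - 1`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def last_valid_step(padded_out, lengths):
    """Return the output at each sequence's last real (non-padded) time step.

    padded_out: (batch, max_len, features) tensor from pad_packed_sequence(batch_first=True)
    lengths:    Python list or 1-D tensor of sequence lengths (e.g. X[3] in the question)
    """
    lengths = torch.as_tensor(lengths, device=padded_out.device)
    batch_idx = torch.arange(padded_out.size(0), device=padded_out.device)
    # Pair row i with time step lengths[i] - 1, keeping all features.
    return padded_out[batch_idx, lengths - 1, :]


torch.manual_seed(0)
emb, hidden = 4, 6
lengths = [5, 2, 3]
x = torch.randn(len(lengths), max(lengths), emb)

lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)

last = last_valid_step(out, lengths)  # shape (3, 2 * hidden)

# Forward half: the gathered step is the final forward hidden state.
print(torch.allclose(last[:, :hidden], h_n[-2]))
# Backward half: the full-sequence backward state is at time step 0.
print(torch.allclose(out[:, 0, hidden:], h_n[-1]))
```

So for a bidirectional LSTM, `torch.cat((h_n[-2], h_n[-1]), dim=1)` may be a better per-sequence summary than the backward half taken at `lengths - 1`. It has the same width (`2 * n_lstm_units`), so the question's `nn.Linear(self.u * 4, ...)` would still fit; treat this as a suggestion to try, not as a verified fix for the asker's accuracy.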
Comments:
Ah, I see. I'll give it a try, thanks a lot.
It works, but could you help me understand what these lines do? `out1 = out1[range(out1.shape[0]), X3 - 1, :]` and `out2 = out2[range(out2.shape[1]), X4 - 1, :]`
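The first part, `range(out1.shape[0])`, returns a "list" of the numbers from 0 up to (but not including) the batch size. `X_ - 1` is the sequence lengths minus 1; we subtract 1 because of zero-based indexing. Finally there is `:`, which you can probably work out. So for each sample in the batch (given by the first list), it selects the time step at the same position in the second list (the last real output), and then takes all of its values along the last dimension.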
For `out_2`, shouldn't it also be `out2 = out2[range(out2.shape[0]), X4 - 1, :]`?
Oh, I see. Yes, it should be. I think I mixed it up with `X`, where the first input is at index `0` and the second at index `1`.