RuntimeError: `lengths` array must be sorted in decreasing order when `enforce_sorted` is True. - PyTorch
Posted: 2021-09-03 02:52:38
【Question】I've been sitting here for 5 hours now, stuck on the same error:
RuntimeError: `lengths` array must be sorted in decreasing order when `enforce_sorted` is True. You can pass `enforce_sorted=False` to pack_padded_sequence and/or pack_sequence to sidestep this requirement if you do not need ONNX exportability.
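For context, this error comes from nn.utils.rnn.pack_padded_sequence, which by default expects the sequences in a batch to be sorted by length, longest first. A minimal standalone snippet that reproduces it (the tensors here are purely illustrative):

import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.zeros(5, 3, 8)       # (seq_len, batch, features), already padded
lengths = torch.tensor([3, 5, 2])   # not in decreasing order

# pack_padded_sequence(padded, lengths)  # raises the RuntimeError above
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)  # works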
I'm working on a simple sentiment-classification task with an RNN in PyTorch, using torchtext to load my custom data. I load it from a JSON file whose lines look like this:
"reviewText": "Da Silva takes the divine by ....", "overall": 4.0, "summary": "An amazing first novel"
I created my fields as shown below, starting with a preprocessing get_sentiment() function that converts an overall rating greater than 2 to 1, and to 0 otherwise:
get_sentiment = lambda x: 1 if x >=3 else 0
TEXT = data.Field(tokenize = 'spacy',
tokenizer_language = 'en_core_web_sm',
include_lengths=True
)
LABEL = data.Field(sequential=False, use_vocab=False, preprocessing=get_sentiment)
fields = {
    'reviewText': ('review', TEXT),
    'overall': ('sentiment', LABEL)
}
I loaded the data:
train_data, test_data = data.TabularDataset.splits(
path="/content/",
train="Books_small_10000.json",
test="Books_small.json",
format="json",
fields=fields
)
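The iterator code below references a validation_data that this snippet never creates; presumably the training set was split beforehand, along these lines (a guess at the elided step, with an arbitrary split ratio):

train_data, validation_data = train_data.split(split_ratio=0.8)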
I built the vocabulary:
MAX_VOCAB_SIZE = 25_000
TEXT.build_vocab(
train_data,
max_size = MAX_VOCAB_SIZE,
vectors = "glove.6B.100d",
unk_init = torch.Tensor.normal_
)
LABEL.build_vocab(train_data)
I created my iterators:
BATCH_SIZE = 64
train_iterator, validation_iterator, test_iterator = data.BucketIterator.splits(
(train_data, validation_data, test_data),
device = device,
batch_size = BATCH_SIZE,
sort_key = lambda x: len(x.review),
)
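As an aside, BucketIterator also accepts sort_within_batch=True, which sorts each batch by sort_key in decreasing length order; that alone would satisfy pack_padded_sequence's default requirement, without needing enforce_sorted=False. A sketch using the same fields as above:

train_iterator, validation_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, validation_data, test_data),
    device = device,
    batch_size = BATCH_SIZE,
    sort_key = lambda x: len(x.review),
    sort_within_batch = True,  # hand each batch to the model longest-first
)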
This is what my model looks like:
class AmazonLSTMRNN(nn.Module):
    def __init__(self, vocab_size, embedding_size, hidden_size, output_size,
                 num_layers, bidirectional, dropout, pad_idx):
        super(AmazonLSTMRNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim=embedding_size, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embedding_size, hidden_size=hidden_size,
                            bidirectional=bidirectional, num_layers=num_layers,
                            dropout=dropout)
        self.fc = nn.Linear(hidden_size * 2, out_features=output_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text, text_lengths):
        embedded = self.dropout(self.embedding(text))
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.to('cpu'))
        packed_output, (h_0, c_0) = self.lstm(packed_embedded)
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
        # concatenate the final forward and backward hidden states
        h_0 = self.dropout(torch.cat((h_0[-2, :, :], h_0[-1, :, :]), dim=1))
        return self.fc(h_0)
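A quick shape check for this architecture (a throwaway smoke test with made-up dimensions, lengths already sorted so the default enforce_sorted passes):

model = AmazonLSTMRNN(vocab_size=100, embedding_size=8, hidden_size=16,
                      output_size=1, num_layers=2, bidirectional=True,
                      dropout=0.5, pad_idx=0)
text = torch.randint(0, 100, (50, 4))     # (seq_len, batch) of token ids
lengths = torch.tensor([50, 42, 30, 7])   # decreasing order
print(model(text, lengths).shape)         # torch.Size([4, 1])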
INPUT_DIM = len(TEXT.vocab)  # 25002
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token] # 0
amazon_model = AmazonLSTMRNN(INPUT_DIM,
EMBEDDING_DIM,
HIDDEN_DIM,
OUTPUT_DIM,
N_LAYERS,
BIDIRECTIONAL,
DROPOUT,
PAD_IDX)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(amazon_model.parameters())
amazon_model = amazon_model.to(device)
criterion = criterion.to(device)
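One step that may live in the elided part below: after build_vocab with vectors="glove.6B.100d", the pretrained vectors still have to be copied into the embedding layer, typically like this (the zeroing of the <unk> and <pad> rows is the usual convention, not something shown in this post):

pretrained_embeddings = TEXT.vocab.vectors  # shape (INPUT_DIM, 100)
amazon_model.embedding.weight.data.copy_(pretrained_embeddings)
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]
amazon_model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)  # neutral <unk>
amazon_model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)  # neutral <pad>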
.....
The training function:
def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        optimizer.zero_grad()
        text, text_lengths = batch.review
        predictions = model(text, text_lengths).squeeze(1)
        loss = criterion(predictions, batch.sentiment)
        acc = accuracy(predictions, batch.sentiment)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
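train() calls an accuracy() helper that isn't shown; for a single-logit binary classifier trained with BCEWithLogitsLoss it is presumably something along these lines:

def accuracy(preds, y):
    # squash logits to (0, 1) with sigmoid, then round to hard 0/1 predictions
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)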
The training loop:
N_EPOCHS = 5
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss, train_acc = train(amazon_model, train_iterator, optimizer, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    # valid_loss comes from an evaluation step not shown in this snippet
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(amazon_model.state_dict(), 'best-model.pt')
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
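The loop also calls epoch_time() and reads a valid_loss that is never assigned in the snippet, so an evaluation step must have been cut; plausible versions of both, mirroring train() above:

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss, epoch_acc = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for batch in iterator:
            text, text_lengths = batch.review
            predictions = model(text, text_lengths).squeeze(1)
            epoch_loss += criterion(predictions, batch.sentiment).item()
            epoch_acc += accuracy(predictions, batch.sentiment).item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

# inside the epoch loop, before the valid_loss comparison:
# valid_loss, valid_acc = evaluate(amazon_model, validation_iterator, criterion)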
If anyone can see where I've gone wrong, please point it out. Any help or input would be much appreciated.
【Answer 1】After a few more minutes I found the solution, and was able to reach roughly ~93% accuracy within a single training epoch.
I changed the LABEL field to:
LABEL = data.LabelField(preprocessing=get_sentiment, dtype = torch.float)
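(LabelField is simply a Field preconfigured with sequential=False; the dtype=torch.float part matters because BCEWithLogitsLoss expects float targets rather than integer labels.)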
Then I changed my AmazonLSTMRNN model's forward method, adding enforce_sorted=False to pack_padded_sequence. The forward method:
def forward(self, text, text_lengths):
    embedded = self.dropout(self.embedding(text))
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.to('cpu'), enforce_sorted=False)
    packed_output, (h_0, c_0) = self.lstm(packed_embedded)
    output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
    h_0 = self.dropout(torch.cat((h_0[-2, :, :], h_0[-1, :, :]), dim=1))
    return self.fc(h_0)
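With enforce_sorted=False, PyTorch sorts the batch by length internally and restores the original order when unpacking, so no manual sorting is needed; per the error message, the only thing given up is ONNX exportability.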