RuntimeError: `lengths` array must be sorted in decreasing order when `enforce_sorted` is True. - Pytorch

Posted: 2021-09-03 02:52:38

Question:

I've been sitting here for 5 hours now with the same error:

RuntimeError: `lengths` array must be sorted in decreasing order when `enforce_sorted` is True. You can pass `enforce_sorted=False` to pack_padded_sequence and/or pack_sequence to sidestep this requirement if you do not need ONNX exportability.

I'm working on a simple sentiment classification task with an RNN in PyTorch, using torchtext to load my custom data. I load it from a json file whose records look like this:

"reviewText": "Da Silva takes the divine by ....", "overall": 4.0, "summary": "An amazing first novel"

I created my fields as follows. I wrote a preprocessing get_sentiment() function that converts an overall rating greater than 2 to 1, and to 0 otherwise:

get_sentiment = lambda x: 1 if x >=3 else 0

TEXT = data.Field(tokenize = 'spacy',
                  tokenizer_language = 'en_core_web_sm',
                  include_lengths=True
                  )
LABEL = data.Field(sequential=False, use_vocab=False, preprocessing=get_sentiment)

fields = {
    'reviewText': ('review', TEXT),
    'overall': ('sentiment', LABEL)
}
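
A quick check of the preprocessing function (not part of the original post, just to show what the LABEL field will produce):

print(get_sentiment(4.0))  # 1 -> positive review
print(get_sentiment(2.0))  # 0 -> negative review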

I load the data:

train_data, test_data = data.TabularDataset.splits(
    path="/content/",
    train="Books_small_10000.json",
    test="Books_small.json",
    format="json",
    fields=fields
)
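
To confirm the data parsed as expected, a quick inspection along these lines can help (a sketch, not from the original post; the attribute names come from the fields mapping above):

print(len(train_data), len(test_data))   # number of examples in each split
print(vars(train_data.examples[0]))      # e.g. {'review': [...tokens...], 'sentiment': 1}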

I build the vocabulary:

MAX_VOCAB_SIZE = 25_000

TEXT.build_vocab(
    train_data,
    max_size = MAX_VOCAB_SIZE,
    vectors = "glove.6B.100d",
    unk_init = torch.Tensor.normal_
)

LABEL.build_vocab(train_data)
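
A quick sanity check on the resulting vocabulary (again not part of the original post):

print(len(TEXT.vocab))                  # 25002: MAX_VOCAB_SIZE plus the <unk> and <pad> specials
print(TEXT.vocab.itos[:5])              # special tokens followed by the most frequent words
print(TEXT.vocab.stoi[TEXT.pad_token])  # index of the padding token, used as padding_idx below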

I create my iterators.

BATCH_SIZE = 64

train_iterator, validation_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, validation_data, test_data),
    device = device,
    batch_size = BATCH_SIZE,
    sort_key = lambda x: len(x.review),
)
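
For reference, with the legacy torchtext API used here, the usual way to satisfy pack_padded_sequence's default enforce_sorted=True is to have BucketIterator sort every batch by decreasing length. A sketch of that variant (assuming the same validation_data split as above):

train_iterator, validation_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, validation_data, test_data),
    device = device,
    batch_size = BATCH_SIZE,
    sort_key = lambda x: len(x.review),
    sort_within_batch = True,  # sort the examples inside each batch by length, longest first
)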

This is what my model looks like.

class AmazonLSTMRNN(nn.Module):
  def __init__(self, vocab_size, embedding_size, hidden_size, output_size, num_layers
               , bidirectional, dropout, pad_idx):
    super(AmazonLSTMRNN, self).__init__()

    self.embedding = nn.Embedding(vocab_size, embedding_dim=embedding_size, padding_idx=pad_idx)
    self.lstm = nn.LSTM(embedding_size, hidden_size=hidden_size, 
                        bidirectional=bidirectional, num_layers=num_layers,
                        dropout=dropout)
    self.fc = nn.Linear(hidden_size * 2, out_features=output_size)
    self.dropout = nn.Dropout(dropout)

  def forward(self, text, text_lengths):
    embedded = self.dropout(self.embedding(text))
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.to('cpu'))
    packed_output, (h_0, c_0) = self.rnn(packed_embedded)
    output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
    h_0 = self.dropout(torch.cat((h_0[-2,:,:], h_0[-1,:,:]), dim = 1))
    return self.fc(h_0)


INPUT_DIM = len(TEXT.vocab) # 25002
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token] # 0
amazon_model = AmazonLSTMRNN(INPUT_DIM, 
            EMBEDDING_DIM, 
            HIDDEN_DIM, 
            OUTPUT_DIM, 
            N_LAYERS, 
            BIDIRECTIONAL, 
            DROPOUT, 
            PAD_IDX)
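
Not shown in the post, but a common follow-up at this point is to copy the pretrained GloVe vectors into the embedding layer and sanity-check the model size (a sketch, assuming the vocabulary built above):

pretrained_embeddings = TEXT.vocab.vectors   # shape: [len(TEXT.vocab), 100]
amazon_model.embedding.weight.data.copy_(pretrained_embeddings)
print(sum(p.numel() for p in amazon_model.parameters() if p.requires_grad), 'trainable parameters')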

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(amazon_model.parameters())
amazon_model = amazon_model.to(device)
criterion = criterion.to(device)

.....

The training function.

def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        optimizer.zero_grad()
        text, text_lengths = batch.review
        predictions = model(text, text_lengths).squeeze(1)
        loss = criterion(predictions, batch.sentiment)
        acc = accuracy(predictions, batch.sentiment)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
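
The accuracy() helper used above isn't shown in the post; a minimal binary-accuracy sketch consistent with how it is called (one raw logit per example, matching BCEWithLogitsLoss) would be:

def accuracy(preds, y):
    # round the sigmoid of the raw logits to get 0/1 predictions
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)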

The training loop.

N_EPOCHS = 5
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    start_time = time.time()
    train_loss, train_acc = train(amazon_model, train_iterator, optimizer, criterion)
    end_time = time.time()
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(amazon_model.state_dict(), 'best-model.pt')
    print(f'Epoch: {epoch+1:02} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
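
epoch_time() is also not defined in the post; a minimal sketch matching how it is used above:

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs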

If anyone can see where I'm going wrong, please correct me. Any help or input would be greatly appreciated.


Comments:

Answer 1:

A few minutes later I found the solution, and was able to get roughly ~93% accuracy within a single training epoch.

I changed the LABEL field to:

LABEL = data.LabelField(preprocessing=get_sentiment, dtype = torch.float)

Then I changed the forward method of my AmazonLSTMRNN model, adding enforce_sorted=False to the pack_padded_sequence call.

The forward method:

 def forward(self, text, text_lengths):
    embedded = self.dropout(self.embedding(text))
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.to('cpu'), enforce_sorted=False)
    packed_output, (h_0, c_0) = self.lstm(packed_embedded)
    output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
    h_0 = self.dropout(torch.cat((h_0[-2,:,:], h_0[-1,:,:]), dim = 1))
    return self.fc(h_0)
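
To illustrate why this works, a minimal standalone sketch (toy tensors, not from the original code): with the default enforce_sorted=True, pack_padded_sequence requires the batch to be ordered by decreasing length, while enforce_sorted=False lets PyTorch sort and unsort the batch internally.

import torch
import torch.nn as nn

padded = torch.randn(4, 3, 5)      # (seq_len, batch, features), already padded
lengths = torch.tensor([2, 4, 3])  # true lengths, NOT in decreasing order

# nn.utils.rnn.pack_padded_sequence(padded, lengths)  # raises the RuntimeError above
packed = nn.utils.rnn.pack_padded_sequence(padded, lengths, enforce_sorted=False)  # works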

Discussion:
