RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1



【Title】: RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1 【Posted】: 2021-03-09 10:06:30 【Question】:



bert = AutoModel.from_pretrained('bert-base-uncased')


for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    #train model
    train_loss, _ = modhelper.train(proc.train_dataloader)

    #evaluate model
    valid_loss, _ = modhelper.evaluate()

    #save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss

    # append training and validation loss

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

This is my train method, which is accessed through the object modhelper.

def train(self, train_dataloader):
    total_loss, total_accuracy = 0, 0
    # empty list to save model predictions
    total_preds = []
    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches.
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to gpu
        # batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        self.model.zero_grad()
        print(sent_id.size(), mask.size())
        # get model predictions for the current batch
        preds = self.model(sent_id, mask)  # This line throws the error
        # compute the loss between actual and predicted values
        self.loss = self.cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + self.loss.item()
        # backward pass to calculate the gradients
        self.loss.backward()
        # clip the gradients to 1.0. It helps in preventing the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
        # update parameters (the optimizer attribute name is assumed)
        self.optimizer.step()
        # model predictions are stored on GPU. So, push it to CPU
        preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)
    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    # predictions are in the form of (no. of batches, size of batch, no. of classes).
    # reshape the predictions in form of (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    # returns the loss and predictions
    return avg_loss, total_preds

The line preds = self.model(sent_id, mask) throws the following error (full traceback included).

 Epoch 1 / 1
torch.Size([32, 4000]) torch.Size([32, 4000])
Traceback (most recent call last):

File "<ipython-input-39-17211d5a107c>", line 8, in <module>
train_loss, _ = modhelper.train(proc.train_dataloader)

File "E:\BertTorch\", line 71, in train
preds = self.model(sent_id, mask)

File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "E:\BertTorch\", line 181, in forward
#pass the inputs to the model

File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "E:\BertTorch\venv\lib\site-packages\transformers\", line 837, in forward
embedding_output = self.embeddings(

File "E:\BertTorch\venv\lib\site-packages\torch\nn\modules\", line 727, in _call_impl
result = self.forward(*input, **kwargs)

File "E:\BertTorch\venv\lib\site-packages\transformers\", line 201, in forward
embeddings = inputs_embeds + position_embeddings + token_type_embeddings

RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1

As you can see, I print the tensor sizes in the code with print(sent_id.size(), mask.size()).

The output of that line is torch.Size([32, 4000]) torch.Size([32, 4000]).




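【Comments】:

Specifically, the error is thrown at this line: embeddings = inputs_embeds + position_embeddings + token_type_embeddings. There is probably a shape mismatch among the three variables, which causes the error.

@planet_pluto I hope you checked the line that shows the sizes of the two tensors: torch.Size([32, 4000]) torch.Size([32, 4000])

Why did you tag tensorflow?

@Venkatesh I know that self.model() raises the error. But if you look closely at the stack trace, you can find exactly where in the model's forward pass the error occurs.

The bert you loaded was trained to handle sequences of up to 512 elements. You are feeding it a sequence of 4000, and the model is telling you that it cannot handle that. You can use a different model (like longformer) or use a sliding window approach. It depends on your task.

【Answer 1】: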

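The problem is BERT's limit on sequence length. I had set the length to 4000 tokens, whereas the maximum supported is 512 (and two of those are taken by the '[CLS]' and '[SEP]' tokens at the start and end of the sequence, leaving only 510). Shorten the input or use a different model for your problem, such as Longformer, as @cronoik suggested in the comments above.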
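To make the 510 + 2 arithmetic concrete, here is a minimal sketch of truncating an already tokenized input so that it fits. It works on plain lists of token ids. The helper name truncate_for_bert is hypothetical, and the ids 101 and 102 are the '[CLS]' and '[SEP]' ids assumed for the bert-base-uncased vocabulary.

```python
# Assumed values for bert-base-uncased: [CLS] = 101, [SEP] = 102, limit = 512.
CLS_ID = 101
SEP_ID = 102
MAX_LEN = 512


def truncate_for_bert(token_ids, max_len=MAX_LEN):
    """Keep at most max_len - 2 content tokens and wrap them in [CLS] ... [SEP].

    Hypothetical helper for illustration; returns (input_ids, attention_mask).
    """
    body = token_ids[: max_len - 2]          # leave room for the two special tokens
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)    # every kept position is a real token
    return input_ids, attention_mask


long_doc = list(range(1000, 5000))           # 4000 dummy token ids
ids, mask = truncate_for_bert(long_doc)
print(len(ids), len(mask))
```

With a Hugging Face tokenizer the same effect is usually obtained by passing truncation=True and max_length=512 when encoding the text, so the dataloader never yields a [32, 4000] batch in the first place.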



Does this problem apply to all BERT variants, such as RoBERTa and DeBERTa? If the limit is only 512 tokens, that means we lose information from longer texts :|
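On the information-loss concern, the sliding-window approach mentioned in the comments keeps the whole text by splitting it into overlapping chunks that each fit the limit. The following is only a sketch under assumptions: the function name sliding_windows and the overlap of 128 tokens are illustrative choices, and 101 / 102 are again the special-token ids assumed for bert-base-uncased.

```python
CLS_ID, SEP_ID = 101, 102  # assumed bert-base-uncased special-token ids


def sliding_windows(token_ids, max_len=512, stride=128):
    """Split token_ids into overlapping windows that fit in max_len.

    Each window holds up to max_len - 2 content tokens plus [CLS] and [SEP].
    Consecutive windows share `stride` content tokens (hypothetical helper).
    """
    body_len = max_len - 2
    if not 0 <= stride < body_len:
        raise ValueError("stride must be in [0, max_len - 2)")
    step = body_len - stride                 # how far the window start advances
    windows = []
    start = 0
    while True:
        chunk = token_ids[start:start + body_len]
        windows.append([CLS_ID] + chunk + [SEP_ID])
        if start + body_len >= len(token_ids):
            break                            # this window reached the end of the text
        start += step
    return windows


doc = list(range(1000, 5000))                # 4000 dummy token ids
windows = sliding_windows(doc)
print(len(windows), max(len(w) for w in windows))
```

Each window is then passed through the model separately, and the per-window predictions have to be combined (for example by averaging the logits). How best to combine them depends on the task.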

