How to get the correct embedding from RoBERTa transformers?

Posted 2023-03-29


【Original title】: how to get the correct embedding from Roberta transformers? 【Posted】: 2020-11-29 01:41:14 【Question】:

I am confused about which hidden state I should use as the output of a fine-tuned RoBERTa transformer model.

from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer
config = AutoConfig.from_pretrained("roberta-base")
config.output_hidden_states = True

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base", config=config)

inp = "alright let s do this  "

sentence = tok.encode(inp, padding='max_length', max_length=512, truncation=True, return_tensors='pt')

output = model(sentence)

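According to the Huggingface documentation for RobertaForMaskedLM:

It returns a tuple of:

masked_lm_loss (optional), prediction_scores, hidden_states (optional), attentions (optional)

Since the config enables the hidden_states output, and neither labels nor attentions are requested, output is a tuple of (prediction_scores, hidden_states).

My question is: should I use output[-1][0] or output[-1][-1] as the final output embedding of the fine-tuned RoBERTa model? My understanding is that output[-1][0] is the initial embedding fed into the RoBERTa encoder and output[-1][-1] is the final embedding output.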


【Answer 1】:

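output[-1][-1] is correct if you are looking for the output of the last encoding layer. You can figure this out by looking at the source code, and verify it by comparing the outputs (here output[1] and output[-1] refer to the same hidden_states tuple, because output has only two elements):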

import torch

print(len(output[-1]))

outputEmbeddings = model.roberta.embeddings(sentence)

#the first tensor is the output of the embedding layer
print(torch.equal(output[1][0],  outputEmbeddings))

#the second tensor is the output of the first encoding layer
print(torch.equal(output[1][1], model.roberta.encoder.layer[0](outputEmbeddings)[0]))

previousLayer = outputEmbeddings
for x in range(12):
    #it is now the current layer
    previousLayer = model.roberta.encoder.layer[x](previousLayer)[0]
    print(torch.equal(output[1][1+x], previousLayer))

Output:

13
True
True
True
True
True
True
True
True
True
True
True
True
True
True
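
The 13 entries checked above follow a simple bookkeeping pattern: one entry for the embedding output, then one per encoder layer. The sketch below mimics only that bookkeeping with plain Python lists. It is not RoBERTa and uses no transformers code; `make_layer` and `run_with_hidden_states` are hypothetical helpers invented for illustration, and the "layers" are trivial functions standing in for the 12 encoder layers of roberta-base.

```python
# Toy stand-in for an encoder stack: NOT RoBERTa, just the same bookkeeping.
# Each "layer" is a plain function on a list of floats (an assumption made
# purely for illustration).

def make_layer(offset):
    """Return a hypothetical layer that adds `offset` to every value."""
    def layer(hidden):
        return [value + offset for value in hidden]
    return layer


def run_with_hidden_states(embedding_output, layers):
    """Collect the embedding output plus the output of every layer, in order."""
    hidden_states = [embedding_output]
    current = embedding_output
    for layer in layers:
        current = layer(current)
        hidden_states.append(current)
    return tuple(hidden_states)


embedding_output = [0.0, 0.0]
layers = [make_layer(1.0) for _ in range(12)]  # 12 layers, as in roberta-base
hidden_states = run_with_hidden_states(embedding_output, layers)

print(len(hidden_states))   # 13: embedding output + 12 layer outputs
print(hidden_states[0])     # [0.0, 0.0]   -> the embedding output
print(hidden_states[-1])    # [12.0, 12.0] -> output of the last layer
```

With this layout, index 0 is always the embedding output and index -1 is always the last layer's output, which is why output[-1][-1] is the tensor to use when the final encoder output is wanted.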
