How to use the past with HuggingFace Transformers GPT-2?

Posted: 2020-11-23 17:09:01

Question:

I have:

        context = torch.tensor(context, dtype=torch.long, device=self.device)
        context = context.unsqueeze(0)
        generated = context
        with torch.no_grad():
            past_outputs = None
            for i in trange(num_words):
                print(i, num_words)
                inputs = {"input_ids": generated}

                outputs, past_outputs = self.model(
                    **inputs,
                    past=past_outputs
                )
                next_token_logits = outputs[
                    0, -1, :] / (temperature if temperature > 0 else 1.0)

                # repetition penalty from CTRL
                # (https://arxiv.org/abs/1909.05858)
                for _ in set(generated.view(-1).tolist()):
                    next_token_logits[_] /= repetition_penalty

                filtered_logits = top_k_top_p_filtering(
                    next_token_logits, top_k=top_k, top_p=top_p)
                if temperature == 0:  # greedy sampling:
                    next_token = torch.argmax(filtered_logits).unsqueeze(0)
                else:
                    next_token = torch.multinomial(
                        F.softmax(filtered_logits, dim=-1), num_samples=1)

                generated = torch.cat(
                    (generated, next_token.unsqueeze(0)), dim=1)

This works on the first iteration, but on the next iteration I get an error:

  File "/Users/shamoon/Sites/wordblot/packages/ml-server/generator.py", line 143, in sample_sequence
    past=past_outputs
  File "/Users/shamoon/.local/share/virtualenvs/ml-server-EdimT5-E/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/shamoon/.local/share/virtualenvs/ml-server-EdimT5-E/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 601, in forward
    output_hidden_states=output_hidden_states,
  File "/Users/shamoon/.local/share/virtualenvs/ml-server-EdimT5-E/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/shamoon/.local/share/virtualenvs/ml-server-EdimT5-E/lib/python3.7/site-packages/transformers/modeling_gpt2.py", line 470, in forward
    position_embeds = self.wpe(position_ids)
  File "/Users/shamoon/.local/share/virtualenvs/ml-server-EdimT5-E/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/shamoon/.local/share/virtualenvs/ml-server-EdimT5-E/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/Users/shamoon/.local/share/virtualenvs/ml-server-EdimT5-E/lib/python3.7/site-packages/torch/nn/functional.py", line 1724, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

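Am I doing something wrong?

Comments:

Which line causes the exception? Can you get a more extensive traceback?
What are model, generated, and temperature?
This answer explains how past is used.
Please post the full stack trace. I assume you are exceeding the maximum input length of 1024.
model is gpt2-xl; generated is updated in the code; temperature is 0.5.
Could you please include the full stack trace?
What is the value of num_words? What is the initial size of context?
Which class do you use to load your model? GPT2LMHeadModel?

Answer 1: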

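I think the problem is that context contains integer values that exceed the vocabulary size. My assumption is based on the last lines of the traceback: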

return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
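Both hypotheses raised in this thread (token ids beyond the vocabulary, and positions beyond the 1024 limit mentioned in the comments) can be checked before calling the model. Note that the failing lookup in the traceback is self.wpe(position_ids), which is the position embedding, so the second check may be the more relevant one. The following is a minimal sketch, not part of the original answer: find_index_problems is a hypothetical helper, and the defaults assume the standard GPT-2 vocabulary size (50257) and position limit (1024). It also assumes that, when a cache is passed, positions continue after the cached tokens.

```python
from typing import List

GPT2_VOCAB_SIZE = 50257   # assumption: standard GPT-2 vocabulary size
GPT2_N_POSITIONS = 1024   # assumption: standard GPT-2 position limit


def find_index_problems(input_ids: List[int], past_length: int = 0,
                        vocab_size: int = GPT2_VOCAB_SIZE,
                        n_positions: int = GPT2_N_POSITIONS) -> List[str]:
    """Hypothetical helper: list reasons an embedding lookup could go out of range."""
    problems = []
    # Hypothesis 1: a token id does not exist in the token embedding table.
    bad_ids = [t for t in input_ids if not 0 <= t < vocab_size]
    if bad_ids:
        problems.append(f"token ids outside [0, {vocab_size}): {bad_ids[:5]}")
    # Hypothesis 2: with a cache, positions start after the cached tokens,
    # so the last position is past_length + len(input_ids) - 1.
    last_position = past_length + len(input_ids) - 1
    if last_position >= n_positions:
        problems.append(
            f"last position {last_position} >= n_positions {n_positions}")
    return problems


# Illustrative numbers: a 600-token context, second iteration of the loop in
# the question. The cache already covers 600 tokens and all 601 are fed again.
print(find_index_problems([0] * 601, past_length=600))
# Feeding only the newest token keeps the positions in range.
print(find_index_problems([0], past_length=600))
```

With these illustrative numbers the first call reports a position problem and the second reports nothing, which would match the symptom of the first iteration working and the second failing.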

Comments:

@Shamoon What do you mean?
I also suspect that next_token may be out of vocabulary.
If I don't pass past=past_outputs, it works fine.
@Shamoon Did you check the value of past_outputs?

Answer 2:

This is what I did:

                outputs, past_outputs = self.models[model_name](
                    context,
                    past=past_outputs
                )
                context = next_token.unsqueeze(0)

Comments:

Doesn't doing this lose the initial context?
You keep the past, so it shouldn't matter. I think?
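The question in the last comments can be illustrated without GPT-2 at all. The sketch below uses toy_model, a hypothetical stand-in that follows the same contract as the code in Answer 2 (it takes new tokens plus a cache, and returns logits plus an updated cache). It is not the Transformers API, and its cache is simply the list of tokens already seen. Under that assumption, feeding only the newest token together with the cache produces the same sequence as re-feeding everything without a cache, so the initial context is not lost.

```python
from typing import List, Optional, Tuple

VOCAB_SIZE = 10  # toy vocabulary, not GPT-2's


def toy_model(input_ids: List[int],
              past: Optional[List[int]] = None) -> Tuple[List[float], List[int]]:
    """Hypothetical stand-in for a cached language model (not GPT-2).

    Returns logits for the last position and an updated cache. The prediction
    depends on every token seen so far, whether cached or newly fed.
    """
    seen = (past or []) + input_ids
    favourite = (sum(seen) + len(seen)) % VOCAB_SIZE
    logits = [1.0 if t == favourite else 0.0 for t in range(VOCAB_SIZE)]
    return logits, seen


def greedy_generate(context: List[int], num_words: int,
                    use_cache: bool) -> List[int]:
    generated = list(context)
    next_input = list(context)  # the first call always sees the whole context
    past = None
    for _ in range(num_words):
        logits, new_past = toy_model(next_input, past)
        next_token = max(range(len(logits)), key=logits.__getitem__)
        generated.append(next_token)
        if use_cache:
            past = new_past            # keep the cache ...
            next_input = [next_token]  # ... and feed only the new token
        else:
            past = None                # no cache: re-feed the full sequence
            next_input = list(generated)
    return generated


context = [3, 1, 4]
with_cache = greedy_generate(context, 5, use_cache=True)
without_cache = greedy_generate(context, 5, use_cache=False)
print(with_cache == without_cache)  # True

# The loop in the question keeps the cache AND re-feeds everything, so the
# history is counted twice: 7 tokens "seen" for a 4-token sequence.
_, cache = toy_model(context)
_, doubled = toy_model(context + [1], cache)
print(len(doubled))  # 7
```

In the real library the cache holds per-layer keys and values rather than tokens, but the bookkeeping is the same: whatever is already in the cache should not be fed again.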
