Pytorch: after embedding layer, Unable to get repr for <class 'torch.Tensor'>

Posted: 2020-12-10 14:15:17

Question:

I am new to PyTorch and am trying to reproduce this project: https://github.com/eXascaleInfolab/ActiveLink

However, I have been stuck for several days on an error in the forward pass (forward()). Here is part of the code (see https://github.com/eXascaleInfolab/ActiveLink/blob/master/models.py for the full model code):

def forward(self, e1, rel, batch_size=None, weights=None):
    # ......
    e1_embedded = self.emb_e(e1).view(-1, 1, 10, 20)
    rel_embedded = self.emb_rel(rel).view(-1, 1, 10, 20)
    stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2)  # out: (128L, 1L, 20L, 20L)
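To make the shapes in this excerpt concrete, here is a minimal standalone sketch of the same lookup, reshape and concatenate sequence. It assumes an embedding dimension of 200, so that each looked-up row reshapes into exactly one 1 x 10 x 20 "image" and the batch stays at 128. The table sizes num_entities and num_relations are hypothetical values, not ActiveLink's.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the real model takes them from its dataset/config.
num_entities, num_relations = 1000, 50
batch_size = 128
emb_dim = 200  # assumed: 10 * 20, so each row views as one 1 x 10 x 20 image

emb_e = nn.Embedding(num_entities, emb_dim)
emb_rel = nn.Embedding(num_relations, emb_dim)

# Index tensors shaped like the ones in the question: int64, (128, 1).
# randint's upper bound is exclusive, so every index here is valid.
e1 = torch.randint(0, num_entities, (batch_size, 1))
rel = torch.randint(0, num_relations, (batch_size, 1))

e1_embedded = emb_e(e1).view(-1, 1, 10, 20)      # (128, 1, 200) -> (128, 1, 10, 20)
rel_embedded = emb_rel(rel).view(-1, 1, 10, 20)  # (128, 1, 200) -> (128, 1, 10, 20)
stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2)  # stack along height
print(stacked_inputs.shape)  # torch.Size([128, 1, 20, 20])
```

With valid indices this runs on CPU or GPU; the lookup can only fail when some index falls outside [0, num_embeddings - 1] for its table.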

This gives me the following error (I am using a GPU):

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=196 error=710 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 147, in <module>
    main()
  File "main.py", line 136, in main
    model = run_meta_incremental(config, model, train_batcher, test_rank_batcher)
  File "/home/yonghui/yt/meta_incr_training.py", line 158, in run_meta_incremental
    g = run_inner(config, model, task)
  File "/home/yonghui/yt/meta_incr_training.py", line 120, in run_inner
    pred = model.forward(e1, rel)
  File "/home/yonghui/yt/models.py", line 136, in forward
    stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:196
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

I used the debugger to locate the problem: before embedding, e1 and rel are both int64 tensors of shape torch.Size([128, 1]).

e1 is embedded without problems and becomes a torch.float32 tensor of shape torch.Size([128, 1, 10, 20]). But after rel passes through the emb_rel embedding layer, the debugger shows every tensor as Unable to get repr for <class 'torch.Tensor'>.

What is happening, and how can I fix it? Thanks for any help!


Answer 1:

The error occurs somewhere before the point where this error message is printed, possibly in the reshape.
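One reason the reported line can be misleading on a GPU is that CUDA kernels are launched asynchronously, so a device-side assert raised inside one kernel (in the log above, the index_select kernel indexSelectLargeIndex, most likely behind the embedding lookup) is often surfaced by a later call such as torch.cat. A minimal sketch, assuming the script is started as python main.py as in the traceback: forcing synchronous launches with the standard CUDA_LAUNCH_BLOCKING environment variable makes the error point at the call that actually failed.

```python
import os

# Make every CUDA kernel launch synchronous so a device-side assert is raised
# at the Python line that launched the failing kernel (debugging only: slow).
# It must be set before CUDA is initialized, so set it before importing torch,
# or export it in the shell instead: CUDA_LAUNCH_BLOCKING=1 python main.py
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (deliberately imported after setting the variable)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"running on {device}, CUDA_LAUNCH_BLOCKING={os.environ['CUDA_LAUNCH_BLOCKING']}")
```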

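Calling view does not change the underlying data; it only changes how the data is "viewed", and it is lazy. If a different view of the tensor is not possible (for example, because the tensor is not stored contiguously in memory, see PyTorch forum), in your case it fails the first time the tensor's contents are used, which is when you try to print the tensor in the debugger.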

For debugging, consider replacing view with reshape (see *** thread on the difference between view and reshape).
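A minimal sketch of the view/reshape difference mentioned above, using a transposed tensor as the non-contiguous example (the tensor values are arbitrary illustration data):

```python
import torch

x = torch.arange(6).view(2, 3)  # contiguous, shape (2, 3)
xt = x.t()                      # transpose: shape (3, 2), NOT contiguous
print(xt.is_contiguous())       # False

view_failed = False
try:
    xt.view(6)                  # view needs strides compatible with the new shape
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = xt.reshape(6)            # reshape falls back to a copy when a view is impossible
print(flat)                     # tensor([0, 3, 1, 4, 2, 5])
```

In this sketch view raises a RuntimeError as soon as it is called, while reshape succeeds by copying the data into a new contiguous tensor.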

Comments:

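Thanks for this information :) But the error did not change after I changed view to reshape.

OK, then I suggest you find the last tensor you can still print. That should be the moment just before your error.

Thanks for the suggestion. I also tried converting the code to run on the CPU, where the debugger information is more intuitive: it says an index in the embedding layer is out of range. Thanks again for your help. :)

Answer 2: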
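The CPU trick from the comment above can be reproduced in isolation. A minimal sketch with a hypothetical 50-row relation table: on the CPU, an out-of-range index raises an ordinary Python exception at the lookup itself instead of a deferred device-side assert. Recent PyTorch releases raise IndexError ("index out of range in self"); older releases may raise RuntimeError, so both are caught here.

```python
import torch
import torch.nn as nn

emb_rel = nn.Embedding(num_embeddings=50, embedding_dim=200)  # hypothetical size
rel = torch.tensor([[0], [49], [50]])  # 50 is one past the last valid index (49)

caught = None
try:
    emb_rel(rel)  # on the CPU the bad index fails right here, at the lookup
except (IndexError, RuntimeError) as err:
    caught = err
    print("CPU error:", err)
```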

I solved this by using the debugger and inspecting the input tensors.

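After checking the tensors before the embedding, I found that some elements were out of range, especially given that indices start from 0 (an nn.Embedding with num_embeddings = N only accepts indices 0 to N - 1).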
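A check like the one described above can be made explicit so it fails loudly on any device before the lookup runs. This is a minimal sketch: check_embedding_indices is a hypothetical helper, and the 1-based relation ids stored in a 50-row table are an assumed example of the off-by-one that indexing from 0 can cause. Which of the two fixes applies depends on how the ids were produced.

```python
import torch
import torch.nn as nn


def check_embedding_indices(indices: torch.Tensor, emb: nn.Embedding, name: str = "indices") -> None:
    """Hypothetical helper: raise before an out-of-range embedding lookup."""
    lo, hi = int(indices.min()), int(indices.max())
    if lo < 0 or hi >= emb.num_embeddings:
        raise ValueError(
            f"{name} span [{lo}, {hi}] but valid range is [0, {emb.num_embeddings - 1}]"
        )


# Assumed example: relation ids 1..50 (1-based) in a table with only 50 rows.
rel_ids = torch.arange(1, 51).unsqueeze(1)  # shape (50, 1)
emb_rel = nn.Embedding(50, 200)

msg = None
try:
    check_embedding_indices(rel_ids, emb_rel, "rel")
except ValueError as err:
    msg = str(err)
    print(msg)  # rel span [1, 50] but valid range is [0, 49]

# Fix 1: shift the ids so they start at 0.
check_embedding_indices(rel_ids - 1, emb_rel, "rel")
# Fix 2: size the table as max_id + 1 rows.
emb_rel_big = nn.Embedding(int(rel_ids.max()) + 1, 200)
check_embedding_indices(rel_ids, emb_rel_big, "rel")
```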
