Trying to backward through the graph a second time, but the buffers have already been freed
Posted by MarToony|名角
Error message:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
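A minimal sketch of how this error arises, assuming a recent PyTorch build running on CPU. The names `x`, `y`, and `second_backward_error` are illustrative and do not come from the original code. The multiplication saves its inputs for the backward pass, and those saved buffers are released once the first `backward()` call finishes:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()          # the multiply saves x for the backward pass

y.backward()               # first call succeeds and frees the saved buffers

second_backward_error = None
try:
    y.backward()           # second call walks the same, already freed graph
except RuntimeError as err:
    second_backward_error = str(err)

print(second_backward_error)
```

In a training loop the same thing happens less visibly: if any tensor that feeds the loss still points at a graph built in an earlier iteration (or before the loop), every later `backward()` tries to walk that old graph again.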
Attempted fixes:
- First attempt: follow the hint and add retain_graph=True. That produced a different error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 5]], which is output 0 of TBackward, is at version 8; expected version 7 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
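I could not track down the operation this message refers to.
My guess is that even if I had found it, it might not have been the root cause.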
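For context, the "modified by an inplace operation" error is what PyTorch typically raises when a retained graph is walked again after one of its saved tensors has been updated in place, which is what an optimizer step does to the model weights. The following is only a sketch under that assumption, not a diagnosis of the original code. The names `w`, `loss`, and `inplace_error` are hypothetical, and the manual update stands in for `optimizer.step()`:

```python
import torch

w = torch.ones(3, requires_grad=True)
loss = (w * w).sum()               # the graph saves w at its current version

loss.backward(retain_graph=True)   # keep the graph alive, as the hint suggests

with torch.no_grad():
    w.sub_(0.1 * w.grad)           # in-place update, similar to an optimizer step

inplace_error = None
try:
    loss.backward()                # the retained graph now sees a newer version of w
except RuntimeError as err:
    inplace_error = str(err)

print(inplace_error)
```

If this is what is happening, `retain_graph=True` only hides the first symptom: the stale graph is still being reused, and the version check on its saved tensors catches it.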
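- Second attempt: 你碰到过的最难调试的 Bug 是什么样的? ("What is the hardest bug you have ever had to debug?"), an answer by 索罗格 on Zhihu (知乎).
That author's way of debugging was to tear the code down and rewrite it; the cause turned out to be a variable in a custom loss function that had not been converted to a tensor and had not been moved to the GPU.
Inspired by that approach, I went back over my own code. It also has a custom loss function, so it seemed possible that some variable had not been converted to a tensor. I checked and found nothing wrong.
While checking, though, I remembered one thing: earlier, when moving a certain tensor to the GPU, I had removed the .data access on it. After I put it back, the code ran normally again.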
# Before the change (fails)
centroids = centroid_init(trainloader, encoder, k, d).to(output_device)
# After the change (works)
centroids = centroid_init(trainloader, encoder, k, d).data.to(output_device)
# centroid_init is implemented as follows:
def centroid_init(trainloader, encoder, k, d):
    centroid_sums = torch.zeros(k, d).to(output_device)
    centroid_counts = torch.zeros(k).to(output_device)
    for batch in trainloader:
        X_var, y_var = batch["data"].to(output_device), batch["target"].to(output_device)
        # Assign every sample in the batch to a random cluster
        cluster_assignments = torch.LongTensor(X_var.size(0)).random_(k).to(output_device)
        embeddings = encoder(X_var)
        update_clusters(centroid_sums, centroid_counts, cluster_assignments, embeddings)
    centroid_means = centroid_sums / centroid_counts[:, None]
    return centroid_means.clone()
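When I print the tensor, the output looks the same with or without .data, and I do not really understand why it is needed when copying a tensor to the GPU. Any pointers would be appreciated.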
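One likely explanation, offered as an assumption because `update_clusters` is not shown: the centroids are computed from `encoder(X_var)`, so the tensor returned by `centroid_init` still carries the autograd graph built during initialization. The `.data` access has nothing to do with the GPU copy. It returns a tensor with the same values but no autograd history, so the training loss no longer reaches back into that one-off graph. In current PyTorch, `.detach()` or wrapping the initialization in `torch.no_grad()` is the usual way to get the same effect. The sketch below uses a hypothetical `torch.nn.Linear` encoder and random data on CPU, and every name in it is illustrative:

```python
import torch

torch.manual_seed(0)
encoder = torch.nn.Linear(4, 2)   # hypothetical stand-in for the real encoder
data = torch.randn(8, 4)          # hypothetical stand-in for the training data

def init_centroids(detach):
    # Computed from encoder outputs, so the result carries autograd history.
    centroids = encoder(data).mean(dim=0, keepdim=True)
    return centroids.detach() if detach else centroids

def train_two_steps(centroids):
    optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)
    loss = None
    for _ in range(2):
        optimizer.zero_grad()
        loss = ((encoder(data) - centroids) ** 2).sum()
        loss.backward()           # also walks the init graph if centroids is attached
        optimizer.step()
    return loss.item()

attached = init_centroids(detach=False)
detached = init_centroids(detach=True)

# Same values, but only the attached tensor remembers how it was computed.
same_values = torch.equal(attached, detached)
has_history = (attached.grad_fn is not None, detached.grad_fn is not None)

attached_error = None
try:
    train_two_steps(attached)     # second iteration reuses the freed init graph
except RuntimeError as err:
    attached_error = str(err)

final_loss = train_two_steps(detached)   # runs without error
```

Comparing `grad_fn` on the two tensors is a quick way to see the difference that the printed values alone do not make obvious: with these stand-ins the values match, while only the attached tensor has a `grad_fn`.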