pytorch 1.9.0 backward function explained, and the error (RuntimeError: grad can be implicitly created only for scalar outputs)

Posted by Dontla



Official documentation

torch._tensor.Tensor 
def backward(self,
	gradient: Optional[Tensor] = None,
	retain_graph: Any = None,
	create_graph: Any = False,
	inputs: Any = None) -> Any
Computes the gradient of current tensor w.r.t. graph leaves.
The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying gradient. It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t. self.
This function accumulates gradients in the leaves - you might need to zero .grad attributes or set them to None before calling it. See Default gradient layouts  for details on the memory layout of accumulated gradients.
Note
If you run any forward ops, create gradient, and/or call backward in a user-specified CUDA stream context, see Stream semantics of backward passes .

Parameters:
gradient – Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless ``create_graph`` is True. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable then this argument is optional.

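Put differently: for a non-scalar tensor, gradient plays the role of the vector v in the vector-Jacobian product that backward actually computes. A minimal sketch (the values are only for illustration):

import torch

x = torch.tensor([2.0, 3.0, 4.0], requires_grad=True)
y = x * x                                   # non-scalar output, Jacobian = diag(2x)

# v = [1, 0, 0] picks out the gradient of y[0] alone
y.backward(gradient=torch.tensor([1.0, 0.0, 0.0]))
print(x.grad)                               # tensor([4., 0., 0.])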

retain_graph – If ``False``, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of ``create_graph``.

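For example, a second backward pass through the same graph only works if the first pass keeps the graph alive. A minimal sketch:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x

y.backward(retain_graph=True)   # keep the graph so backward can run again
y.backward()                    # without retain_graph above, this would raise a RuntimeError
print(x.grad)                   # tensor(8.) -- the two passes accumulate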

create_graph – If ``True``, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to ``False``.

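For example, create_graph=True makes the computed gradient itself part of the graph, so it can be differentiated again to get second-order derivatives. A minimal sketch:

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3                                   # dy/dx = 3x^2, d2y/dx2 = 6x

y.backward(create_graph=True)                # x.grad = 3x^2, and it has a grad_fn
print(x.grad)                                # tensor(27., grad_fn=...)

(second,) = torch.autograd.grad(x.grad, x)   # differentiate the first derivative
print(second)                                # tensor(18.) == 6 * x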

inputs – Inputs w.r.t. which the gradient will be accumulated into ``.grad``. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute the attr::tensors. All the provided inputs must be leaf Tensors.
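For example, inputs restricts which leaf tensors receive the accumulated gradient. A minimal sketch:

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
loss = a * b

loss.backward(inputs=[a])   # only accumulate into a.grad
print(a.grad)               # tensor(3.)
print(b.grad)               # None -- b is ignored because it is not in inputs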

Simple examples

Example 1

# -*- coding: utf-8 -*-
import torch
x = torch.tensor(2, dtype=torch.float, requires_grad=True)
y = x * x
print(y)
y.backward()        # dy/dx = 2x = 4
print(x.grad)       # tensor(4.)

y = x * x
print(y)
y.backward()        # another 4 is accumulated
print(x.grad)       # tensor(8.)

y = x * x
print(y)
y.backward()        # another 4 is accumulated
print(x.grad)       # tensor(12.)

z = 2*x
print(z)
z.backward()        # dz/dx = 2 is accumulated
print(x.grad)       # tensor(14.)

z = 2*x             # no backward() call here, so nothing is accumulated
print(x.grad)       # still tensor(14.)

Output:

tensor(4., grad_fn=<MulBackward0>)
tensor(4.)
tensor(4., grad_fn=<MulBackward0>)
tensor(8.)
tensor(4., grad_fn=<MulBackward0>)
tensor(12.)
tensor(4., grad_fn=<MulBackward0>)
tensor(14.)
tensor(14.)

Process finished with exit code 0

From these runs we can see:
1. Once x is created with requires_grad=True, every time some result obj is computed from x and obj.backward() is called, the gradient of that operation is accumulated (added) into x's grad attribute (see the sketch below for how to reset it).
2. Many different results obj1, obj2, ... can be computed from x; only the ones on which backward() is actually called contribute to x.grad, the others add nothing.
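If the accumulation is not what you want, the docstring quoted above says to zero .grad (or set it to None) before the next backward call. A minimal sketch, continuing the example above:

x.grad.zero_()        # reset the accumulated gradient in place
# or: x.grad = None   # drop it entirely; autograd recreates it on the next backward

y = x * x
y.backward()
print(x.grad)         # tensor(4.) again, with no contribution from the earlier calls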

Example 2 (the error RuntimeError: grad can be implicitly created only for scalar outputs, and how to fix it)

# -*- coding: utf-8 -*-
import torch
x = torch.tensor([2, 3, 4], dtype=torch.float, requires_grad=True)
y = x * x
print(y)
# y.backward(torch.ones_like(x))
y.backward()
print(x.grad)

Output:

D:\\Dontla_miniconda3.8\\python.exe C:/Users/Administrator/Desktop/d2l/torch_code/code/test3.py
tensor([ 4.,  9., 16.], grad_fn=<MulBackward0>)
Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/d2l/torch_code/code/test3.py", line 7, in <module>
    y.backward()
  File "D:\\Dontla_miniconda3.8\\lib\\site-packages\\torch\\_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\\Dontla_miniconda3.8\\lib\\site-packages\\torch\\autograd\\__init__.py", line 143, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors_)
  File "D:\\Dontla_miniconda3.8\\lib\\site-packages\\torch\\autograd\\__init__.py", line 50, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs

Process finished with exit code 1

The error raised is: RuntimeError: grad can be implicitly created only for scalar outputs

That is, the gradient can only be created implicitly for scalar outputs.

In other words, calling .backward() with no arguments requires the tensor it is called on (y here) to be a scalar, not a vector.

If y is a non-scalar tensor (because x is a vector here), then backward needs a gradient argument,

like this:

# -*- coding: utf-8 -*-
import torch
x = torch.tensor([2, 3, 4], dtype=torch.float, requires_grad=True)
y = x * x
print(y)
# y.backward(torch.ones_like(x))
y.backward(torch.ones_like(y))
print(x.grad)

Output:

D:\\Dontla_miniconda3.8\\python.exe C:/Users/Administrator/Desktop/d2l/torch_code/code/test3.py
tensor([ 4.,  9., 16.], grad_fn=<MulBackward0>)
tensor([4., 6., 8.])

Process finished with exit code 0
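Why torch.ones_like(y) works: for a non-scalar y, backward(gradient=v) computes the vector-Jacobian product, and with v all ones that is exactly the gradient of y.sum(). So an equivalent (and arguably clearer) fix is to reduce y to a scalar first. A minimal sketch:

# -*- coding: utf-8 -*-
import torch
x = torch.tensor([2, 3, 4], dtype=torch.float, requires_grad=True)
y = x * x

# summing makes the output a scalar, so no gradient argument is needed;
# the result is the same as y.backward(torch.ones_like(y))
y.sum().backward()
print(x.grad)  # tensor([4., 6., 8.])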

Reference article: pytorch中backward()函数详解 (a detailed explanation of PyTorch's backward() function)
