.grad() returns None in PyTorch


I am trying to write a simple script for parameter estimation (the parameters here are weights). I run into a problem when .grad() returns None. I have also gone through this and this link and understand the concept both in theory and in practice. To my mind the following script should work, but unfortunately it does not.

My first attempt: the script below is my first attempt.

gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)

learning_rate = 1e-4
total_loss = []

for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device,non_blocking=True)
        target = target.to(device,non_blocking=True)
        x_dt = (gamma*input[0][0] +
                alpha_xy*input[0][0]*input[0][2] +
                alpha1*input[0][0])

        y0_dt = (beta_y*input[0][0] +
                 alpha2*input[0][1])

        y_dt = (alpha0*input[0][1] +
                alpha_y*input[0][2] +
                alpha3*input[0][0]*input[0][2])

        pred = torch.tensor([[x_dt],
                             [y0_dt],
                             [y_dt]], device=device)
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)

The code above raises the error message

element 0 of tensors does not require grad and does not have a grad_fn

at the loss.backward() line.
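The cause can be reproduced in a few lines: wrapping existing tensors in a new torch.tensor(...) call copies their values but severs the computation graph, so nothing in the resulting loss requires grad (a minimal sketch, not the exact script above):

```python
import torch

gamma = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(1.5)

x_dt = gamma * x             # non-leaf tensor, connected to gamma via grad_fn
pred = torch.tensor([x_dt])  # re-wrapping copies the value but severs the graph

print(x_dt.grad_fn is not None)          # True: x_dt is still connected to gamma
print(pred.grad_fn, pred.requires_grad)  # None False: pred is a detached leaf

loss = (pred - 1.0).pow(2).sum()
try:
    loss.backward()          # nothing in the graph requires grad
except RuntimeError as err:
    print(err)               # element 0 of tensors does not require grad ...
```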

My attempt 2: I improved on the first attempt by declaring pred with requires_grad=True:

pred = torch.tensor([[x_dt],
                     [y0_dt],
                     [y_dt]], device=device, dtype=torch.float, requires_grad=True)

Now the script runs, but apart from pred.grad the other two still print None.
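The None values come from two separate issues: gamma.grad is None because the graph back to gamma is still cut, while x_dt.grad is None because autograd discards gradients of non-leaf (intermediate) tensors by default unless .retain_grad() is called. A minimal illustration of the leaf vs. non-leaf distinction:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor: .grad is kept

mid = w * 3.0         # non-leaf: its grad is computed but discarded by default
(mid ** 2).backward()
print(mid.grad)       # None (PyTorch also warns when accessing it)
print(w.grad.item())  # 36.0 = d/dw (3w)^2 = 18w at w=2

w.grad.zero_()
mid = w * 3.0
mid.retain_grad()     # ask autograd to keep the intermediate gradient
(mid ** 2).backward()
print(mid.grad.item())  # 12.0 = 2 * mid = 2 * 6
```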

I want to update all the parameters after computing loss.backward(), but that is not happening because of the None gradients. Can anyone suggest how to improve this script? Thank you.

Answer

You are breaking the computation graph by declaring a new tensor for pred. Instead, you can use torch.stack. Also, x_dt and pred are non-leaf tensors, so their gradients are not retained by default; you can override that behaviour with .retain_grad():

gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)

learning_rate = 1e-4
total_loss = []

for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)

        x_dt = gamma*input[0][0] + alpha_xy*input[0][0]*input[0][2] + alpha1*input[0][0]
        # retain the gradient for non-leaf tensors
        x_dt.retain_grad()

        y0_dt = beta_y*input[0][0] + alpha2*input[0][1]

        y_dt = alpha0*input[0][1] + alpha_y*input[0][2] + alpha3*input[0][0]*input[0][2]

        # use stack instead of declaring a new tensor
        pred = torch.stack([x_dt, y0_dt, y_dt], dim=0).unsqueeze(1)
        # pred is also a non-leaf tensor so we need to tell pytorch to retain its grad
        pred.retain_grad()

        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)

        with torch.no_grad():
            gamma -= learning_rate * gamma.grad

Closed-form solution

What you have here is an example of ordinary least squares. Looking at it, you will notice that x_dt, y0_dt and y_dt are actually independent of one another with respect to the parameters (each is computed from its own unique set of parameters). That makes the problem much easier, because it means we can optimize the terms (x_dt - target[0])**2, (y0_dt - target[1])**2 and (y_dt - target[2])**2 separately.

Without going into too much detail, the solution (with no backpropagation or gradient descent) ends up being the following.

To test that this code works correctly, I generated some fake data for which I know the underlying model coefficients (a little noise is added, so the final results will not exactly match the expected values); the generator script appears after the solution code.

# supposing x_train is [N,3] and y_train is [N,3]
x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
y1 = y_train[:, 0].unsqueeze(1)

# avoid inverses using solve to get p1 = inv(x1 . x1^T) . x1 . y1
p1, _ = torch.solve(x1 @ y1, x1 @ x1.transpose(1, 0))

# gamma and alpha1 are redundant. As long as gamma + alpha1 = p1[0] we get the same optimal value for loss
gamma = p1[0] / 2
alpha_xy = p1[1]
alpha1 = p1[0] / 2

x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
y2 = y_train[:, 1].unsqueeze(1)

p2, _ = torch.solve(x2 @ y2, x2 @ x2.transpose(1, 0))

beta_y = p2[0]
alpha2 = p2[1]

x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
y3 = y_train[:, 2].unsqueeze(1)

p3, _ = torch.solve(x3 @ y3, x3 @ x3.transpose(1, 0))

alpha0 = p3[0]
alpha_y = p3[1]
alpha3 = p3[2]

loss_1 = torch.sum((x1.transpose(1, 0) @ p1 - y1)**2 + (x2.transpose(1, 0) @ p2 - y2)**2 + (x3.transpose(1, 0) @ p3 - y3)**2)
mse = loss_1 / x_train.size(0)
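One API caveat: torch.solve(B, A) solved A·X = B and was later deprecated and removed in favour of torch.linalg.solve(A, B) (note the swapped argument order), so on recent PyTorch the calls above need updating. A self-contained sketch of the same normal-equations fit on toy single-feature data:

```python
import torch

torch.manual_seed(0)
n = 1000
x = torch.randn(n)
y = 3.0 * x + 0.01 * torch.randn(n)  # known slope 3.0 plus a little noise

X = x.unsqueeze(0)  # design matrix laid out as [features, N], as above
Y = y.unsqueeze(1)  # targets as [N, 1]

# normal equations (X X^T) p = X Y, solved without forming an explicit inverse
p = torch.linalg.solve(X @ X.transpose(1, 0), X @ Y)
print(p.item())     # close to 3.0
```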

The fake-data generator and the test driver:

def gen_fake_data(samples=50000):
    x_train = torch.randn(samples, 3)
    # define fake data with known minimal solutions
    x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
    x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
    x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
    y1 = x1.transpose(1, 0) @ torch.tensor([[1.0], [2.0]])  # gamma + alpha1 = 1.0
    y2 = x2.transpose(1, 0) @ torch.tensor([[3.0], [4.0]])
    y3 = x3.transpose(1, 0) @ torch.tensor([[5.0], [6.0], [7.0]])
    y_train = torch.cat((y1, y2, y3), dim=1) + 0.1 * torch.randn(samples, 3)
    return x_train, y_train

x_train, y_train = gen_fake_data()

# optimization code from above
...

print('loss_1: ', loss_1.item())
print('MSE:', mse.item())

print('Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0')
print('Actual', gamma.item(), alpha_xy.item(), alpha1.item(), beta_y.item(), alpha2.item(), alpha0.item(), alpha_y.item(), alpha3.item())
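For completeness, if one wants to fit such parameters by gradient descent instead of the closed-form solve, the usual manual update loop also zeroes the gradients after every step, since backward() accumulates into .grad. A generic sketch on a hypothetical two-parameter model (not the model above):

```python
import torch

# hypothetical model: pred = a * x + b, with data generated by a=2, b=1
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([3.0, 5.0, 7.0])
lr = 0.1

for _ in range(500):
    loss = ((a * x + b - target) ** 2).mean()
    loss.backward()
    with torch.no_grad():  # parameter updates must not be tracked by autograd
        a -= lr * a.grad
        b -= lr * b.grad
    a.grad.zero_()         # backward() accumulates, so reset before the next step
    b.grad.zero_()

print(round(a.item(), 3), round(b.item(), 3))  # converges to about 2.0 and 1.0
```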
