PyTorch models: inspecting network parameter gradients, verifying that parameter updates are correct, and setting layer-wise learning rates in the optimizer
Posted by 呆呆象呆呆
Main goal
Sometimes, after configuring the optimizer and learning rate, I want to check whether the parameters are really updated the way I set them up, so I want to inspect the following quantities related to the network parameters:
- the parameter values before the update
- the learning rate inside the optimizer
- the gradient values after the loss is computed
- the parameter values after the update
With these values I can later make sure the code is correct while debugging a network architecture, or check intermediate results.
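Of these four, the learning rate is the only one not attached to the parameters themselves; it lives in the optimizer's parameter groups. A minimal sketch of reading it (the two-input linear layer here is just a stand-in for any model):

import torch.nn as nn
import torch.optim as optim

net = nn.Linear(2, 1)
optimizer = optim.SGD(net.parameters(), lr=0.001)

# Each parameter group carries its own hyperparameters, including 'lr'.
for i, group in enumerate(optimizer.param_groups):
    print("group", i, "lr:", group['lr'])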
Experiment 1: checking gradients and parameter updates
import torch
import torch.nn as nn
import torch.optim as optim
import os
from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader

# Build a toy dataset: each sample is a pair of integers and the label is
# their sum, i.e. input1 + input2 = label.
class TrainDataset(Dataset):
    def __init__(self):
        super(TrainDataset, self).__init__()
        self.data = []
        for i in range(1, 1000):
            for j in range(1, 1000):
                self.data.append([i, j])

    def __getitem__(self, index):
        input_data = self.data[index]
        label = input_data[0] + input_data[1]
        return torch.Tensor(input_data), torch.Tensor([label])

    def __len__(self):
        return len(self.data)

class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.net1 = nn.Linear(2, 1)

    def forward(self, x):
        x = self.net1(x)
        return x

def train():
    traindataset = TrainDataset()
    traindataloader = DataLoader(dataset=traindataset, batch_size=1, shuffle=False)
    testnet = TestNet().cuda()
    myloss = nn.MSELoss().cuda()
    optimizer = optim.SGD(testnet.parameters(), lr=0.001)
    for epoch in range(100):
        for data, label in traindataloader:
            print("\n===== iteration start =====")
            data = data.cuda()
            label = label.cuda()
            output = testnet(data)
            print("input:", data)
            print("output:", output)
            print("label:", label)
            loss = myloss(output, label)
            optimizer.zero_grad()
            # Parameters and gradients before the update.
            for name, parms in testnet.named_parameters():
                print('-->name:', name)
                print('-->para:', parms)
                print('-->grad_requires:', parms.requires_grad)
                print('-->grad_value:', parms.grad)
                print("===")
            loss.backward()
            optimizer.step()
            print("============ after the update ============")
            for name, parms in testnet.named_parameters():
                print('-->name:', name)
                print('-->para:', parms)
                print('-->grad_requires:', parms.requires_grad)
                print('-->grad_value:', parms.grad)
                print("===")
            print(optimizer)
            # Pause so the printouts can be inspected before the next iteration.
            input("===== iteration end =====")

if __name__ == '__main__':
    os.environ["CUDA_VISIBLE_DEVICES"] = "{}".format(3)
    train()
Output of experiment 1
Explanation of the results
The forward pass of the network:
$$weight1 \cdot input1 + weight2 \cdot input2 + bias = output$$
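For the input (1, 1) and the initial parameters printed in my run (weight1 = -0.0795, weight2 = 0.625, bias = 0.4643; a different random init will give different numbers), this works out to

$$output = -0.0795 \times 1 + 0.625 \times 1 + 0.4643 = 1.0098$$

which is the output value used in the gradient calculations below.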
The MSE loss formula:
$$\text{Loss} = (output - label)^2$$
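Note that nn.MSELoss averages over all elements by default; with batch_size=1 and a single output, it reduces exactly to the squared error above. A quick sanity check:

import torch
import torch.nn as nn

output = torch.tensor([[1.0098]])
label = torch.tensor([[2.0]])

print(nn.MSELoss()(output, label).item())   # 0.9805...
print(((output - label) ** 2).item())       # identical for a single element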
The partial derivative with respect to each parameter can be computed as follows:
$$\frac{\partial \text{Loss}}{\partial weight1} = \frac{\partial \text{Loss}}{\partial output} \cdot \frac{\partial output}{\partial weight1} = 2(output - label) \cdot input1 = 2(1.0098 - 2) \times 1 = -1.9805$$
$$\frac{\partial \text{Loss}}{\partial weight2} = \frac{\partial \text{Loss}}{\partial output} \cdot \frac{\partial output}{\partial weight2} = 2(output - label) \cdot input2 = 2(1.0098 - 2) \times 1 = -1.9805$$
$$\frac{\partial \text{Loss}}{\partial bias} = \frac{\partial \text{Loss}}{\partial output} \cdot \frac{\partial output}{\partial bias} = 2(output - label) = 2(1.0098 - 2) = -1.9805$$
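These hand-derived gradients can be checked directly against autograd. A minimal sketch that pins the parameters to the rounded values from my run (exact prints will differ in the last digit because of the rounding):

import torch
import torch.nn as nn

net = nn.Linear(2, 1)
with torch.no_grad():
    net.weight.copy_(torch.tensor([[-0.0795, 0.625]]))
    net.bias.copy_(torch.tensor([0.4643]))

x = torch.tensor([[1.0, 1.0]])
label = torch.tensor([[2.0]])

loss = nn.MSELoss()(net(x), label)
loss.backward()

print(net.weight.grad)   # ≈ [[-1.9804, -1.9804]], i.e. 2*(output-label)*input
print(net.bias.grad)     # ≈ [-1.9804],            i.e. 2*(output-label)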
If you run the code with a few different inputs, the gradients will no longer be identical; the gradients of weight1 and weight2 match here only because both inputs are 1.
Parameter update
$$\begin{aligned} weight1_{new} &= -0.0795 - (-1.9805) \times 0.001 = -0.0776 \\ weight2_{new} &= 0.625 - (-1.9805) \times 0.001 = 0.627 \\ bias_{new} &= 0.4643 - (-1.9805) \times 0.001 = 0.4663 \end{aligned}$$
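Plain SGD (no momentum, no weight decay) is just w_new = w - lr * grad, so the post-update values can be reproduced in a few lines; values are taken from the run above, and tiny differences come from rounding:

lr = 0.001
grad = -1.9805

for name, old in [("weight1", -0.0795), ("weight2", 0.625), ("bias", 0.4643)]:
    # The vanilla SGD update rule: w_new = w - lr * grad
    print(name, old - lr * grad)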
Experiment 2: verifying a different optimizer (Adam)
Replace the optimizer in experiment 1 with the following line:
optimizer = optim.Adam(testnet.parameters(), lr=0.001, betas=(0.9, 0.99))
Output of experiment 2
The loss and its derivative formulas are unchanged, so we skip re-verifying the gradients.
Verifying the Adam update
First, compute the gradient that is actually used for the update.
$$m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$$
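The formula above covers only the first moment; the rest of the standard Adam step (second moment, bias correction, parameter step) can be worked out by hand and compared against what optim.Adam prints. This is a sketch of the textbook formulation, assuming PyTorch's default eps = 1e-8 and no weight decay, for the very first step t = 1:

import math

beta1, beta2 = 0.9, 0.99    # the betas used in experiment 2
lr, eps = 0.001, 1e-8
g = -1.9805                 # gradient of weight1 from the run above
m, v, t = 0.0, 0.0, 1       # both moments start at zero

m = beta1 * m + (1 - beta1) * g         # first moment m_t (formula above)
v = beta2 * v + (1 - beta2) * g * g     # second moment v_t
m_hat = m / (1 - beta1 ** t)            # bias correction
v_hat = v / (1 - beta2 ** t)
step = lr * m_hat / (math.sqrt(v_hat) + eps)
print(step)   # ≈ -0.001; the update is w_new = w - step

Because of the bias correction, the first Adam step has magnitude close to lr regardless of the gradient's size, so weight1 moves up by roughly 0.001 here, unlike the SGD case above.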