pytorch.optimizer Optimization Algorithms

Posted by 东东就是我



1. Optimizer

import torch
import numpy as np
import warnings
warnings.filterwarnings('ignore') #ignore warnings

x = torch.linspace(-np.pi, np.pi, 2000)
y = torch.sin(x)

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(1, 1001):
    y_pred = model(xx)
    loss = loss_fn(y_pred, y)
    if t % 100 == 0:
        print('No. {:5d}, loss: {:.6f}'.format(t, loss.item()))
    optimizer.zero_grad() # zero the gradients
    loss.backward()       # backpropagate to compute gradients
    optimizer.step()      # update the parameters via gradient descent

1.1 step() is what actually updates the parameters

    def step(self, closure=None):
        """Performs a single optimization step.

        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            weight_decay = group['weight_decay']
            momentum = group['momentum']
            dampening = group['dampening']
            nesterov = group['nesterov']

            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad  # get the parameter's gradient
                if weight_decay != 0:
                    d_p = d_p.add(p, alpha=weight_decay) 
                if momentum != 0:
                    param_state = self.state[p]
                    if 'momentum_buffer' not in param_state:
                        buf = param_state['momentum_buffer'] = torch.clone(d_p).detach()
                    else:
                        buf = param_state['momentum_buffer']
                        buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
                    if nesterov:
                        d_p = d_p.add(buf, alpha=momentum)
                    else:
                        d_p = buf

                p.add_(d_p, alpha=-group['lr']) # update the parameter

        return loss
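The logic of `step()` above can be traced with a plain-Python sketch (an illustration only, assuming scalar parameters and an externally supplied gradient; `sgd_step` is a hypothetical helper, not PyTorch API):

```python
def sgd_step(p, grad, state, lr, weight_decay=0.0, momentum=0.0,
             dampening=0.0, nesterov=False):
    """One SGD update for a scalar parameter, mirroring the source above."""
    d_p = grad + weight_decay * p            # d_p = d_p.add(p, alpha=weight_decay)
    if momentum != 0:
        if 'momentum_buffer' not in state:   # first step: buffer starts as d_p
            buf = state['momentum_buffer'] = d_p
        else:
            buf = state['momentum_buffer'] = (
                momentum * state['momentum_buffer'] + (1 - dampening) * d_p)
        d_p = d_p + momentum * buf if nesterov else buf
    return p - lr * d_p                      # p.add_(d_p, alpha=-group['lr'])

state = {}
p = sgd_step(1.0, 0.5, state, lr=0.1, momentum=0.9)  # 1.0 - 0.1*0.5 = 0.95
```

Note that on the very first step the momentum buffer is initialized to the raw gradient (no dampening applied), exactly as in the PyTorch source.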

1.0 Common optimizer notation

Model parameters: $\theta$
Objective function: $J(\theta)$
Gradient at time step $t$ (e.g. one batch): $g_t = \nabla_\theta J(\theta)$
Learning rate: $\eta$
First moment of the historical gradients: $m_t = \phi(g_1, g_2, \dots, g_t)$
Second moment of the historical gradients: $v_t = \psi(g_1, g_2, \dots, g_t)$
Parameter update: $\theta_{t+1} = \theta_t - \frac{1}{\sqrt{v_t} + \epsilon} m_t$
$\epsilon$: smoothing term that keeps the denominator from being zero
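The shared update rule above can be written as a small helper (a sketch; here `m_t` and `v_t` are plain numbers standing in for whatever $\phi$ and $\psi$ produce):

```python
import math

def update(theta, m_t, v_t, eps=1e-8):
    # theta_{t+1} = theta_t - m_t / (sqrt(v_t) + eps)
    return theta - m_t / (math.sqrt(v_t) + eps)
```

Each optimizer below differs only in how it builds $m_t$ and $v_t$ from the gradient history.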

1.1 SGD

$m_t = \eta \cdot g_t$
$v_t = I^2$

SGD's drawbacks are slow convergence and possible oscillation around saddle points; moreover, choosing a suitable learning rate is one of SGD's main difficulties.
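As a toy illustration (not from the original post), vanilla SGD on $f(\theta)=\theta^2$, whose gradient is $2\theta$:

```python
def sgd(theta, lr, steps):
    """Vanilla SGD: m_t = eta * g_t, no second moment."""
    for _ in range(steps):
        g = 2 * theta            # g_t = gradient of f at theta
        theta = theta - lr * g   # theta_{t+1} = theta_t - eta * g_t
    return theta
```

With `lr=0.1` each step multiplies `theta` by 0.8, so it converges toward the minimum at 0, but only geometrically.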

1.2 Momentum 动量

$m_t = \gamma \cdot m_{t-1} + \eta \cdot g_t$
That is, each update keeps a fraction of the previous update's direction and magnitude; the ratio $\gamma$ is typically 0.9.
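A minimal sketch of the momentum update on the same toy objective $f(\theta)=\theta^2$ (illustrative only):

```python
def momentum_sgd(theta, lr, gamma, steps):
    """Momentum: m_t = gamma * m_{t-1} + eta * g_t, theta -= m_t."""
    m = 0.0
    for _ in range(steps):
        g = 2 * theta
        m = gamma * m + lr * g   # accumulate a decaying sum of past gradients
        theta = theta - m
    return theta
```

With `gamma=0` this reduces exactly to vanilla SGD; with `gamma=0.9` the iterate overshoots and oscillates but still converges.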

1.3 SGD-M

SGD with first-order momentum:
$m_t = \gamma \cdot m_{t-1} + \eta \cdot g_t$
$v_t = I^2$

1.4 Nesterov Accelerated Gradient



That is, the gradient is evaluated at the look-ahead point: the parameters after the momentum step has been applied.
$g_t = \nabla_\theta J(\theta - \gamma \cdot m_{t-1})$
$m_t = \gamma \cdot m_{t-1} + \eta \cdot g_t$
$v_t = I^2$
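A sketch of the look-ahead step on the toy objective $f(\theta)=\theta^2$ (illustrative only; the only change from plain momentum is where the gradient is evaluated):

```python
def nag(theta, lr, gamma, steps):
    """Nesterov: gradient is taken at theta - gamma * m_{t-1}."""
    m = 0.0
    for _ in range(steps):
        g = 2 * (theta - gamma * m)  # g_t = grad J(theta - gamma * m_{t-1})
        m = gamma * m + lr * g
        theta = theta - m
    return theta
```

The look-ahead gradient anticipates where momentum is about to carry the parameters, which damps the overshoot of plain momentum.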

1.5 Adagrad


That is, how fast each parameter changes is scaled by how frequently that parameter has been updated:
$v_t = \sum_{\tau=1}^{t} g_\tau^2$
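A sketch of Adagrad on the same toy objective (illustrative only): the accumulated $v_t$ grows monotonically, so the effective step size shrinks over time:

```python
import math

def adagrad(theta, lr, steps, eps=1e-8):
    """Adagrad: divide the step by sqrt of the accumulated squared gradients."""
    v = 0.0
    for _ in range(steps):
        g = 2 * theta
        v += g * g                               # v_t = sum of g_t^2
        theta -= lr * g / (math.sqrt(v) + eps)   # effective lr decays as v grows
    return theta
```

Because `v` never decreases, the steps keep shrinking; this is the motivation for RMSprop below.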

1.6 RMSprop


$v_t = \gamma \cdot v_{t-1} + (1 - \gamma) \cdot g_t^2$
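A sketch of RMSprop on the same toy objective (illustrative only): replacing Adagrad's running sum with an exponential moving average keeps the effective step size from decaying to zero:

```python
import math

def rmsprop(theta, lr, gamma, steps, eps=1e-8):
    """RMSprop: exponential moving average of squared gradients."""
    v = 0.0
    for _ in range(steps):
        g = 2 * theta
        v = gamma * v + (1 - gamma) * g * g      # v_t = gamma*v_{t-1} + (1-gamma)*g_t^2
        theta -= lr * g / (math.sqrt(v) + eps)
    return theta
```

Since `v` forgets old gradients at rate `gamma`, the step size adapts to the recent gradient scale instead of shrinking forever.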
