神经网络优化 - 学习率

Posted gengyi

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了神经网络优化 - 学习率相关的知识,希望对你有一定的参考价值。

1 学习率的基本定义

学习率learning_rate:每次参数更新的幅度。

技术分享图片

简单示例:

假设损失函数 loss = ( w + 1 )2,则梯度为

技术分享图片

参数 w 初始化为 5 ,学习率为 0.2 ,则

运行次数 参数w值 计算
1次 5 5-0.2*(2*5+2) = 2.6
2次 2.6 2.6-0.2*(2*2.6+2) = 1.16
3次 1.16 1.16-0.2*(2*1.16+2) = 0.296
4次 0.296  

 

2 学习率的初步应用

2.1  学习率 0.2 时

# 已知损失函数loss = (w+1)^2,待优化参数W的初值为5
# 求loss最小时对应的W值

# 第一步 引入库,生成数据表
import tensorflow as tf
# 第二步 定义前向传播
# 定义待优化参数w初值赋5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# 第三步 定义损失函数和反向传播
# 定义损失函数loss
loss = tf.square(w+1)
# 定义反向传播方法,学习率为0.2
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

# 第四步 生成会话,训练40轮
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: w is %f,   loss is %f." % (i, w_val, loss_val))

运行

After 0 steps: w is 2.600000,   loss is 12.959999.
After 1 steps: w is 1.160000,   loss is 4.665599.
After 2 steps: w is 0.296000,   loss is 1.679616.
After 3 steps: w is -0.222400,   loss is 0.604662.
After 4 steps: w is -0.533440,   loss is 0.217678.
After 5 steps: w is -0.720064,   loss is 0.078364.
After 6 steps: w is -0.832038,   loss is 0.028211.
After 7 steps: w is -0.899223,   loss is 0.010156.
After 8 steps: w is -0.939534,   loss is 0.003656.
After 9 steps: w is -0.963720,   loss is 0.001316.
After 10 steps: w is -0.978232,   loss is 0.000474.
After 11 steps: w is -0.986939,   loss is 0.000171.
After 12 steps: w is -0.992164,   loss is 0.000061.
After 13 steps: w is -0.995298,   loss is 0.000022.
After 14 steps: w is -0.997179,   loss is 0.000008.
After 15 steps: w is -0.998307,   loss is 0.000003.
After 16 steps: w is -0.998984,   loss is 0.000001.
After 17 steps: w is -0.999391,   loss is 0.000000.
After 18 steps: w is -0.999634,   loss is 0.000000.
After 19 steps: w is -0.999781,   loss is 0.000000.
After 20 steps: w is -0.999868,   loss is 0.000000.
After 21 steps: w is -0.999921,   loss is 0.000000.
After 22 steps: w is -0.999953,   loss is 0.000000.
After 23 steps: w is -0.999972,   loss is 0.000000.
After 24 steps: w is -0.999983,   loss is 0.000000.
After 25 steps: w is -0.999990,   loss is 0.000000.
After 26 steps: w is -0.999994,   loss is 0.000000.
After 27 steps: w is -0.999996,   loss is 0.000000.
After 28 steps: w is -0.999998,   loss is 0.000000.
After 29 steps: w is -0.999999,   loss is 0.000000.
After 30 steps: w is -0.999999,   loss is 0.000000.
After 31 steps: w is -1.000000,   loss is 0.000000.
After 32 steps: w is -1.000000,   loss is 0.000000.
After 33 steps: w is -1.000000,   loss is 0.000000.
After 34 steps: w is -1.000000,   loss is 0.000000.
After 35 steps: w is -1.000000,   loss is 0.000000.
After 36 steps: w is -1.000000,   loss is 0.000000.
After 37 steps: w is -1.000000,   loss is 0.000000.
After 38 steps: w is -1.000000,   loss is 0.000000.
After 39 steps: w is -1.000000,   loss is 0.000000.

从运算过程可以得出,待优化参数 w 由原来的初赋值参数 5 最终在 after 31 step 优化成 -1,直到 after 39 step 一直保持 -1。

2.2 学习率为 1 时

# 定义反向传播方法,学习率为 1 
train_step = tf.train.GradientDescentOptimizer(1).minimize(loss)

运行

After 0 steps: w is -7.000000,   loss is 36.000000.
After 1 steps: w is 5.000000,   loss is 36.000000.
After 2 steps: w is -7.000000,   loss is 36.000000.
After 3 steps: w is 5.000000,   loss is 36.000000.
After 4 steps: w is -7.000000,   loss is 36.000000.
After 5 steps: w is 5.000000,   loss is 36.000000.
After 6 steps: w is -7.000000,   loss is 36.000000.
After 7 steps: w is 5.000000,   loss is 36.000000.
After 8 steps: w is -7.000000,   loss is 36.000000.
After 9 steps: w is 5.000000,   loss is 36.000000.
After 10 steps: w is -7.000000,   loss is 36.000000.
After 11 steps: w is 5.000000,   loss is 36.000000.
After 12 steps: w is -7.000000,   loss is 36.000000.
After 13 steps: w is 5.000000,   loss is 36.000000.
After 14 steps: w is -7.000000,   loss is 36.000000.
After 15 steps: w is 5.000000,   loss is 36.000000.
After 16 steps: w is -7.000000,   loss is 36.000000.
After 17 steps: w is 5.000000,   loss is 36.000000.
After 18 steps: w is -7.000000,   loss is 36.000000.
After 19 steps: w is 5.000000,   loss is 36.000000.
After 20 steps: w is -7.000000,   loss is 36.000000.
After 21 steps: w is 5.000000,   loss is 36.000000.
After 22 steps: w is -7.000000,   loss is 36.000000.
After 23 steps: w is 5.000000,   loss is 36.000000.
After 24 steps: w is -7.000000,   loss is 36.000000.
After 25 steps: w is 5.000000,   loss is 36.000000.
After 26 steps: w is -7.000000,   loss is 36.000000.
After 27 steps: w is 5.000000,   loss is 36.000000.
After 28 steps: w is -7.000000,   loss is 36.000000.
After 29 steps: w is 5.000000,   loss is 36.000000.
After 30 steps: w is -7.000000,   loss is 36.000000.
After 31 steps: w is 5.000000,   loss is 36.000000.
After 32 steps: w is -7.000000,   loss is 36.000000.
After 33 steps: w is 5.000000,   loss is 36.000000.
After 34 steps: w is -7.000000,   loss is 36.000000.
After 35 steps: w is 5.000000,   loss is 36.000000.
After 36 steps: w is -7.000000,   loss is 36.000000.
After 37 steps: w is 5.000000,   loss is 36.000000.
After 38 steps: w is -7.000000,   loss is 36.000000.
After 39 steps: w is 5.000000,   loss is 36.000000.

当学习率过大时,结果并不能收敛,只是在来回震荡,而实际上,有的值还越跳越大。

学习率过大,震荡不收敛。

 

2.3 学习率为 0.001 时

# 定义反向传播方法,学习率为 1 
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

运行

After 0 steps: w is 4.988000,   loss is 35.856144.
After 1 steps: w is 4.976024,   loss is 35.712864.
After 2 steps: w is 4.964072,   loss is 35.570156.
After 3 steps: w is 4.952144,   loss is 35.428020.
After 4 steps: w is 4.940240,   loss is 35.286449.
After 5 steps: w is 4.928360,   loss is 35.145447.
After 6 steps: w is 4.916503,   loss is 35.005009.
After 7 steps: w is 4.904670,   loss is 34.865124.
After 8 steps: w is 4.892860,   loss is 34.725803.
After 9 steps: w is 4.881075,   loss is 34.587044.
After 10 steps: w is 4.869313,   loss is 34.448833.
After 11 steps: w is 4.857574,   loss is 34.311172.
After 12 steps: w is 4.845859,   loss is 34.174068.
After 13 steps: w is 4.834167,   loss is 34.037510.
After 14 steps: w is 4.822499,   loss is 33.901497.
After 15 steps: w is 4.810854,   loss is 33.766029.
After 16 steps: w is 4.799233,   loss is 33.631104.
After 17 steps: w is 4.787634,   loss is 33.496712.
After 18 steps: w is 4.776059,   loss is 33.362858.
After 19 steps: w is 4.764507,   loss is 33.229538.
After 20 steps: w is 4.752978,   loss is 33.096756.
After 21 steps: w is 4.741472,   loss is 32.964497.
After 22 steps: w is 4.729989,   loss is 32.832775.
After 23 steps: w is 4.718529,   loss is 32.701576.
After 24 steps: w is 4.707092,   loss is 32.570904.
After 25 steps: w is 4.695678,   loss is 32.440750.
After 26 steps: w is 4.684287,   loss is 32.311119.
After 27 steps: w is 4.672918,   loss is 32.182003.
After 28 steps: w is 4.661572,   loss is 32.053402.
After 29 steps: w is 4.650249,   loss is 31.925320.
After 30 steps: w is 4.638949,   loss is 31.797745.
After 31 steps: w is 4.627671,   loss is 31.670683.
After 32 steps: w is 4.616416,   loss is 31.544128.
After 33 steps: w is 4.605183,   loss is 31.418077.
After 34 steps: w is 4.593973,   loss is 31.292530.
After 35 steps: w is 4.582785,   loss is 31.167484.
After 36 steps: w is 4.571619,   loss is 31.042938.
After 37 steps: w is 4.560476,   loss is 30.918892.
After 38 steps: w is 4.549355,   loss is 30.795341.
After 39 steps: w is 4.538256,   loss is 30.672281.

在学习率调整至 0.001 后,w值也收敛,只是变化太慢,在 after 39 step 后才从 5 --> 4.5 ,从第一例可知,最终的优化结果为 -1 , 很显然,当学习率过低时,运行效率非常低。

 学习率过小时,收敛速度太慢。

3 指数衰减学习率

3.1 指数衰减学习率数学释义

指数衰减学习率是根据运行的轮数动态调整学习率。

具体多少轮更新一次学习率RATE_STEP = 总样本数 / BATCH_SIZE

将总样本数量分割成 N 个 BATCH_SIZE 参数喂入神经网络训练。每喂入 BATCH_SIZE 数据量循环一轮,当数据集的子集均已喂入神经网络后,调整一次学习率。

一个神经网络训练多少次是设定好的,记为global_step

学习率更新次数 = global_step / RATE_STEP

技术分享图片

 

备注:

learning_rate - 学习率更新后的值

RATE_BAST - 学习率基数,学习率初始值

RATE_DECAY - 学习率衰减率,一般值范围( 0, 1 )

global_step - 运行总轮次数

rate_step - 多少轮更新一次学习率

3.2 指数衰减学习率 Tensorflow 代码

global_step= tf.Variable(0, trainable = False)

由于这个变量只为计数,trainable = False 意味该数据不可训练

 

以上是关于神经网络优化 - 学习率的主要内容,如果未能解决你的问题,请参考以下文章

神经网络优化 - 学习率

tensorflow:实战Google深度学习框架第四章02神经网络优化(学习率,避免过拟合,滑动平均模型)

「深度学习一遍过」必修11:优化器的高级使用+学习率迭代策略+分类优化目标定义

DL:深度学习模型优化之模型训练技巧总结之适时自动调整学习率实现代码

神经网络超参数选择

10的三次方怎么稀释梯度