A review of gradient descent optimization methods
Posted by gaoqichao
Suppose we are going to optimize a parameterized function \(J(\theta)\), where \(\theta \in \mathbb{R}^d\); for example, \(\theta\) could be the parameters of a neural network.
More specifically, we want to \(\mbox{minimize } J(\theta; \mathcal{D})\) over a dataset \(\mathcal{D}\), where each point in \(\mathcal{D}\) is a pair \((x_i, y_i)\).
There are different ways to apply gradient descent.
Let \(\eta\) be the learning rate.
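The pseudocode in this post calls a `compute_gradient` helper without defining it. As a minimal sketch, assuming a least-squares loss \(J(\theta; \mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_i (\theta^\top x_i - y_i)^2\) (an assumption; the post never fixes a concrete \(J\)), it could look like this:
```python
import numpy as np

# Illustrative sketch only: hard-codes the gradient of a least-squares loss.
# J stays in the signature just to match the pseudocode below, and is unused.
def compute_gradient(J, theta, data, y=None):
    # Accepts either an iterable of (x_i, y_i) pairs (the full dataset D or
    # a mini-batch M), or a single example passed as
    # compute_gradient(J, theta, x_i, y_i).
    pairs = [(data, y)] if y is not None else list(data)
    grad = np.zeros_like(theta)
    for x_i, y_i in pairs:
        grad += 2 * (x_i @ theta - y_i) * x_i  # d/dtheta of (theta.x_i - y_i)^2
    return grad / len(pairs)
```
The flexible signature is a convenience so the same helper can serve the batch, per-example, and mini-batch loops below.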
Vanilla Batch Gradient Descent
\(\theta \gets \theta - \eta \nabla J(\theta; \mathcal{D})\)
Note that \(\nabla J(\theta; \mathcal{D})\) computes the gradient over the whole dataset \(\mathcal{D}\).
```python
for i in range(n_epochs):
    gradient = compute_gradient(J, theta, D)  # gradient over the full dataset
    theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after each epoch
```
When \(\mathcal{D}\) is large, computing the gradient over the entire dataset for every single update makes this approach infeasible.
Stochastic Gradient Descent
Stochastic gradient descent, on the other hand, updates the parameters one example at a time.
\(\theta \gets \theta - \eta \nabla J(\theta; x_i, y_i)\), where \((x_i, y_i) \in \mathcal{D}\).
```python
for n in range(n_epochs):
    for x_i, y_i in D:
        # one update per training example
        gradient = compute_gradient(J, theta, x_i, y_i)
        theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after each epoch
```
Mini-batch Stochastic Gradient Descent
Updating \(\theta\) one example at a time can lead to high-variance updates; the alternative approach is to update \(\theta\) on mini-batches \(M\), where \(|M| \ll |\mathcal{D}|\).
```python
for n in range(n_epochs):
    for M in D:  # assuming D is pre-split into mini-batches
        gradient = compute_gradient(J, theta, M)
        theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after each epoch
```
Question: why does decaying the learning rate lead to convergence?
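One intuition: stochastic gradients are noisy, so with a fixed step size the iterates keep bouncing around the minimum rather than settling into it, while a shrinking \(\eta\) damps those fluctuations. The following small experiment (a hypothetical setup: noisy gradients of \(f(\theta) = \theta^2\), not anything from the post) illustrates the effect:
```python
import random

def noisy_grad(theta):
    # True gradient of f(theta) = theta^2 is 2*theta; the Gaussian term
    # mimics the variance of single-example gradients.
    return 2 * theta + random.gauss(0, 1.0)

def run_sgd(decay):
    theta, eta = 5.0, 0.1
    for _ in range(10000):
        theta = theta - eta * noisy_grad(theta)
        if decay:
            eta *= 0.9995  # slow multiplicative decay of the learning rate
    return theta

random.seed(0)
print("fixed eta:  ", run_sgd(decay=False))  # keeps fluctuating around 0
print("decayed eta:", run_sgd(decay=True))   # typically ends much closer to 0
```
In runs like this, the decayed schedule usually finishes noticeably closer to the minimum, because the noise injected into each update scales with \(\eta\) and shrinks along with it.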