(CS229) Lecture 1: Notes on Gradient Descent and the Normal Equation Derivation


1 Regression and classification

We call the learning problem a regression problem if the target variable we're trying to predict is continuous; when the target variable can take on only a small number of discrete values, we call it a classification problem.

 

2 Gradient descent

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

Here α is the learning rate. This update is performed simultaneously for all values of j = 0, ..., n.

For a single training example, this gives the update rule, known as the LMS (least mean squares) update rule or the Widrow-Hoff learning rule:

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$

The symbol := denotes assignment: set the left-hand side to the value of the right-hand side (as opposed to =, which asserts equality).

Note that for linear regression, J(θ) is a convex quadratic function (a bowl shape), so gradient descent always converges to the global minimum, assuming the learning rate α is not too large; there is only one global optimum and no other local optima.

 

3 Batch gd and stochastic gd (gd = gradient descent)

Batch means that on every step you look at every example in the entire training set before making a single update.

But it turns out that sometimes you will have a really large training set. In that case you should use another algorithm: stochastic gradient descent (also called incremental gradient descent). It may never "converge" to the minimum, and the parameters θ will keep oscillating around the minimum of J(θ), but in practice the values it reaches near the minimum are good enough approximations.
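To make the contrast concrete, here is a minimal sketch of both variants for linear regression, using the LMS update from section 2. The toy data, learning rates, and iteration counts are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def batch_gd(X, y, alpha=0.1, n_iters=2000):
    """Batch gradient descent: every step uses the entire training set."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / len(y)  # average gradient of J(theta)
        theta -= alpha * grad
    return theta

def stochastic_gd(X, y, alpha=0.01, n_epochs=50):
    """Stochastic (incremental) gradient descent: update after each example."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in np.random.permutation(len(y)):
            # LMS / Widrow-Hoff rule for a single training example
            theta += alpha * (y[i] - X[i] @ theta) * X[i]
    return theta

# Toy data: y = 1 + 2x + noise; the first column of X is the intercept term.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)

print(batch_gd(X, y))       # both should come out close to [1.0, 2.0]
print(stochastic_gd(X, y))
```

Note how batch_gd touches all m examples before each parameter update, while stochastic_gd makes progress after every single example, which is why it starts improving much sooner on large training sets.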

 

4 Matrix derivatives (definitions and some useful facts)

For a function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ mapping matrices to real numbers, define the derivative of $f$ with respect to $A$ to be the matrix of partial derivatives:

$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix}$$

For a square matrix $A$, the trace is the sum of its diagonal entries, $\operatorname{tr} A = \sum_i A_{ii}$, and it satisfies

$$\operatorname{tr} AB = \operatorname{tr} BA, \quad \operatorname{tr} ABC = \operatorname{tr} CAB = \operatorname{tr} BCA, \quad \operatorname{tr} A = \operatorname{tr} A^T, \quad \operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B, \quad \operatorname{tr} aA = a \operatorname{tr} A$$

Combining the trace with matrix derivatives gives the following facts:

$$\nabla_A \operatorname{tr} AB = B^T, \quad \nabla_{A^T} f(A) = (\nabla_A f(A))^T, \quad \nabla_A \operatorname{tr} ABA^TC = CAB + C^TAB^T, \quad \nabla_A |A| = |A| (A^{-1})^T$$

Now define the design matrix $X$ whose rows are the training inputs $(x^{(i)})^T$, and let $\vec{y}$ be the vector of targets $y^{(i)}$. Since $h_\theta(x^{(i)}) = (x^{(i)})^T \theta$, the vector $X\theta - \vec{y}$ collects the residuals $h_\theta(x^{(i)}) - y^{(i)}$, so

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2} (X\theta - \vec{y})^T (X\theta - \vec{y})$$
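These identities are easy to sanity-check numerically. The sketch below (the matrix shapes and the probed entry are arbitrary illustrative choices) verifies tr AB = tr BA exactly, and the gradient fact ∇_A tr AB = B^T by finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))

# tr(AB) = tr(BA): AB is 3x3 and BA is 4x4, yet the traces agree.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# grad_A tr(AB) = B^T: perturb one entry of A and compare the
# finite-difference derivative of tr(AB) against the (i, j) entry of B^T.
eps = 1e-6
i, j = 1, 2
A_pert = A.copy()
A_pert[i, j] += eps
fd = (np.trace(A_pert @ B) - np.trace(A @ B)) / eps
assert np.isclose(fd, B.T[i, j], atol=1e-4)
print("trace identities check out")
```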

With the definitions above, we have:

$$\begin{aligned} \nabla_\theta J(\theta) &= \nabla_\theta \frac{1}{2} (X\theta - \vec{y})^T (X\theta - \vec{y}) \\ &= \frac{1}{2} \nabla_\theta \left( \theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y} \right) \\ &= \frac{1}{2} \left( 2 X^T X \theta - 2 X^T \vec{y} \right) \\ &= X^T X \theta - X^T \vec{y} \end{aligned}$$

This is also where the factor of 1/2 cancels: differentiating the quadratic brings down a factor of 2. So to minimize J, we set its derivatives to zero and obtain the normal equations:

$$X^T X \theta = X^T \vec{y}$$

Thus, the value of theta that minimizes J(theta) is given in closed form by the equation 

$$\theta = (X^T X)^{-1} X^T \vec{y}$$
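As a sketch of how this looks in code, the closed form is essentially a one-liner in NumPy. The toy data below is an illustrative assumption; also, rather than forming (X^T X)^{-1} explicitly as in the formula, it is numerically safer to solve the linear system (np.linalg.lstsq works too):

```python
import numpy as np

# Toy data: y = 1 + 2x + noise; the first column of X is the intercept term.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)

# Normal equations: solve (X^T X) theta = X^T y for theta.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # close to [1.0, 2.0]
```

Unlike gradient descent there is no learning rate to tune and no iteration, but solving the normal equations costs roughly O(n^3) in the number of features, so for very large n gradient descent is preferred.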
