CS 229 notes Supervised Learning


Tags: supervised learning, linear algebra


Foreword

These notes prove the normal equation. Before the proof, they collect some linear algebra identities that the proof relies on.

The normal equation

Linear algebra preparation

For two matrices $A$ and $B$ such that $AB$ is square, $\mathrm{tr}\,AB = \mathrm{tr}\,BA$.

Proof:
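One standard index-based argument, which is not necessarily the one originally given here, assumes $A$ is $n\times m$ and $B$ is $m\times n$ and swaps the order of summation:

$$\mathrm{tr}\,AB = \sum_{i=1}^{n}(AB)_{ii} = \sum_{i=1}^{n}\sum_{j=1}^{m}A_{ij}B_{ji} = \sum_{j=1}^{m}\sum_{i=1}^{n}B_{ji}A_{ij} = \sum_{j=1}^{m}(BA)_{jj} = \mathrm{tr}\,BA$$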

 

 

Some properties:
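The derivation later in these notes uses $\mathrm{tr}A = \mathrm{tr}A^T$ and cyclic reordering inside a trace, so the properties meant here are presumably the standard ones below (an assumption, for square $A$, $B$ of the same size, conformable $A$, $B$, $C$ in the first line, and a real scalar $a$):

$$\begin{aligned}
\mathrm{tr}\,ABC &= \mathrm{tr}\,CAB = \mathrm{tr}\,BCA \\
\mathrm{tr}\,A &= \mathrm{tr}\,A^T \\
\mathrm{tr}\,(A+B) &= \mathrm{tr}\,A + \mathrm{tr}\,B \\
\mathrm{tr}\,aA &= a\,\mathrm{tr}\,A
\end{aligned}$$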

 

Some facts about matrix derivatives:

$$\nabla_A\mathrm{tr}\,AB = B^T \tag{1}$$

Proof:
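A short index-based sketch, assuming $A$ is $n\times m$ and $B$ is $m\times n$ (it may differ from the argument originally shown here): write the trace as a double sum and differentiate with respect to a single entry $A_{ij}$. Only the term with $k=i$, $l=j$ survives.

$$\mathrm{tr}\,AB = \sum_{k=1}^{n}\sum_{l=1}^{m}A_{kl}B_{lk}
\quad\Longrightarrow\quad
(\nabla_A\mathrm{tr}\,AB)_{ij} = \frac{\partial}{\partial A_{ij}}\sum_{k=1}^{n}\sum_{l=1}^{m}A_{kl}B_{lk} = B_{ji} = (B^T)_{ij}$$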

 

$$\nabla_{A^T}f(A) = (\nabla_Af(A))^T \tag{2}$$

$$\nabla_A\mathrm{tr}\,ABA^TC = CAB+C^TAB^T \tag{3}$$

Proof 1:
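One possible argument, offered as a sketch rather than as the original proof, applies the product rule: $A$ appears twice in $\mathrm{tr}\,ABA^TC$, so differentiate with respect to each occurrence while holding the other fixed (the fixed factors are named $N$ and $M$ here for illustration), then add the results.

$$\begin{aligned}
\text{first occurrence:}\quad & \nabla_A\mathrm{tr}\,A\,N = N^T = C^TAB^T, && N = BA^TC \text{ held fixed, by (1)} \\
\text{second occurrence:}\quad & \mathrm{tr}\,ABA^TC = \mathrm{tr}\,A^T\,M, && M = CAB \text{ held fixed, by cyclicity of the trace} \\
& \nabla_{A^T}\mathrm{tr}\,A^TM = M^T \;\Rightarrow\; \nabla_A\mathrm{tr}\,A^TM = M = CAB, && \text{by (1) and then (2)} \\
\text{sum:}\quad & \nabla_A\mathrm{tr}\,ABA^TC = CAB + C^TAB^T
\end{aligned}$$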

 

Proof 2:
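An alternative sketch works entry by entry (again, not necessarily the argument originally shown here). Writing the trace as a sum over indices and differentiating with respect to $A_{pq}$ picks up one term from each occurrence of $A$:

$$\begin{aligned}
\mathrm{tr}\,ABA^TC &= \sum_{i,j,k,l}A_{ij}B_{jk}A_{lk}C_{li} \\
\frac{\partial}{\partial A_{pq}}\mathrm{tr}\,ABA^TC &= \sum_{k,l}C_{lp}A_{lk}B_{qk} + \sum_{i,j}C_{pi}A_{ij}B_{jq} = (C^TAB^T)_{pq} + (CAB)_{pq}
\end{aligned}$$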

 

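$$\nabla_A|A| = |A|(A^{-1})^T \tag{4}$$

Proof: Expanding the determinant along row $p$ gives $|A| = \sum_{q}A_{pq}C_{pq}$, where $C_{pq}$ is the $(p,q)$ cofactor of $A$. No cofactor in row $p$ depends on $A_{pq}$, so

$$(\nabla_A |A|)_{pq} = C_{pq} = A^*_{qp} = ((A^*)^T)_{pq} = |A|\,((A^{-1})^T)_{pq}$$

Here $A^* = |A|A^{-1}$ is the adjugate of $A$ (the transpose of the cofactor matrix), and $A$ is assumed to be invertible.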
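Identities (3) and (4) can be sanity-checked numerically. The sketch below assumes NumPy is available and compares each closed-form gradient with a central finite-difference estimate on random $3\times 3$ matrices; the helper `numerical_gradient` and the tolerance are illustrative choices, not part of the original notes.

```python
import numpy as np

def numerical_gradient(f, A, eps=1e-6):
    """Central-difference estimate of the matrix of partials df/dA_ij."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        step = np.zeros_like(A)
        step[idx] = eps
        grad[idx] = (f(A + step) - f(A - step)) / (2 * eps)
    return grad

# Random test matrices (fixed seed so the run is reproducible)
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))

# Equation (3): gradient of tr(A B A^T C) with respect to A
grad3_numeric = numerical_gradient(lambda M: np.trace(M @ B @ M.T @ C), A)
grad3_formula = C @ A @ B + C.T @ A @ B.T

# Equation (4): gradient of det(A) with respect to A
grad4_numeric = numerical_gradient(np.linalg.det, A)
grad4_formula = np.linalg.det(A) * np.linalg.inv(A).T

print(np.allclose(grad3_numeric, grad3_formula, atol=1e-5))
print(np.allclose(grad4_numeric, grad4_formula, atol=1e-5))
```

A check like this does not replace the proofs, but it catches sign and transpose mistakes quickly.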

Least squares revisited

$$X = \begin{bmatrix}-(x^{(1)})^T-\\-(x^{(2)})^T-\\\vdots\\-(x^{(m)})^T-\end{bmatrix}$$

(if we don't include the intercept term)

$$\vec y = \begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}$$

Since $h_\theta(x^{(i)}) = (x^{(i)})^T\theta$, the $i$-th entry of $X\theta-\vec{y}$ is $h_\theta(x^{(i)}) - y^{(i)}$. Thus, using $z^Tz = \sum_i z_i^2$ for any vector $z$,

$$\frac{1}{2}(X\theta-\vec{y})^T(X\theta-\vec{y}) = \frac{1}{2}\sum_{i=1}^{m}(h_\theta(x^{(i)}) -y^{(i)})^2 = J(\theta)$$
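The equality between the matrix form and the sum form of $J(\theta)$ can be illustrated with a small numerical sketch. It assumes NumPy; the sizes `m`, `n` and the random data are made up for the example.

```python
import numpy as np

# Hypothetical random data: m examples with n features each
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(m, n))    # row i is (x^(i))^T
y = rng.normal(size=m)
theta = rng.normal(size=n)

# Sum form: J = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2
J_sum = 0.5 * sum((X[i] @ theta - y[i]) ** 2 for i in range(m))

# Matrix form: J = 1/2 * (X theta - y)^T (X theta - y)
residual = X @ theta - y
J_matrix = 0.5 * residual @ residual

print(np.isclose(J_sum, J_matrix))
```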

Combining equations (2) and (3) gives

$$\nabla_{A^T}\mathrm{tr}\,ABA^TC = B^TA^TC^T+BA^TC \tag{5}$$

Hence

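$$\nabla_\theta J(\theta) = \frac{1}{2}\nabla_\theta(X\theta-\vec{y})^T(X\theta-\vec{y}) = \frac{1}{2}\nabla_\theta\left(\theta^TX^TX\theta-\theta^TX^T\vec{y}-\vec{y}^TX\theta+\vec{y}^T\vec{y}\right)$$

Notice that the expression in parentheses is a real number, which can be viewed as a $1\times 1$ matrix and therefore equals its own trace, so

$$\nabla_\theta J(\theta) = \frac{1}{2}\nabla_\theta\,\mathrm{tr}\left(\theta^TX^TX\theta-\theta^TX^T\vec{y}-\vec{y}^TX\theta+\vec{y}^T\vec{y}\right) = \frac{1}{2}\nabla_\theta\left(\mathrm{tr}\,\theta^TX^TX\theta - 2\,\mathrm{tr}\,\vec{y}^TX\theta\right)$$

since $\mathrm{tr}A = \mathrm{tr}A^T$ (so $\mathrm{tr}\,\theta^TX^T\vec{y} = \mathrm{tr}\,\vec{y}^TX\theta$) and $\vec{y}^T\vec{y}$ involves no elements of $\theta$.
Then use equation (5) with $A^T = \theta$, $B = B^T = X^TX$, $C = I$ for the first term, and equation (1) for the second term:

$$\nabla_\theta J(\theta) = \frac{1}{2}\left(X^TX\theta + X^TX\theta - 2X^T\vec{y}\right) = X^TX\theta - X^T\vec{y}$$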


To minimize $J$, we set its derivative to zero and obtain the normal equation:

$$X^TX\theta = X^T\vec{y}$$

If $X^TX$ is invertible, the minimizing $\theta$ is given in closed form by

$$\theta = (X^TX)^{-1}X^T\vec{y}$$
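As a minimal sketch of the normal equation in use, the example below assumes NumPy and a made-up toy dataset generated by $y = 1 + 2x$, with a column of ones added to $X$ for the intercept. It solves the linear system $X^TX\theta = X^T\vec{y}$ directly instead of forming the inverse, which is generally the more numerically stable choice.

```python
import numpy as np

# Hypothetical toy data: y = 1 + 2*x, with a leading column of ones for the intercept
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Solve X^T X theta = X^T y (assumes X^T X is invertible)
theta = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient X^T X theta - X^T y should vanish at the solution
gradient = X.T @ X @ theta - X.T @ y

print(theta)      # approximately [1. 2.]
print(gradient)   # approximately [0. 0.]
```

When $X^TX$ is singular or badly conditioned, a least-squares routine such as `np.linalg.lstsq(X, y, rcond=None)` is a common alternative.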
