CS 229 notes Supervised Learning

Posted 2020-10-15


Tags (space-separated): supervised learning, linear algebra

Foreword

These notes give the proof of the normal equation. Before that, they collect some linear algebra identities that the proof uses.

The normal equation

Linear algebra preparation

For two matrices $A$ and $B$ such that $AB$ is square, $\operatorname{tr}AB = \operatorname{tr}BA$.

Proof:
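A standard index argument, sketched under the assumption that $A$ is $n \times k$ and $B$ is $k \times n$ (the dimensions are not stated in the text, but they are what makes $AB$ square):

$\operatorname{tr}AB = \sum_{i=1}^{n}(AB)_{ii} = \sum_{i=1}^{n}\sum_{j=1}^{k}A_{ij}B_{ji} = \sum_{j=1}^{k}\sum_{i=1}^{n}B_{ji}A_{ij} = \sum_{j=1}^{k}(BA)_{jj} = \operatorname{tr}BA$

The only step is swapping the order of the two finite sums.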

Some properties:
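The standard trace properties, assuming the usual list from the CS 229 lecture notes with $A$ and $B$ square and $a$ a real scalar (for the first line, any shapes for which the products are defined and square):

$\operatorname{tr}ABC = \operatorname{tr}CAB = \operatorname{tr}BCA$

$\operatorname{tr}A = \operatorname{tr}A^T$

$\operatorname{tr}(A+B) = \operatorname{tr}A + \operatorname{tr}B$

$\operatorname{tr}\,aA = a\operatorname{tr}A$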

Some facts about matrix derivatives:
$\nabla_A\operatorname{tr}AB = B^T \qquad (1)$

Proof:
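One way to see (1) is entry by entry. This is a sketch that assumes $\nabla_A f$ denotes the matrix whose $(i,j)$ entry is $\partial f / \partial A_{ij}$:

$(\nabla_A\operatorname{tr}AB)_{ij} = \frac{\partial}{\partial A_{ij}}\sum_{p}\sum_{q}A_{pq}B_{qp} = B_{ji} = (B^T)_{ij}$

Only the term with $p = i$ and $q = j$ depends on $A_{ij}$, which is why a single entry of $B$ survives.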

$\nabla_{A^T}f(A) = (\nabla_Af(A))^T \qquad (2)$

$\nabla_A\operatorname{tr}ABA^TC = CAB + C^TAB^T \qquad (3)$

Proof 1:

Proof 2:
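Identity (3) can also be sanity-checked numerically. The sketch below is not a proof. It compares the closed form $CAB + C^TAB^T$ with a central-difference gradient on random matrices. The helper `numerical_gradient`, the shapes, and the random seed are illustrative assumptions, not part of the original notes.

```python
import numpy as np


def numerical_gradient(f, A, eps=1e-6):
    """Central-difference estimate of the matrix of partials df/dA_ij."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            step = np.zeros_like(A)
            step[i, j] = eps
            grad[i, j] = (f(A + step) - f(A - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # n x k
B = rng.standard_normal((4, 4))  # k x k
C = rng.standard_normal((3, 3))  # n x n


def trace_form(M):
    """f(M) = tr(M B M^T C), the left-hand side of identity (3)."""
    return np.trace(M @ B @ M.T @ C)


analytic = C @ A @ B + C.T @ A @ B.T  # right-hand side of identity (3)
numeric = numerical_gradient(trace_form, A)
max_abs_error = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_abs_error)
```

Because the function is quadratic in $A$, the central difference is exact up to floating-point rounding, so the printed difference should be tiny.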

$\nabla_A|A| = |A|(A^{-1})^T \qquad (4)$

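Proof: $(\nabla_A |A|)_{pq} = C_{pq} = A^*_{qp} = ((A^*)^T)_{pq} = |A|((A^{-1})^T)_{pq}$
(Here $C_{pq}$ is the $(p,q)$ cofactor of $A$, and $A^* = |A|A^{-1}$ is the adjugate of $A$. The first equality follows from the cofactor expansion $|A| = \sum_{q}A_{pq}C_{pq}$ along row $p$, in which no cofactor $C_{pq}$ depends on $A_{pq}$.)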
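As a quick numerical check of (4), the sketch below compares $|A|(A^{-1})^T$ with a central-difference gradient of the determinant. The matrix size, the seed, and the diagonal shift that keeps $A$ comfortably invertible are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Shift the diagonal so the random matrix is safely invertible (assumption).
A = rng.standard_normal((n, n)) + 4.0 * np.eye(n)

eps = 1e-6
numeric = np.zeros_like(A)
for p in range(n):
    for q in range(n):
        step = np.zeros_like(A)
        step[p, q] = eps
        # Central difference of det with respect to the single entry A[p, q].
        numeric[p, q] = (np.linalg.det(A + step) - np.linalg.det(A - step)) / (2 * eps)

analytic = np.linalg.det(A) * np.linalg.inv(A).T  # right-hand side of (4)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

The determinant is linear in each individual entry, so the finite difference agrees with the closed form up to rounding error.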

Least squares revisited

$X = \begin{bmatrix}-(x^{(1)})^T-\\-(x^{(2)})^T-\\\vdots\\-(x^{(m)})^T-\end{bmatrix}$ (if we don’t include the intercept term)

$\vec{y} = \begin{bmatrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{bmatrix}$

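Since $h_\theta(x^{(i)}) = (x^{(i)})^T\theta$, the $i$-th entry of $X\theta - \vec{y}$ is $h_\theta(x^{(i)}) - y^{(i)}$.

Thus, using the fact that $z^Tz = \sum_i z_i^2$ for any vector $z$,
$\frac{1}{2}(X\theta-\vec{y})^T(X\theta-\vec{y}) = \frac{1}{2}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 = J(\theta)$.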
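The equality between the matrix form and the per-example sum can be checked on a small example. This is a minimal sketch with made-up random data. The sizes `m` and `n` and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3                       # m training examples, n features (assumed sizes)
X = rng.standard_normal((m, n))   # row i is (x^{(i)})^T
y = rng.standard_normal(m)
theta = rng.standard_normal(n)

# Matrix form: (1/2) (X theta - y)^T (X theta - y)
residual = X @ theta - y
J_matrix = 0.5 * residual @ residual

# Sum form: (1/2) sum_i (h_theta(x^{(i)}) - y^{(i)})^2 with h_theta(x) = x^T theta
J_sum = 0.5 * sum((X[i] @ theta - y[i]) ** 2 for i in range(m))

print(J_matrix, J_sum)
```

Both expressions compute the same scalar. The matrix form is the one the gradient derivation works with.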

Combining Equations $(2)$ and $(3)$:
$\nabla_{A^T}\operatorname{tr}ABA^TC = B^TA^TC^T + BA^TC \qquad (5)$

Hence

$\begin{aligned}\nabla_\theta J(\theta) &= \frac{1}{2}\nabla_\theta(X\theta-\vec{y})^T(X\theta-\vec{y})\\ &= \frac{1}{2}\nabla_\theta(\theta^TX^TX\theta-\theta^TX^T\vec{y}-\vec{y}^TX\theta+\vec{y}^T\vec{y})\end{aligned}$