05 Neural Networks


Neural Networks

The ‘one learning algorithm’ hypothesis

  1. Neuron-rewiring experiments

Model Representation

Define
  1. Sigmoid (logistic) activation function
  2. bias unit
  3. input layer
  4. output layer
  5. hidden layer
  6. \(a_i^{(j)}\): ‘activation’ of unit \(i\) in layer \(j\)
  7. \(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j + 1\)
Calculate

\[a^{(j)} = g(z^{(j)})\]
\[g(z) = \frac{1}{1 + e^{-z}}\]
\[z^{(j + 1)} = \Theta^{(j)} a^{(j)}\]
\[h_\Theta(x) = a^{(j + 1)} = g(z^{(j + 1)})\]
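
A minimal Octave sketch of these forward-propagation equations for a three-layer network; the weight matrices Theta1 and Theta2 and the example matrix X (one training example per row) are assumed inputs, with names borrowed from the course exercises:

function h = forwardProp(Theta1, Theta2, X)
  m = size(X, 1);
  g = @(z) 1 ./ (1 + exp(-z));   % sigmoid activation

  a1 = [ones(m, 1) X];           % add bias unit to the input layer
  z2 = a1 * Theta1';
  a2 = [ones(m, 1) g(z2)];       % add bias unit to the hidden layer
  z3 = a2 * Theta2';
  h  = g(z3);                    % h_Theta(x) = a^(3)
end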

Cost Function

\[
J(\Theta) = - \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y^{(i)}_k \log \left( (h_\Theta (x^{(i)}))_k \right) + (1 - y^{(i)}_k) \log \left( 1 - (h_\Theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2
\]
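
A hedged Octave sketch of this cost, assuming h (m x K) holds the outputs of forward propagation, Y (m x K) is a one-hot encoding of the labels, Theta1/Theta2 are the weight matrices, and m and lambda are as in the formula:

% Unregularized cross-entropy term, summed over examples and classes.
unreg = -(1 / m) * sum(sum(Y .* log(h) + (1 - Y) .* log(1 - h)));

% Regularization skips the bias column (first column) of each weight matrix.
reg = (lambda / (2 * m)) * ...
      (sum(sum(Theta1(:, 2:end) .^ 2)) + sum(sum(Theta2(:, 2:end) .^ 2)));

J = unreg + reg;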

Back-propagation Algorithm
Algorithm
  1. Assume we have already computed all the \(a^{(l)}\) and \(z^{(l)}\) via forward propagation
  2. Set \(\Delta^{(l)}_{i,j} := 0\) for all \((l, i, j)\)
  3. Using \(y^{(t)}\), compute \(\delta^{(L)} = a^{(L)} - y^{(t)}\), where \(y^{(t)}_k \in \{0, 1\}\) indicates whether the current training example belongs to class \(k\) (\(y^{(t)}_k = 1\)) or to a different class (\(y^{(t)}_k = 0\))
  4. For the hidden layers \(l = L - 1\) down to \(2\), set
    \[
    \delta^{(l)} = (\Theta^{(l)})^T \delta^{(l + 1)} \; .* \; g'(z^{(l)})
    \]
  5. Remember to remove the bias error term \(\delta_0^{(l)}\) (e.g. delta = delta(2:end)), then accumulate
    \[
    \Delta^{(l)} = \Delta^{(l)} + \delta^{(l + 1)} (a^{(l)})^T
    \]
  6. Gradient
    \[
    \frac{\partial}{\partial \Theta^{(l)}_{i,j}} J(\Theta) = D^{(l)}_{i,j} = \frac{1}{m} \Delta^{(l)}_{i,j} +
    \begin{cases} \frac{\lambda}{m} \Theta^{(l)}_{i,j}, & \text{if } j \geq 1 \\ 0, & \text{if } j = 0 \end{cases}
    \]
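
A sketch of the whole loop above for a three-layer network, under the same assumptions as the forward-propagation and cost sketches (X, Y one-hot, Theta1, Theta2, m, lambda); sigmoid and sigmoidGradient are assumed helper functions, as in the course exercises:

Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));

for t = 1:m
  % Forward pass for example t.
  a1 = [1; X(t, :)'];
  z2 = Theta1 * a1;
  a2 = [1; sigmoid(z2)];
  z3 = Theta2 * a2;
  a3 = sigmoid(z3);

  % Backward pass: output-layer error, then hidden-layer error.
  delta3 = a3 - Y(t, :)';
  delta2 = (Theta2' * delta3) .* [1; sigmoidGradient(z2)];
  delta2 = delta2(2:end);          % drop the bias error delta_0

  % Accumulate.
  Delta2 = Delta2 + delta3 * a2';
  Delta1 = Delta1 + delta2 * a1';
end

% Gradients, regularizing all but the bias column.
Theta1_grad = Delta1 / m;
Theta2_grad = Delta2 / m;
Theta1_grad(:, 2:end) += (lambda / m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) += (lambda / m) * Theta2(:, 2:end);
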
Gradient Checking
  1. \[
    \frac{d}{d\Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}
    \]
  2. A small value for \(\epsilon\) such as \(\epsilon = 10^{-4}\)
  3. Check that gradApprox \(\approx\) deltaVector (the gradient computed by back-propagation)

  4. Octave implementation:

epsilon = 1e-4;
for i = 1 : n
    % Perturb only the i-th parameter in each direction.
    thetaPlus = theta;
    thetaPlus(i) += epsilon;
    thetaMinus = theta;
    thetaMinus(i) -= epsilon;
    % Two-sided numerical estimate of the i-th partial derivative.
    gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * epsilon);
end;
Rolling and Unrolling
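
A minimal sketch of unrolling the weight matrices into one parameter vector for an optimizer, and reshaping them back inside the cost function; the 25 x 401 and 10 x 26 sizes are only illustrative:

% Unroll into a single column vector.
nn_params = [Theta1(:); Theta2(:)];

% Roll back up into matrices.
Theta1 = reshape(nn_params(1:25*401), 25, 401);
Theta2 = reshape(nn_params(25*401+1:end), 10, 26);
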
Random Initialization
Theta = rand(n, m) * (2 * INIT_EPSILON) - INIT_EPSILON;
  1. Initialize \( \Theta^{(l)}_{ij} \in [-\epsilon, \epsilon] \)
  2. Otherwise, if we initialize all theta weights to zero, all nodes will update to the same value repeatedly when we back-propagate (symmetry is never broken).
  3. One effective strategy for choosing \(\epsilon_{init}\) is to base it on the number of units in the network. A good choice is \(\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}\).
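
A sketch of this heuristic as an initialization routine; the function name follows the course exercise, with L_in and L_out the fan-in and fan-out of the layer:

function W = randInitializeWeights(L_in, L_out)
  epsilon_init = sqrt(6) / sqrt(L_in + L_out);
  % One row per unit in the next layer, one column per input plus the bias.
  W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end
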
Training a Neural Network
  1. Randomly initialize weights
    Theta = rand(n, m) * (2 * epsilon) - epsilon;
  2. Implement forward propagation to get \(h_\Theta(x^{(i)})\) for any \(x^{(i)}\)
  3. Implement code to compute the cost function \(J(\Theta)\)
  4. Implement back-propagation to compute the partial derivatives \( \frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta) \)

    • \( g'(z) = \frac{d}{dz} g(z) = g(z)(1 - g(z)) \)
    • \( \mathrm{sigmoid}(z) = g(z) = \frac{1}{1 + e^{-z}} \)
  5. Use gradient checking to compare \( \frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta) \) computed using back-propagation vs. a numerical estimate of the gradient of \(J(\Theta)\)
    Then disable the gradient checking code

  6. Use gradient descent or an advanced optimization method with back-propagation to try to minimize \(J(\Theta)\) as a function of the parameters \(\Theta\)
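
A hedged sketch of step 6 using Octave's fminunc on the unrolled parameters; nnCostFunction is assumed to return both \(J(\Theta)\) and the unrolled gradient (as in the course exercises), and the layer sizes, lambda, and initial_nn_params are illustrative names:

% Tell the optimizer that the cost function also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 400);

costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

[nn_params, cost] = fminunc(costFunction, initial_nn_params, options);

% Reshape nn_params back into Theta1 and Theta2 before making predictions.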
