05 Neural Networks


Neural Networks

The ‘one learning algorithm’ hypothesis

  1. Neuron-rewiring experiments

Model Representation

Define
  1. Sigmoid(logistic) activation function
  2. bias unit
  3. input layer
  4. output layer
  5. hidden layer
  6. \(a_i^{(j)}\): ‘activation’ of unit \(i\) in layer \(j\)
  7. \(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j + 1\).
Calculate

\[a^{(j)} = g(z^{(j)})\]
\[g(x) = \frac{1}{1 + e^{-x}}\]
\[z^{(j + 1)} = \Theta^{(j)} a^{(j)}\]
\[h_\Theta(x) = a^{(j + 1)} = g(z^{(j + 1)})\]
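A minimal Octave sketch of these equations for a 3-layer network (the names x, Theta1, Theta2 and the layer structure are illustrative assumptions, not from the original):

a1 = [1; x];                       % input layer plus bias unit
z2 = Theta1 * a1;
a2 = [1; 1 ./ (1 + exp(-z2))];     % sigmoid activation, plus bias unit
z3 = Theta2 * a2;
h  = 1 ./ (1 + exp(-z3));          % h_Theta(x) = a3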

Cost Function

\[
J(\Theta) = - \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y^{(i)}_k \log \big( (h_\Theta (x^{(i)}))_k \big) + (1 - y^{(i)}_k) \log \big( 1 - (h_\Theta(x^{(i)}))_k \big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big( \Theta_{j,i}^{(l)} \big)^2
\]

Back-propagation Algorithm
Algorithm
  1. Assume we have already computed all the \(a^{(l)}\) and \(z^{(l)}\) via forward propagation (an Octave sketch of the full loop follows this list)
  2. Set \(\Delta^{(l)}_{i,j} := 0\) for all \((l, i, j)\)
  3. Using \(y^{(t)}\), compute \(\delta^{(L)} = a^{(L)} - y^{(t)}\), where \(y^{(t)}_k \in \{0, 1\}\) indicates whether the current training example belongs to class \(k\) (\(y^{(t)}_k = 1\)) or to a different class (\(y^{(t)}_k = 0\))
  4. For the hidden layers, \(l = L - 1\) down to \(l = 2\), set
    \[
    \delta^{(l)} = (\Theta^{(l)})^T \delta^{(l + 1)} \;.*\; g'(z^{(l)})
    \]
  5. Remember to drop the bias error term \(\delta_0^{(l)}\) (e.g. delta(2:end) in Octave), then accumulate
    \[
    \Delta^{(l)} = \Delta^{(l)} + \delta^{(l + 1)} (a^{(l)})^T
    \]
  6. Gradient:
    \[
    \frac{\partial}{\partial \Theta^{(l)}_{i,j}} J(\Theta) = D^{(l)}_{i,j} = \frac{1}{m} \Delta^{(l)}_{i,j} +
    \begin{cases} \frac{\lambda}{m} \Theta^{(l)}_{i,j}, & \text{if } j \geq 1 \\ 0, & \text{if } j = 0 \end{cases}
    \]
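A minimal Octave sketch of the loop above for a 3-layer network (\(L = 3\)); sigmoid and sigmoidGradient are assumed helper functions for \(g(z)\) and \(g'(z)\), and all other names and sizes are illustrative assumptions:

Delta1 = zeros(size(Theta1));           % gradient accumulators
Delta2 = zeros(size(Theta2));
for t = 1 : m
    a1 = [1; X(t, :)'];                 % forward propagation for example t
    z2 = Theta1 * a1;
    a2 = [1; sigmoid(z2)];
    z3 = Theta2 * a2;
    a3 = sigmoid(z3);
    delta3 = a3 - Y(t, :)';             % output-layer error (Y holds one-hot labels)
    delta2 = (Theta2' * delta3) .* [1; sigmoidGradient(z2)];
    delta2 = delta2(2:end);             % drop the bias error term
    Delta1 = Delta1 + delta2 * a1';     % accumulate
    Delta2 = Delta2 + delta3 * a2';
end
% regularized gradient; the bias column (j = 0) is not regularized
Theta1_grad = Delta1 / m + (lambda / m) * [zeros(size(Theta1, 1), 1), Theta1(:, 2:end)];
Theta2_grad = Delta2 / m + (lambda / m) * [zeros(size(Theta2, 1), 1), Theta2(:, 2:end)];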
Gradient Checking
  1. \[
    \frac{d}{d\Theta} J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}
    \]
  2. Use a small value for \(\epsilon\), such as \(\epsilon = 10^{-4}\)
  3. Check that gradApprox \(\approx\) deltaVector (the gradient computed by back-propagation)

  4. In Octave:

epsilon = 1e-4;
for i = 1 : n
    % perturb the i-th parameter in both directions
    thetaPlus = theta;
    thetaPlus(i) += epsilon;
    thetaMinus = theta;
    thetaMinus(i) -= epsilon;
    % two-sided numerical estimate of the i-th partial derivative
    gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * epsilon);
end;
Rolling and Unrolling
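A minimal Octave sketch of unrolling the weight matrices into a single vector (as optimizers and gradient checking expect) and rolling them back; the sizes 10 x 11 and 1 x 11 are illustrative assumptions:

thetaVec = [Theta1(:); Theta2(:)];            % unroll into one column vector
Theta1 = reshape(thetaVec(1 : 110), 10, 11);  % roll back (10 x 11 = 110 elements)
Theta2 = reshape(thetaVec(111 : 121), 1, 11); % next 1 x 11 = 11 elements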
Random Initialization
Theta = rand(n, m) * (2 * INIT_EPSILON) - INIT_EPSILON;
  1. Initialize \( \Theta^{(l)}_{ij} \in [-\epsilon, \epsilon] \)
  2. Otherwise, if we initialize all theta weights to zero, all nodes will update to the same value repeatedly when we back-propagate (symmetry is never broken).
  3. One effective strategy for choosing \(\epsilon_{init}\) is to base it on the number of units in the network. A good choice is \(\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}\), where \(L_{in}\) and \(L_{out}\) are the numbers of units in the layers adjacent to \(\Theta^{(l)}\).
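For example, in Octave (the layer sizes 400 and 25 are illustrative assumptions):

L_in = 400; L_out = 25;                        % units on either side of Theta1
epsilon_init = sqrt(6) / sqrt(L_in + L_out);
Theta1 = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;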
Training a Neural Network
  1. Randomly initialize weights
    Theta = rand(n, m) * (2 * epsilon) - epsilon;
  2. Implement forward propagation to get \(h_\Theta(x^{(i)})\) for any \(x^{(i)}\)
  3. Implement code to compute the cost function \(J(\Theta)\)
  4. Implement back-propagation to compute the partial derivatives \( \frac{\partial J(\Theta)}{\partial \Theta_{jk}^{(l)}} \)

    • \( g'(z) = \frac{d}{dz} g(z) = g(z)(1 - g(z)) \)
    • \( \mathrm{sigmoid}(z) = g(z) = \frac{1}{1 + e^{-z}} \)
  5. Use gradient checking to compare \( \frac{\partial J(\Theta)}{\partial \Theta_{jk}^{(l)}} \) computed using back-propagation with the numerical estimate of the gradient of \(J(\Theta)\).
    Then disable the gradient-checking code.

  6. Use gradient descent or an advanced optimization method with back-propagation to try to minimize \(J(\Theta)\) as a function of the parameters \(\Theta\) (a minimal fminunc sketch follows).
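A minimal sketch of step 6 in Octave, assuming a costFunction(t, X, y, lambda) that returns the cost J and the unrolled gradient vector (all names are illustrative assumptions):

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = rand(numParams, 1) * 2 * INIT_EPSILON - INIT_EPSILON;   % random initialization
[optTheta, cost] = fminunc(@(t) costFunction(t, X, y, lambda), initialTheta, options);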
