05 Neural Networks
Neural Networks
The ‘one learning algorithm’ hypothesis
- Neuron-rewiring experiments
Model Representation
Define
- Sigmoid (logistic) activation function
- bias unit
- input layer
- output layer
- hidden layer
- \(a_i^{(j)}\): 'activation' of unit \(i\) in layer \(j\)
- \(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j + 1\).
Calculate
\[a^{(j)} = g(z^{(j)})\]
\[g(x) = \frac{1}{1 + e^{-x}}\]
\[z^{(j + 1)} = \Theta^{(j)}a^{(j)}\]
\[h_\Theta(x) = a^{(j + 1)} = g(z^{(j + 1)})\]
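For concreteness, a minimal Octave sketch of these steps for a 3-layer network; Theta1, Theta2 and the example matrix X (one example per row) are assumed to be already defined, and the layer sizes are illustrative.
% Vectorized forward propagation for a 3-layer network.
% X is m x n; Theta1 is s2 x (n + 1); Theta2 is K x (s2 + 1).
sigmoid = @(z) 1 ./ (1 + exp(-z));   % g(z)
m  = size(X, 1);
a1 = [ones(m, 1) X];                 % add the bias unit to the input layer
z2 = a1 * Theta1';                   % z^(2) = Theta^(1) a^(1)
a2 = [ones(m, 1) sigmoid(z2)];       % hidden-layer activations, plus bias
z3 = a2 * Theta2';
a3 = sigmoid(z3);                    % h_Theta(x), one row per example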
Cost Function
\[
J(\Theta) = - \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2
\]
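A sketch of this cost in Octave, assuming the forward-pass output a3 from the sketch above, a one-hot label matrix Y of size m x K, and a regularization parameter lambda:
% Unregularized cost, summed over all m examples and K output units.
J = (-1 / m) * sum(sum(Y .* log(a3) + (1 - Y) .* log(1 - a3)));
% Regularization term: skip the first column of each Theta (bias weights).
J = J + (lambda / (2 * m)) * ...
    (sum(sum(Theta1(:, 2:end) .^ 2)) + sum(sum(Theta2(:, 2:end) .^ 2)));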
Back-propagation Algorithm
Algorithm
- Assume we have already computed all the \(a^{(l)}\) and \(z^{(l)}\) via forward propagation
- set \(\Delta^{(l)}_{i,j} := 0\) for all \((l, i, j)\)
- Using \(y^{(t)}\), compute \(\delta^{(L)} = a^{(L)} - y^{(t)}\), where \(y^{(t)}_k \in \{0, 1\}\) indicates whether the current training example belongs to class \(k\) (\(y^{(t)}_k = 1\)) or to a different class (\(y^{(t)}_k = 0\))
- For the hidden layers \(l = L - 1\) down to \(l = 2\), set
\[
\delta^{(l)} = ((\Theta^{(l)})^T\delta^{(l + 1)}) \; .* \; g'(z^{(l)})
\]
- remember to remove \(\delta_0^{(l)}\) (the bias term), e.g. delta(2:end) in Octave
\[
\Delta^{(l)} := \Delta^{(l)} + \delta^{(l + 1)}(a^{(l)})^T
\]
- gradient
\[
\frac{\partial}{\partial\Theta^{(l)}_{i,j}}J(\Theta) = D^{(l)}_{i,j} = \frac{1}{m}\Delta^{(l)}_{i,j} +
\begin{cases} \frac{\lambda}{m}\Theta^{(l)}_{i,j}, & \text{if } j \geq 1 \\ 0, & \text{if } j = 0 \end{cases}
\]
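A vectorized Octave sketch of these steps for the same 3-layer network, reusing a1, a2, a3, z2, Y, lambda and sigmoid from the earlier sketches; sigmoidGradient is an assumed helper implementing \(g'(z)\):
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));   % g'(z)
delta3 = a3 - Y;                              % delta^(L) for all m examples
delta2 = (delta3 * Theta2) .* [ones(m, 1) sigmoidGradient(z2)];
delta2 = delta2(:, 2:end);                    % remove delta_0 (bias term)
Delta1 = delta2' * a1;                        % accumulated Delta^(1)
Delta2 = delta3' * a2;                        % accumulated Delta^(2)
Theta1_grad = Delta1 / m;
Theta2_grad = Delta2 / m;
% Regularize every column except the first (j = 0 is the bias column).
Theta1_grad(:, 2:end) += (lambda / m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) += (lambda / m) * Theta2(:, 2:end);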
Gradient Checking
- \[
\frac{d}{d\Theta}J(\Theta) \approx \frac{J(\Theta + \epsilon) - J(\Theta - \epsilon)}{2\epsilon}
\]
- A small value for \(\epsilon\), such as \(\epsilon = 10^{-4}\)
- check that gradApprox \(\approx\) deltaVector
% Numerical gradient check: perturb each parameter by +/- epsilon.
epsilon = 1e-4;
for i = 1:n
  thetaPlus = theta;
  thetaPlus(i) += epsilon;
  thetaMinus = theta;
  thetaMinus(i) -= epsilon;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * epsilon);
end;
Rolling and Unrolling
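A sketch of the usual pattern in Octave, with hypothetical layer-size variables input_layer_size, hidden_layer_size and num_labels for a 3-layer network: unroll the weight matrices into a single vector before passing them to an optimizer, and reshape them back inside the cost function.
% Unroll the weight matrices into a single column vector.
nn_params = [Theta1(:); Theta2(:)];

% Reshape back into matrices inside the cost function.
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, input_layer_size + 1);
Theta2 = reshape(nn_params((1 + hidden_layer_size * (input_layer_size + 1)):end), ...
                 num_labels, hidden_layer_size + 1);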
Random Initialization
Theta = rand(n, m) * (2 * INIT_EPSILON) - INIT_EPSILON;
- initialize \(\Theta^{(l)}_{ij} \in [-\epsilon, \epsilon]\)
- Otherwise, if we initialize all the theta weights to zero, all nodes will update to the same value repeatedly when we back-propagate.
- One effective strategy for choosing \(\epsilon_{init}\) is to base it on the number of units in the network. A good choice is \(\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}\), where \(L_{in}\) and \(L_{out}\) are the numbers of units in the layers adjacent to \(\Theta^{(l)}\).
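A sketch of this heuristic as an Octave function; the name randInitializeWeights and its arguments are illustrative:
function W = randInitializeWeights(L_in, L_out)
  % Weights for a layer with L_in incoming and L_out outgoing connections;
  % the +1 accounts for the bias unit.
  epsilon_init = sqrt(6) / sqrt(L_in + L_out);
  W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end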
Training a Neural Network
- Randomly initialize weights
Theta = rand(n, m) * (2 * epsilon) - epsilon;
- Implement forward propagation to get \(h_\Theta(x^{(i)})\) for any \(x^{(i)}\)
- Implement code to compute cost function \(J(\Theta)\)
- Implement back-propagation to compute the partial derivatives \(\frac{\partial}{\partial\Theta^{(l)}_{jk}} J(\Theta)\)
- \(g'(z) = \frac{d}{dz}g(z) = g(z)(1 - g(z))\)
- \(\mathrm{sigmoid}(z) = g(z) = \frac{1}{1 + e^{-z}}\)
- Use gradient checking to compare \(\frac{\partial}{\partial\Theta^{(l)}_{jk}} J(\Theta)\) computed using back-propagation with the numerical estimate of the gradient of \(J(\Theta)\). Then disable the gradient-checking code.
- Use gradient descent or an advanced optimization method with back-propagation to minimize \(J(\Theta)\) as a function of the parameters \(\Theta\), as in the sketch below.
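Putting the steps together, a minimal training sketch using fminunc; nnCostFunction is an assumed helper that returns both \(J(\Theta)\) and the unrolled gradient, and randInitializeWeights is the sketch from the random-initialization section.
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

options = optimset('GradObj', 'on', 'MaxIter', 50);
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                    num_labels, X, y, lambda);
[nn_params, cost] = fminunc(costFunction, initial_nn_params, options);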