Chapter 5: Implementing the BP Algorithm for a Multilayer Neural Network


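Preface

Neural networks are an unusual way of solving problems. This book works through the basic neural network algorithms together with the reader, step by step, starting from the simplest case and in the plainest terms it can. It tries to avoid intimidating terminology and mathematical concepts, and puts each algorithm into practice by building Java programs that actually run.

Follow the WeChat account "javaresearcher" for more information about this book.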


 

 

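The previous chapter discussed the mathematical principles behind the expressive power of neural networks. In this chapter we implement a neural network together with its training algorithm.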

    

Today we look at a fully connected, multilayer feed-forward neural network like the one below:

    

[Figure: a fully connected feed-forward network with three layers: input, middle, and output]

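We count the input as a layer too, so the network in the figure above has three layers. The sizes of the input and output layers are determined by the size of the problem's input and output, while the size of the middle layer is more flexible.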

 

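Let us first write down the basic structure of our neural network. Using one object per neuron would not necessarily be easier to write, and it would be less efficient. A single class is therefore enough for the whole network; we only need to turn what used to be individual properties into arrays.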

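public class NeuralNetwork {
    int[] shape;          // number of neurons in each layer
    int layers;           // number of layers, equal to shape.length
    double[][][] weights; // [layer][neuron][input]
    double[][] bias;      // [layer][neuron]
    double[][] zs;        // z = w*x + b of every neuron, per layer
    double[][] xs;        // input x (layer 0) and sigmoid(z) of every later layer
    // constructor and methods follow
}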

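Here:

- the shape array holds the number of neurons in each layer;

- the length of the shape array is layers, the number of layers in the network;

- the three dimensions of weights[ ][ ][ ] stand for [layer][neuron][input];

- the two dimensions of bias[ ][ ] stand for [layer][neuron];

- the zs[ ][ ] array stores the result z=w*x+b of each neuron;

- the xs[ ][ ] array stores the input x (first layer) and the output sigmoid(z) of every later layer.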
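To make the array layout concrete, here is a minimal sketch that counts how many weights and biases a given shape implies. The class name ParamCount and its two methods are hypothetical helpers introduced only for this illustration; they are not part of the chapter's NeuralNetwork class. The sketch assumes a fully connected network, so each neuron has one weight per neuron in the previous layer.

```java
class ParamCount {
    // Number of weights: each neuron in layer i has one weight
    // per neuron in layer i-1 (the input layer has none).
    static int countWeights(int... shape) {
        int n = 0;
        for (int i = 1; i < shape.length; i++) {
            n += shape[i] * shape[i - 1];
        }
        return n;
    }

    // Number of biases: one per neuron, excluding the input layer.
    static int countBiases(int... shape) {
        int n = 0;
        for (int i = 1; i < shape.length; i++) {
            n += shape[i];
        }
        return n;
    }

    public static void main(String[] args) {
        // A 2-3-1 network: 2*3 + 3*1 = 9 weights and 3 + 1 = 4 biases
        System.out.println(countWeights(2, 3, 1)); // prints 9
        System.out.println(countBiases(2, 3, 1));  // prints 4
    }
}
```

For a 2-3-1 network this means weights[1] is a 3-by-2 array and weights[2] is a 1-by-3 array.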

 

Next we need to initialize these fields:

 

public NeuralNetwork(int... shape) {
    this.shape = shape;
    layers = shape.length;
    weights = new double[layers][][];
    bias = new double[layers][];
    // First layer is the input layer: no weights, no bias
    weights[0] = new double[0][0];
    bias[0] = new double[0];
    zs = new double[layers][];
    xs = new double[layers][];
    for (int i = 1; i < layers; i++) {
        // one row of weights per neuron, one weight per output of the previous layer
        weights[i] = new double[this.shape[i]][this.shape[i - 1]];
        bias[i] = new double[this.shape[i]];
    }
    // fillRandom is defined in the complete listing at the end of the chapter
    fillRandom(weights);
    fillRandom(bias);
}

 

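Because the first layer is the input layer and needs no computation, we set its weight and bias to empty arrays. Finally we set the initial values of w and b to random numbers. If they all started out the same, every neuron in a layer would compute the same thing, and it would be hard for training to make them differ. After all, we need each neuron to approximate a different function.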
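The point about identical starting values can be seen in a small sketch. SymmetryDemo and its neuron method are hypothetical names used only here, and the weights and inputs are made-up numbers. Two neurons that share the same w and b produce exactly the same output for any input, so nothing distinguishes them.

```java
class SymmetryDemo {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Output of one neuron: sigmoid(w . x + b)
    static double neuron(double[] w, double b, double[] x) {
        double z = b;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] x = {0.3, 0.9};
        // Two neurons of the same layer, initialized identically
        double[] w1 = {0.5, 0.5};
        double[] w2 = {0.5, 0.5};
        double out1 = neuron(w1, 0.5, x);
        double out2 = neuron(w2, 0.5, x);
        // Same weights, same input, same arithmetic: the outputs are equal
        System.out.println(out1 == out2); // prints true
    }
}
```

If the weights that consume these two outputs are also identical, both neurons receive the same cost during training and stay identical after every update, which is why the constructor starts from random values.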

 

Next comes the function that computes the network's output:

 

double[] f(double[] in) {
    // layer 0 is the input itself
    zs[0] = xs[0] = in;
    for (int i = 1; i < layers; i++) {
        zs[i] = add(wx(xs[i - 1], weights[i]), bias[i]); // z = w*x + b
        xs[i] = sigmoid(zs[i]);                          // output of layer i
    }
    return xs[layers - 1];
}

// exp is java.lang.Math.exp, statically imported in the complete listing
double sigmoid(double d) {
    return 1.0 / (1.0 + exp(-d));
}

double[] sigmoid(double[] d) {
    int length = d.length;
    double[] v = new double[length];
    for (int i = 0; i < length; i++) {
        v[i] = sigmoid(d[i]);
    }
    return v;
}

double[] wx(double[] x, double[][] weight) {
    int numberOfNeuron = weight.length;
    double[] wx = new double[numberOfNeuron];
    for (int i = 0; i < numberOfNeuron; i++) {
        wx[i] = dot(weight[i], x); // SUM(w*x)
    }
    return wx;
}

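As with the single neuron we covered earlier, the f function computes sigmoid(w*x+b). The difference is that every variable is now an array, so we loop over the corresponding elements. We also loop over the layers, with each layer's result serving as the input of the next. The first layer needs no computation, so the loop starts at index 1. The result of the last layer is the output of the network.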

 

The wx method above computes, for each neuron, the sum of the products of its w array and its inputs. See the figure below:

 

[Figure: the wx computation, in which each neuron sums w*x over its inputs]
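The forward pass can be traced by hand on a tiny network. The sketch below is a simplified, self-contained version of the same loop; ForwardDemo, layer, and forward are hypothetical names, and the weights are made-up values chosen so that every z comes out as exactly 0, which makes every sigmoid output exactly 0.5.

```java
class ForwardDemo {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One layer: out[n] = sigmoid(SUM_i w[n][i] * x[i] + b[n])
    static double[] layer(double[][] w, double[] b, double[] x) {
        double[] out = new double[w.length];
        for (int n = 0; n < w.length; n++) {
            double z = b[n];
            for (int i = 0; i < x.length; i++) {
                z += w[n][i] * x[i];
            }
            out[n] = sigmoid(z);
        }
        return out;
    }

    // A fixed 2-2-1 network with hand-picked (not trained) parameters
    static double[] forward(double[] in) {
        double[][] w1 = {{1, -1}, {-1, 1}};
        double[] b1 = {0, 0};
        double[][] w2 = {{2, 2}};
        double[] b2 = {-2};
        double[] hidden = layer(w1, b1, in);
        return layer(w2, b2, hidden);
    }

    public static void main(String[] args) {
        // Hidden layer: z = 0.5 - 0.5 = 0 for both neurons, so both output 0.5.
        // Output layer: z = 2*0.5 + 2*0.5 - 2 = 0, so the network outputs 0.5.
        System.out.println(forward(new double[]{0.5, 0.5})[0]); // prints 0.5
    }
}
```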

 

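Now let us discuss how to train this network. The method is similar to the single-neuron case we discussed earlier, with the following differences:

1. Each layer has several neurons. We simply handle each neuron separately. Every output of the output layer can be compared with the correct answer to get its error.

2. There are several layers. For the last layer it is easy to get the cost by taking the difference between the expected result in the training data and the network's output, but how do the earlier layers get their cost? The answer is simple: just as we took the derivative with respect to w, we can take the derivative with respect to the last layer's input x. We showed earlier how to differentiate y=w*x + b with respect to w and b; similarly, the derivative with respect to x is w. Multiplying this derivative by the cost, and summing over every neuron that x feeds into, gives the cost of the previous layer. As in the single-neuron case, each layer's cost is also multiplied by the derivative of sigmoid at that layer's z. This is the famed backpropagation algorithm (BP). Before backpropagation starts, we call f so that the whole network is computed once, which gives us the cost of the last layer. This process is called forward propagation (feed forward).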
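For readers who prefer formulas, the steps above can be written out index by index, without matrices. The notation is an assumption made for this summary and does not appear in the original text: t is the expected output, L is the index of the last layer, c is the cost at a layer, E is the squared error, and the superscript is the layer index. With this choice c equals the negative derivative of E with respect to z, which is why the update adds the correction instead of subtracting it.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\sum_j \bigl(t_j - x^{L}_j\bigr)^2
  && \text{squared error (assumed cost function)} \\
z^{l}_j &= \sum_i w^{l}_{ji}\,x^{l-1}_i + b^{l}_j,
  \qquad x^{l}_j = \sigma\bigl(z^{l}_j\bigr)
  && \text{forward pass} \\
c^{L}_j &= \bigl(t_j - x^{L}_j\bigr)\,\sigma'\bigl(z^{L}_j\bigr)
  && \text{cost at the output layer} \\
c^{l-1}_i &= \sigma'\bigl(z^{l-1}_i\bigr)\sum_j w^{l}_{ji}\,c^{l}_j
  && \text{cost passed back one layer} \\
\Delta w^{l}_{ji} &= c^{l}_j\,x^{l-1}_i,
  \qquad \Delta b^{l}_j = c^{l}_j
  && \text{corrections for layer } l \\
w^{l}_{ji} &\leftarrow w^{l}_{ji} + \text{rate}\cdot\Delta w^{l}_{ji},
  \qquad b^{l}_j \leftarrow b^{l}_j + \text{rate}\cdot\Delta b^{l}_j
  && \text{update}
\end{aligned}
```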

 


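void train(double[] in, double[] expect, double rate) {
    double[] y = f(in);              // forward pass, also fills zs and xs
    double[] cost = sub(expect, y);  // error of the output layer
    double[][][] dw = new double[layers][][];
    double[][] db = new double[layers][];
    dw[0] = new double[0][0];
    db[0] = new double[0];
    // walk backwards from the output layer to the first computed layer
    for (int i = layers - 1; i > 0; i--) {
        double[] sp = signmoidPrime(zs[i]);
        cost = mul(cost, sp);         // scale by the sigmoid derivative at z
        dw[i] = dw(xs[i - 1], cost);
        db[i] = cost;
        cost = dx(weights[i], cost);  // cost handed back to the previous layer
    }

    // cost is expect - y, so the corrections are added
    weights = add(weights, mul(dw, rate));
    bias = add(bias, mul(db, rate));
}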

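The training function follows essentially the same procedure as for a single neuron; please refer to the earlier chapters. The following two methods deserve a short explanation.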

    

double[] dx(double[][] w, double[] c) {
    int numberOfX = w[0].length;
    double[] v = new double[numberOfX];
    for (int i = 0; i < numberOfX; i++) {
        // input i feeds every neuron j, so sum w*c over all neurons
        for (int j = 0; j < c.length; j++) {
            v[i] += w[j][i] * c[j];
        }
    }
    return v;
}

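The dx method computes the cost for a layer in the middle: for each of its neurons, the sum over the next layer's neurons of w times cost. See the figure below: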

[Figure: the dx computation, summing w*cost over the neurons of the next layer]
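A small worked example may help with the index order in dx. The sketch below copies the dx method into a hypothetical DxDemo class and runs it on made-up numbers: a layer of two neurons with two inputs each, and a cost of 0.5 and -1 for the two neurons.

```java
import java.util.Arrays;

class DxDemo {
    // Same logic as the chapter's dx: v[i] = SUM_j w[j][i] * c[j]
    static double[] dx(double[][] w, double[] c) {
        int numberOfX = w[0].length;
        double[] v = new double[numberOfX];
        for (int i = 0; i < numberOfX; i++) {
            for (int j = 0; j < c.length; j++) {
                v[i] += w[j][i] * c[j];
            }
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] w = {{1, 2}, {3, 4}}; // w[neuron][input]
        double[] c = {0.5, -1};          // cost of each neuron
        // input 0: 1*0.5 + 3*(-1) = -2.5
        // input 1: 2*0.5 + 4*(-1) = -3.0
        System.out.println(Arrays.toString(dx(w, c))); // prints [-2.5, -3.0]
    }
}
```

Note that the loop reads w[j][i], not w[i][j]: input i collects one contribution from every neuron j it feeds.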

 

double[][] dw(double[] x, double[] c) {
    int numberOfNeuron = c.length;
    int numberOfIn = x.length;
    double[][] dw = new double[numberOfNeuron][numberOfIn];
    for (int neuron = 0; neuron < numberOfNeuron; neuron++) {
        for (int input = 0; input < numberOfIn; input++) {
            // correction for one weight: cost of its neuron times its input
            dw[neuron][input] = c[neuron] * x[input];
        }
    }
    return dw;
}

 

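dw holds the product x*c for each neuron and each of its inputs. The code computes one product per weight, with no summation: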

[Figure: the dw computation, multiplying each input x by each neuron's cost c]
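The same kind of worked example applies to dw. The sketch copies the dw method into a hypothetical DwDemo class and feeds it made-up values: three inputs and two neurons. The result has one row per neuron and one column per input, matching the [neuron][input] layout of weights.

```java
import java.util.Arrays;

class DwDemo {
    // Same logic as the chapter's dw: dw[neuron][input] = c[neuron] * x[input]
    static double[][] dw(double[] x, double[] c) {
        double[][] dw = new double[c.length][x.length];
        for (int neuron = 0; neuron < c.length; neuron++) {
            for (int input = 0; input < x.length; input++) {
                dw[neuron][input] = c[neuron] * x[input];
            }
        }
        return dw;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3}; // outputs of the previous layer
        double[] c = {0.5, -1}; // cost of each neuron in this layer
        // prints [[0.5, 1.0, 1.5], [-1.0, -2.0, -3.0]]
        System.out.println(Arrays.deepToString(dw(x, c)));
    }
}
```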

 

 

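Below is the complete code. The whole neural network class is about 200 lines and contains many array operations. With a third-party math library, these hand-written matrix operations could be dropped. In the text I have tried to avoid mathematical concepts such as matrices, determinants, and transposes, to spare readers the discomfort they can bring. Mathematics ultimately comes from practical applications, and leaving these concepts out until the application behind them is understood actually makes things easier to follow.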

 

 

package com.luoxq.ann;

import java.util.Random;

import static java.lang.Math.exp;

public class NeuralNetwork {
    int[] shape;
    int layers;
    double[][][] weights;
    double[][] bias;
    double[][] zs;
    double[][] xs;

    public NeuralNetwork(int... shape) {
        this.shape = shape;
        layers = shape.length;
        weights = new double[layers][][];
        bias = new double[layers][];
        //First layer is input layer, no weight
        weights[0] = new double[0][0];
        bias[0] = new double[0];
        zs = new double[layers][];
        xs = new double[layers][];
        for (int i = 1; i < layers; i++) {
            weights[i] = new double[this.shape[i]][this.shape[i - 1]];
            bias[i] = new double[this.shape[i]];
        }
        fillRandom(weights);
        fillRandom(bias);
    }

    Random rand = new Random();

    void fillRandom(double[] d) {
        for (int i = 0; i < d.length; i++) {
            d[i] = rand.nextGaussian();
        }
    }

    void fillRandom(double[][] d) {
        for (int i = 0; i < d.length; i++) {
            fillRandom(d[i]);
        }
    }

    void fillRandom(double[][][] d) {
        for (int i = 0; i < d.length; i++) {
            fillRandom(d[i]);
        }
    }

    double[] f(double[] in) {
        zs[0] = xs[0] = in;
        for (int i = 1; i < layers; i++) {
            zs[i] = add(wx(xs[i - 1], weights[i]), bias[i]);
            xs[i] = sigmoid(zs[i]);
        }
        return xs[layers - 1];
    }


    double sigmoid(double d) {
        return 1.0 / (1.0 + exp(-d));
    }

    double[] sigmoid(double[] d) {
        int length = d.length;
        double[] v = new double[length];
        for (int i = 0; i < length; i++) {
            v[i] = sigmoid(d[i]);
        }
        return v;
    }


    double[] wx(double[] x, double[][] weight) {
        int numberOfNeuron = weight.length;
        double[] wx = new double[numberOfNeuron];
        for (int i = 0; i < numberOfNeuron; i++) {
            wx[i] = dot(weight[i], x); //SUM(w*x)
        }
        return wx;
    }

    void train(double[] in, double[] expect, double rate) {
        double[] y = f(in);
        double[] cost = sub(expect, y);
        double[][][] dw = new double[layers][][];
        double[][] db = new double[layers][];
        dw[0] = new double[0][0];
        db[0] = new double[0];
        for (int i = layers - 1; i > 0; i--) {
            double[] sp = signmoidPrime(zs[i]);
            cost = mul(cost, sp);
            dw[i] = dw(xs[i - 1], cost);
            db[i] = cost;
            cost = dx(weights[i], cost);
        }

        weights = add(weights, mul(dw, rate));
        bias = add(bias, mul(db, rate));
    }


    double[] signmoidPrime(double d[]) {
        int length = d.length;
        double[] v = new double[length];
        for (int i = 0; i < length; i++) {
            v[i] = sigmoidPrime(d[i]);
        }
        return v;
    }

    double sigmoidPrime(double d) {
        return sigmoid(d) * (1 - sigmoid(d));
    }

    double[] sub(double[] a, double[] b) {
        int len = a.length;
        double[] v = new double[len];
        for (int i = 0; i < len; i++) {
            v[i] = a[i] - b[i];
        }
        return v;
    }

    //derivative of x is w*c and sum for each x
    double[] dx(double[][] w, double[] c) {
        int numberOfX = w[0].length;
        double[] v = new double[numberOfX];
        for (int i = 0; i < numberOfX; i++) {
            for (int j = 0; j < c.length; j++) {
                v[i] += w[j][i] * c[j];
            }
        }
        return v;
    }

    //derivative of w is x*c for each c and each x
    double[][] dw(double[] x, double[] c) {
        int numberOfNeuron = c.length;
        int numberOfIn = x.length;
        double[][] dw = new double[numberOfNeuron][numberOfIn];
        for (int neuron = 0; neuron < numberOfNeuron; neuron++) {
            for (int input = 0; input < numberOfIn; input++) {
                dw[neuron][input] = c[neuron] * x[input];
            }
        }
        return dw;
    }

    //V[i]*X[i]
    double[] mul(double[] v, double[] x) {
        double[] d = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            d[i] = v[i] * x[i];
        }
        return d;
    }

    double[][][] mul(double[][][] a, double b) {
        double[][][] v = new double[a.length][][];
        for (int i = 0; i < a.length; i++) {
            v[i] = mul(a[i], b);
        }
        return v;
    }


    double[][] mul(double[][] a, double b) {
        double[][] v = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            v[i] = mul(a[i], b);
        }
        return v;
    }

    double[] mul(double[] a, double b) {
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            d[i] = a[i] * b;
        }
        return d;
    }

    double[][][] add(double[][][] a, double[][][] b) {
        double[][][] v = new double[a.length][][];
        for (int i = 0; i < a.length; i++) {
            v[i] = add(a[i], b[i]);
        }
        return v;
    }

    double[][] add(double[][] a, double[][] b) {
        int length = a.length;
        double[][] v = new double[length][];
        for (int i = 0; i < length; i++) {
            v[i] = add(a[i], b[i]);
        }
        return v;
    }

    double[] add(double[] a, double[] b) {
        int length = a.length;
        double[] v = new double[length];
        for (int i = 0; i < length; i++) {
            v[i] = a[i] + b[i];
        }
        return v;
    }

    double dot(double[] w, double[] x) {
        double v = 0;
        for (int i = 0; i < w.length; i++) {
            v += w[i] * x[i];
        }
        return v;
    }
}
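One way to gain confidence in the backward pass is a numerical gradient check: nudge each parameter slightly, watch how the error changes, and compare that with what backpropagation reports. The sketch below does this for the smallest possible network, one input, one middle neuron, and one output. Everything in it is an assumption made for illustration: the class GradientCheck, its methods, the squared-error function, and the parameter values are not part of the listing above. It follows the chapter's sign convention, where the cost is expect minus output and corrections are added.

```java
class GradientCheck {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double sigmoidPrime(double z) {
        double s = sigmoid(z);
        return s * (1 - s);
    }

    // Squared error of a 1-1-1 network; p = {w1, b1, w2, b2}
    static double error(double[] p, double x, double t) {
        double a1 = sigmoid(p[0] * x + p[1]);
        double y = sigmoid(p[2] * a1 + p[3]);
        return 0.5 * (t - y) * (t - y);
    }

    // Backpropagation as in train(): returns the corrections {dw1, db1, dw2, db2}
    static double[] backprop(double[] p, double x, double t) {
        double z1 = p[0] * x + p[1];
        double a1 = sigmoid(z1);
        double z2 = p[2] * a1 + p[3];
        double y = sigmoid(z2);
        double c2 = (t - y) * sigmoidPrime(z2);   // cost at the output layer
        double c1 = p[2] * c2 * sigmoidPrime(z1); // cost passed back through w2
        return new double[]{c1 * x, c1, c2 * a1, c2};
    }

    // Central differences; negated because the corrections point downhill
    static double[] numeric(double[] p, double x, double t) {
        double h = 1e-5;
        double[] g = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            double[] up = p.clone();
            double[] down = p.clone();
            up[i] += h;
            down[i] -= h;
            g[i] = -(error(up, x, t) - error(down, x, t)) / (2 * h);
        }
        return g;
    }

    // Largest absolute gap between the two estimates
    static double maxDifference(double[] p, double x, double t) {
        double[] a = backprop(p, x, t);
        double[] n = numeric(p, x, t);
        double max = 0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - n[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] p = {0.4, -0.3, 0.7, 0.1}; // arbitrary starting parameters
        System.out.println(maxDifference(p, 0.8, 1.0) < 1e-6); // prints true
    }
}
```

If the two estimates disagree by much more than rounding error, the usual suspects are a swapped index in dx or a missing sigmoidPrime factor.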

 

 

 
