Neural Network not learning - MNIST data - Handwriting recognition

Posted 2023-03-12


[Title]: Neural Network not learning - MNIST data - Handwriting recognition
[Posted]: 2015-04-29 02:12:22
[Question]:

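I wrote a neural network program. It works for logic gates, but when I try to use it to recognize handwritten digits, it simply does not learn.

Please find the code below: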

// This is a single neuron; it may be needed to understand the rest of the code

typedef struct SingleNeuron
{
    double                  outputValue;
    std::vector<double>     weight;
    std::vector<double>     deltaWeight;
    double                  gradient;
    double                  sum;
} SingleNeuron;

Then I initialize the network. I set the weights to random values between -0.5 and +0.5, sum to 0, and deltaWeight to 0.
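The initialization code itself is not shown in the question, so the following is only a minimal sketch of what such a step might look like. `MakeLayer` and its parameters are hypothetical names introduced here for illustration, and the struct simply mirrors `SingleNeuron` from above; the use of `std::mt19937` is an assumption, not something the post states.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Mirrors the SingleNeuron struct from the question.
struct SingleNeuron {
    double outputValue = 0.0;
    std::vector<double> weight;       // weight[j] feeds neuron j of the next layer
    std::vector<double> deltaWeight;  // previous update, reused by the momentum term
    double gradient = 0.0;
    double sum = 0.0;
};

// Hypothetical helper (not from the post): builds one layer whose neurons each
// have `fanOut` outgoing weights drawn uniformly from [-0.5, 0.5).
std::vector<SingleNeuron> MakeLayer(std::size_t neuronCount, std::size_t fanOut,
                                    std::mt19937& rng) {
    assert(neuronCount > 0);
    std::uniform_real_distribution<double> dist(-0.5, 0.5);
    std::vector<SingleNeuron> layer(neuronCount);
    for (SingleNeuron& neuron : layer) {
        neuron.weight.resize(fanOut);
        for (double& w : neuron.weight) w = dist(rng);
        neuron.deltaWeight.assign(fanOut, 0.0);  // no previous update yet
    }
    return layer;
}
```

Passing the generator in by reference keeps all layers on one random stream, so a fixed seed makes a run reproducible when comparing hyperparameter settings.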

Then comes the feed-forward pass:

// copy the inputs into the input layer
for (unsigned i = 0; i < inputValues.size(); ++i)
{
    neuralNet[0][i].outputValue = inputValues[i];
    neuralNet[0][i].sum = 0.0;
    //  std::cout << "o/p Val = " << neuralNet[0][i].outputValue << std::endl;
}

// propagate layer by layer; weight[j] of a neuron feeds neuron j of the next layer
for (unsigned i = 1; i < neuralNet.size(); ++i)
{
    std::vector<SingleNeuron> prevLayerNeurons = neuralNet[i - 1];
    unsigned j = 0;
    double thisNeuronOPVal = 0;
    //  std::cout << std::endl;
    // the last neuron of each layer is skipped
    for (j = 0; j < neuralNet[i].size() - 1; ++j)
    {
        double sum = 0;
        for (unsigned k = 0; k < prevLayerNeurons.size(); ++k)
        {
            sum += prevLayerNeurons[k].outputValue * prevLayerNeurons[k].weight[j];
        }
        neuralNet[i][j].sum = sum;
        neuralNet[i][j].outputValue = TransferFunction(sum);
        //      std::cout << neuralNet[i][j].outputValue << "\t";
    }
    //      std::cout << std::endl;
}

My transfer function and its derivative are given at the end.

After this, I attempt back-propagation as follows:

// calculate output layer gradients
for (unsigned i = 0; i < outputLayer.size() - 1; ++i)
{
    double delta = actualOutput[i] - outputLayer[i].outputValue;
    outputLayer[i].gradient = delta * TransferFunctionDerivative(outputLayer[i].sum);
}
//  std::cout << "Found Output gradients "<< std::endl;
// calculate hidden layer gradients
for (unsigned i = neuralNet.size() - 2; i > 0; --i)
{
    std::vector<SingleNeuron>& hiddenLayer = neuralNet[i];
    std::vector<SingleNeuron>& nextLayer = neuralNet[i + 1];

    for (unsigned j = 0; j < hiddenLayer.size(); ++j)
    {
        double dow = 0.0;
        for (unsigned k = 0; k < nextLayer.size() - 1; ++k)
        {
            dow += nextLayer[k].gradient * hiddenLayer[j].weight[k];
        }
        hiddenLayer[j].gradient = dow * TransferFunctionDerivative(hiddenLayer[j].sum);
    }
}
//  std::cout << "Found hidden layer gradients "<< std::endl;

// from output to 1st hidden layer, update all weights
for (unsigned i = neuralNet.size() - 1; i > 0; --i)
{
    std::vector <SingleNeuron>& currentLayer = neuralNet[i];
    std::vector <SingleNeuron>& prevLayer = neuralNet[i - 1];

    for (unsigned j = 0; j < currentLayer.size() - 1; ++j)
    {
        for (unsigned k = 0; k < prevLayer.size(); ++k)
        {
            SingleNeuron& thisNeueon = prevLayer[k];
            double oldDeltaWeight = thisNeueon.deltaWeight[j];
            // gradient step scaled by ETA, plus a momentum term scaled by ALPHA
            double newDeltaWeight = ETA * thisNeueon.outputValue * currentLayer[j].gradient + (ALPHA * oldDeltaWeight);
            thisNeueon.deltaWeight[j] = newDeltaWeight;
            thisNeueon.weight[j] += newDeltaWeight;
        }
    }
}

These are the TransferFunction and its derivative:

double TransferFunction(double x)
{
    double val;
    //val = tanh(x);
    val = 1 / (1 + exp(x * -1));
    return val;
}

double TransferFunctionDerivative(double x)
{
    //return 1 - x * x;
    double val = exp(x * -1) / pow((exp(x * -1) + 1), 2);
    return val;
}
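One of the comments on the question points out that this derivative can overflow for very large negative inputs and suggests writing it through the sigmoid value, ds/dx = s(x)[1 - s(x)]. The sketch below illustrates that point under the assumption of IEEE-754 doubles; `NaiveSigmoidDerivative`, `StableSigmoidDerivative` and `CheckOverflowBehaviour` are names introduced here for illustration, not part of the original program.

```cpp
#include <cassert>
#include <cmath>

// Same formula as TransferFunctionDerivative in the question.
double NaiveSigmoidDerivative(double x) {
    return std::exp(-x) / std::pow(std::exp(-x) + 1.0, 2);
}

// Derivative expressed through the sigmoid value: s(x) * (1 - s(x)).
double StableSigmoidDerivative(double x) {
    const double s = 1.0 / (1.0 + std::exp(-x));
    return s * (1.0 - s);
}

// Self-check documenting the behaviour at a large negative input
// (assumes IEEE-754 doubles, where exp(1000) overflows to infinity).
void CheckOverflowBehaviour() {
    assert(std::isnan(NaiveSigmoidDerivative(-1000.0)));  // inf / inf
    assert(StableSigmoidDerivative(-1000.0) == 0.0);      // 1 / (1 + inf) == 0
    // For moderate inputs the two forms agree.
    assert(std::fabs(NaiveSigmoidDerivative(0.5) - StableSigmoidDerivative(0.5)) < 1e-12);
}
```

A single NaN gradient would spread through every later weight update, so the second form is the safer one to use even when typical inputs are small.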

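One thing I observed: if I use the standard sigmoid function as my transfer function and pass the neuron's output to the transfer function, the result is infinity. But tanh(x) works fine with this value.

So if I use 1/(1+e^(-x)) as the transfer function, I have to pass the sum of the net inputs, whereas with tanh as the transfer function I have to pass the output of the current neuron.

I do not completely understand why this is so; perhaps that calls for a separate question.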
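A likely explanation is that the two derivative formulas in the code are written in terms of different variables. The commented-out `1 - x * x` is the tanh derivative expressed through the neuron's output y = tanh(x), since d/dx tanh(x) = 1 - tanh(x)^2, while the sigmoid formula is expressed through the net input x. The sketch below makes the expected argument explicit; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Expects y = tanh(x), i.e. the neuron's output, not its net input.
double TanhDerivativeFromOutput(double y) {
    assert(y >= -1.0 && y <= 1.0);  // a raw sum outside this range is a misuse
    return 1.0 - y * y;
}

// Expects y = 1 / (1 + exp(-x)), i.e. the neuron's output.
double SigmoidDerivativeFromOutput(double y) {
    assert(y >= 0.0 && y <= 1.0);
    return y * (1.0 - y);
}

// Expects the net input x (the weighted sum); same formula as in the question.
double SigmoidDerivativeFromSum(double x) {
    return std::exp(-x) / std::pow(std::exp(-x) + 1.0, 2);
}
```

Either parametrization works as long as the caller passes the matching quantity: `outputValue` to the "from output" forms and `sum` to the "from sum" form.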

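But this question is really about something else: the network works for logic gates but not for character recognition.

I tried many variations and combinations of learning rate, acceleration, number of hidden layers, and their sizes. Please find the results below: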

AvgErr : 0.299399         #Pass799
AvgErr : 0.305071         #Pass809
AvgErr : 0.303046         #Pass819
AvgErr : 0.299569         #Pass829
AvgErr : 0.30413          #Pass839
AvgErr : 0.304165         #Pass849
AvgErr : 0.300529         #Pass859
AvgErr : 0.302973         #Pass869
AvgErr : 0.299238         #Pass879
AvgErr : 0.304708         #Pass889
AvgErr : 0.30068          #Pass899
AvgErr : 0.302582         #Pass909
AvgErr : 0.301767         #Pass919
AvgErr : 0.303167         #Pass929
AvgErr : 0.299551         #Pass939
AvgErr : 0.301295         #Pass949
AvgErr : 0.300651         #Pass959
AvgErr : 0.297867         #Pass969
AvgErr : 0.304221         #Pass979
AvgErr : 0.303702         #Pass989

Looking at the results, you might think this guy is just stuck in a local minimum, but please wait and read on:

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]          
Output = 0.0910903, 0.105674, 0.064575, 0.0864824, 0.128682, 0.0878434, 0.0946296, 0.154405, 0.0678767, 0.0666924

Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Output = 0.0916106, 0.105958, 0.0655508, 0.086579, 0.126461, 0.0884082, 0.110953, 0.163343, 0.0689315, 0.0675822

Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]          
Output = 0.105344, 0.105021, 0.0659517, 0.0858077, 0.123104, 0.0884107, 0.116917, 0.161911, 0.0693426, 0.0675156

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]          
Output = 0.107113, 0.101838, 0.0641632, 0.0967766, 0.117149, 0.085271, 0.11469, 0.153649, 0.0672772, 0.0652416

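The above are the outputs of epochs #996, #997, #998 and #999.

So the network simply is not learning. For this example I used ALPHA = 0.4, ETA = 0.7, 10 hidden layers of 100 neurons each, and the average is taken over 10 epochs. If you are worried about the learning rate being 0.4 or about so many hidden layers, I have already tried variations of them, for example a learning rate of 0.1 and 4 hidden layers of 16 neurons each: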

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]          
Output = 0.0883238, 0.0983253, 0.0613749, 0.0809751, 0.124972, 0.0897194, 0.0911235, 0.179984, 0.0681346, 0.0660039

Input = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]          
Output = 0.0868767, 0.0966924, 0.0612488, 0.0798343, 0.120353, 0.0882381, 0.111925, 0.169309, 0.0676711, 0.0656819

Input = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]          
Output = 0.105252, 0.0943837, 0.0604416, 0.0781779, 0.116231, 0.0858496, 0.108437, 0.1588, 0.0663156, 0.0645477

Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]          
Output = 0.102023, 0.0914957, 0.059178, 0.09339, 0.111851, 0.0842454, 0.104834, 0.149892, 0.0651799, 0.063558

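I am quite sure I have missed something, but I cannot figure out what. I have read Tom Mitchell's algorithm many times, but I do not know what is wrong. Whatever example I solve by hand works! (Please do not ask me to solve MNIST images by hand ;) ) I do not know where to change the code or what to do. Please help.

EDIT -- uploading more data as suggested in the comments

1 hidden layer of 32 -- still not learning.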

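Expected output -- the input is an image of a digit between 0 and 9, so a simple vector describes which digit the current image is: that position is 1 and all others are 0. So I want the output for that position to be as close to 1 as possible and the others close to 0. For example, if the input is Input = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0], I want the output to be something like Output = 0.002023, 0.0914957, 0.059178, 0.09339, 0.011851, 0.0842454, 0.924834, 0.049892, 0.0651799, 0.063558 (this is approximate, made up by hand).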
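The target encoding described above is commonly called one-hot encoding, and the predicted digit is then the index of the largest output. As a minimal sketch (the helper names `OneHot` and `ArgMax` are introduced here and do not appear in the original program):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Builds the target vector for a digit label: 1 at `label`, 0 elsewhere.
std::vector<double> OneHot(std::size_t label, std::size_t numClasses = 10) {
    assert(label < numClasses);
    std::vector<double> target(numClasses, 0.0);
    target[label] = 1.0;
    return target;
}

// Returns the index of the largest output, i.e. the predicted digit.
std::size_t ArgMax(const std::vector<double>& output) {
    assert(!output.empty());
    return static_cast<std::size_t>(
        std::distance(output.begin(), std::max_element(output.begin(), output.end())));
}
```

Applied to the four outputs listed for epochs #996 to #999, `ArgMax` returns 7 every time while the targets are 6, 0, 3 and 6, which is consistent with a network whose output barely depends on its input.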

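Here are links to other researchers' work.

Stanford

SourceForge -- this is a library

Not only these two; many other sites show demos.

Everything works fine for them. If I set my network parameters (ALPHA, ETA) the same way they do, I do not get results like theirs, so this confirms that something is wrong with my code.

EDIT 2

Adding more failure cases

Acceleration - 0.7, Learning Rate 0.1

Acceleration - 0.7, Learning Rate 0.6

In both of the above cases there were 3 hidden layers, each with 32 neurons.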

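[Comments]:

I haven't dissected your code, but your TransferFunctionDerivative may overflow for very large negative inputs. It is better to define the derivative in terms of the sigmoid: if s(x) is the sigmoid value, then ds/dx = s(x)[1 - s(x)].

The first thing to do is remove 9 of the 10 hidden layers. Deep networks can be very uncooperative even when coded correctly. So leave 1 hidden layer and let us know what happens (a 1-hidden-layer NN can solve MNIST to at least a reasonable 93% accuracy).

I solved the puzzle. I made the worst possible mistake: I was feeding in the wrong input. I used OpenCV to scan the images, and instead of using reshape I was using resize, so the input was a linear interpolation of the image. So my input was wrong. There was nothing wrong with the code. My network is 784 - 65 - 10, giving 96.43% accuracy. I apologize from the bottom of my heart for wasting your time. From next time on, I will try harder to sort out such problems myself. Special thanks to Dennis!

@Adorn You should add this as an answer.

@Adorn If the problem is solved, please add and accept your own answer.

[Answer 1]: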

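This answer was copied from the OP's comment on the question.

I solved the puzzle. I made the worst possible mistake: I was feeding in the wrong input. I used OpenCV to scan the images, and instead of using reshape I was using resize, so the input was a linear interpolation of the image. So my input was wrong. There was nothing wrong with the code. My network is 784 - 65 - 10, giving 96.43% accuracy.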
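In OpenCV, `cv::Mat::reshape` only changes how the existing pixels are laid out, whereas `cv::resize` resamples the image with interpolation, so the two produce very different input vectors. The sketch below shows the intended flattening step in plain C++ without OpenCV, so that it stays self-contained. `FlattenImage` is a hypothetical helper, the row-major 28x28 layout matches the 784 inputs of the network above, and the scaling to [0, 1] is an assumption rather than something the post states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kRows = 28;
constexpr std::size_t kCols = 28;

// Hypothetical helper: turns a row-major 28x28 8-bit image into the 784
// network inputs. Every pixel is kept exactly once and no resampling happens.
// Dividing by 255 to scale into [0, 1] is an assumption, not from the post.
std::vector<double> FlattenImage(const std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() == kRows * kCols);
    std::vector<double> input(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        input[i] = pixels[i] / 255.0;
    }
    return input;
}
```

With this layout, the pixel at row r and column c ends up at index `r * kCols + c`, which is easy to spot-check against a known image before training.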
