Theano Deep Learning Tutorials Notes: Classifying MNIST digits using Logistic Regression

Posted by slim1017


Tutorial page: http://www.deeplearning.net/tutorial/logreg.html#logreg

This section assumes familiarity with the following Theano concepts: shared variables, basic arithmetic ops, T.grad, floatX. If you intend to run the code on a GPU, also read the GPU section.

The code for this section is available for download here.

 

The Model

The UFLDL softmax tutorial: http://deeplearning.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92

Solutions to the UFLDL softmax exercise: http://blog.csdn.net/u012816943/article/details/50357801

Although the tutorial is titled Logistic Regression, the model is really softmax regression; logistic regression is the special case of softmax with only two classes, while here the digits 0-9 give a 10-class problem. The input vector is projected onto 10 hyperplanes, and the distance to each hyperplane reflects the probability of belonging to the corresponding class.

Mathematically, the probability that an input vector $x$ is a member of class $i$, a value of a stochastic variable $Y$, can be written as:

$$P(Y = i \mid x, W, b) = \mathrm{softmax}_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}$$

The model's prediction is the class whose probability is maximal, specifically:

$$y_{pred} = \underset{i}{\operatorname{argmax}}\; P(Y = i \mid x, W, b)$$
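As a quick sanity check outside Theano, here is a minimal NumPy sketch of the same computation for a single input vector. The shapes mirror the tutorial (784 inputs, 10 classes), but the concrete values of W, b and x below are made up for illustration:

import numpy

def softmax(z):
    # shift by the maximum for numerical stability before exponentiating
    e = numpy.exp(z - z.max())
    return e / e.sum()

W = numpy.zeros((784, 10))   # column k is the separating hyperplane for class k
b = numpy.zeros((10,))       # free parameter of each hyperplane
x = numpy.random.rand(784)   # a toy stand-in for a flattened 28*28 image

p_y_given_x = softmax(numpy.dot(x, W) + b)   # P(Y = i | x, W, b) for i = 0..9
y_pred = p_y_given_x.argmax()                # class with maximal probability

The tutorial builds the same computation symbolically, using Theano shared variables for the parameters: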

        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represent the separation hyperplane for
        # class-k
        # x is a matrix where row-j  represents input training sample-j
        # b is a vector where element-k represent the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

Note: For a complete list of Theano ops, see the list of ops in the Theano documentation.

Defining a Loss Function

Let us first start by defining the likelihood and loss:

 

def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

y.shape[0] is the number of rows of y, i.e. the number of samples n in the minibatch.

T.arange(y.shape[0]) is [0, 1, 2, ..., n-1].

T.log(self.p_y_given_x) is an n-by-10 matrix: one row per sample, one column per class.

Indexing it as [T.arange(y.shape[0]), y] picks, for each sample, the log-probability of its correct class, giving a vector of length n; -T.mean of that vector then returns a single scalar, the mean negative log-likelihood over the minibatch.
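A small NumPy sketch of the same indexing trick may help; the 3-sample, 4-class probability matrix below is purely hypothetical:

import numpy

# toy stand-in for p_y_given_x: 3 samples, 4 classes (each row sums to 1)
p = numpy.array([[0.10, 0.70, 0.10, 0.10],
                 [0.30, 0.30, 0.20, 0.20],
                 [0.05, 0.05, 0.10, 0.80]])
y = numpy.array([1, 0, 3])    # correct class of each sample

log_p = numpy.log(p)
# fancy indexing picks log_p[0, 1], log_p[1, 0], log_p[2, 3]
correct_log_probs = log_p[numpy.arange(y.shape[0]), y]
nll = -correct_log_probs.mean()   # a single scalar: the minibatch loss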

Creating a LogisticRegression class

class LogisticRegression(object): the full class code is not reproduced here.

We instantiate this class as follows:

    # generate symbolic variables for input (x and y represent a
    # minibatch)
    x = T.matrix('x')  # data, presented as rasterized images
    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class
    # Each MNIST image has size 28*28
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)


Note that the arguments in the last line above are passed to the LogisticRegression class's __init__(self, input, n_in, n_out), which acts as the constructor.

Defining the loss function:

cost = classifier.negative_log_likelihood(y)

The loss function depends on the input x implicitly, through classifier.p_y_given_x.


Learning the Model

With Theano we do not need to derive the gradients of the loss with respect to the parameters by hand; T.grad computes them directly by differentiating the symbolic expression graph:

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
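A minimal, self-contained example (not part of the tutorial) shows how T.grad differentiates an expression graph:

import theano
import theano.tensor as T

a = T.dscalar('a')
cost = a ** 2 + 3 * a                    # a toy scalar cost
g_a = T.grad(cost=cost, wrt=a)           # symbolic derivative: 2*a + 3
grad_fn = theano.function([a], g_a)
print(grad_fn(4.0))                      # prints 11.0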

train_model is a Theano function that performs one step of gradient descent; the training loop calls it repeatedly:

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, but in
    # the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )


Clearly, updates specifies how the parameters W and b are updated: updates is a list of (variable, update expression) pairs, while givens is a dictionary telling Theano to substitute each symbolic variable with the corresponding minibatch slice.
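To see updates and givens in isolation, here is a toy example with a single shared accumulator; the names state, inc and step are made up and unrelated to the tutorial code:

import numpy
import theano
import theano.tensor as T

state = theano.shared(numpy.asarray(0.0, dtype=theano.config.floatX), name='state')
inc = T.scalar('inc', dtype=theano.config.floatX)
step = T.scalar('step', dtype=theano.config.floatX)

# each call returns the old value of `state`, then applies the update;
# `givens` substitutes the symbolic variable `inc` with the expression 2 * step
accumulate = theano.function(
    inputs=[step],
    outputs=state,
    updates=[(state, state + inc)],
    givens={inc: 2 * step}
)

accumulate(1.0)             # state becomes 0.0 + 2 * 1.0 = 2.0
print(state.get_value())    # 2.0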

Testing the model

The LogisticRegression class also defines a function that computes the prediction error rate:

def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()


test_model and validate_model differ only in the dataset they draw minibatches from; validate_model is the one used for early stopping.

 

    # compiling a Theano function that computes the mistakes that are made by
    # the model on a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
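Both compiled functions return the zero-one error rate of a single minibatch, so evaluating a whole set means looping over all of its minibatch indices and averaging, roughly as the tutorial's training loop does (n_test_batches is the number of test minibatches, i.e. test_set_x.get_value(borrow=True).shape[0] // batch_size):

    # average the per-minibatch error rates over the whole test set
    test_losses = [test_model(i) for i in range(n_test_batches)]
    test_score = numpy.mean(test_losses)
    print('test error: %f %%' % (test_score * 100.))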

 

Later in the code, cPickle is used in two places to save and load the best model:

                    # save the best model (binary mode, since pickle writes bytes)
                    with open('best_model.pkl', 'wb') as f:
                        cPickle.dump(classifier, f)


 

    # load the saved model (binary mode to match how it was written)
    classifier = cPickle.load(open('best_model.pkl', 'rb'))
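Once the pickled classifier is loaded, it can be used to predict new digits by compiling a function from its symbolic input to y_pred. The sketch below follows the tutorial's predict() idea and assumes the LogisticRegression instance stored its symbolic input as classifier.input and that test_set_x is the shared variable holding the test images:

    # compile a predictor that maps the model's symbolic input to its prediction
    predict_model = theano.function(
        inputs=[classifier.input],
        outputs=classifier.y_pred
    )

    # predict the labels of the first 10 test images
    test_values = test_set_x.get_value()
    predicted_values = predict_model(test_values[:10])
    print('Predicted digits:', predicted_values)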



 
