Theano-Deep Learning Tutorials Notes: Classifying MNIST digits using Logistic Regression
Posted by slim1017
Tutorial URL: http://www.deeplearning.net/tutorial/logreg.html#logreg
This section assumes familiarity with the following Theano concepts: shared variables, basic arithmetic ops, T.grad, floatX. If you intend to run the code on a GPU, also read GPU.
The code for this section is available for download here.
The Model
The UFLDL softmax tutorial: http://deeplearning.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92
Solutions to the UFLDL softmax exercise: http://blog.csdn.net/u012816943/article/details/50357801
Although the tutorial is titled Logistic Regression, the model is really softmax: logistic regression is the special case of softmax with two classes. Here we classify the digits 0-9, a 10-class problem. The input vector is projected onto 10 hyperplanes, and the distance to each hyperplane reflects the probability of belonging to the corresponding class.
Mathematically, the probability that an input vector x is a member of class i, a value of a stochastic variable Y, can be written as:
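P(Y=i \mid x, W, b) = \mathrm{softmax}_i(W x + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}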
The model’s prediction is the class whose probability is maximal, specifically:
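y_{pred} = \arg\max_i P(Y=i \mid x, W, b)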
# initialize with 0 the weights W as a matrix of shape (n_in, n_out)
self.W = theano.shared(
    value=numpy.zeros(
        (n_in, n_out),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)
# initialize the biases b as a vector of n_out 0s
self.b = theano.shared(
    value=numpy.zeros(
        (n_out,),
        dtype=theano.config.floatX
    ),
    name='b',
    borrow=True
)
# symbolic expression for computing the matrix of class-membership
# probabilities
# Where:
# W is a matrix where column-k represents the separation hyperplane for
# class-k
# x is a matrix where row-j represents input training sample-j
# b is a vector where element-k represents the free parameter of
# hyperplane-k
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

# symbolic description of how to compute prediction as class whose
# probability is maximal
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
Note: For a complete list of Theano ops, see the list of ops.
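As a sanity check, here is a small numpy-only sketch (mine, not part of the tutorial) of what the symbolic expressions above compute for one minibatch:

import numpy

def softmax_rows(z):
    # subtract the row-wise max for numerical stability, then normalize
    e = numpy.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = numpy.random.RandomState(0)
X = rng.rand(5, 784)                        # 5 flattened 28*28 images
W = numpy.zeros((784, 10))                  # weights, zero-initialized as above
b = numpy.zeros(10)                         # biases

p_y_given_x = softmax_rows(X.dot(W) + b)    # shape (5, 10); each row sums to 1
y_pred = p_y_given_x.argmax(axis=1)         # predicted class for each image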
Defining a Loss Function
Let us start by defining the likelihood and the loss:
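\mathcal{L}(\theta=\{W,b\}, \mathcal{D}) = \sum_{i=0}^{|\mathcal{D}|} \log P(Y=y^{(i)} \mid x^{(i)}, W, b)

\ell(\theta=\{W,b\}, \mathcal{D}) = -\mathcal{L}(\theta=\{W,b\}, \mathcal{D})

Note that the code below takes the mean rather than the sum of the per-example log-probabilities, which makes the learning rate less dependent on the minibatch size: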
def negative_log_likelihood(self, y):
    # mean log-probability of the correct class, negated
    return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
y.shape[0] is the number of rows of y, i.e., the number of examples n.
T.arange(y.shape[0]) is [0, 1, 2, ..., n-1].
T.log(self.p_y_given_x) is a matrix with n rows (one per example) and 10 columns (one per class).
Indexing it with [T.arange(y.shape[0]), y] therefore selects an n-dimensional vector holding, for each example, the log-probability of its correct class; T.mean then reduces this vector to a scalar loss.
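A tiny numpy illustration of that fancy indexing (my own toy example, not tutorial code):

import numpy

log_p = numpy.log(numpy.full((3, 10), 0.1))  # 3 examples, uniform class probabilities
y = numpy.array([2, 0, 9])                   # correct label of each example
picked = log_p[numpy.arange(3), y]           # 3-vector: log-prob of each correct class
nll = -picked.mean()                         # scalar loss; here log(10) ~ 2.3026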
Creating a LogisticRegression class
class LogisticRegression(object): (the full class code is not reproduced here)
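For reference, a minimal sketch of the class, assembled from the snippets quoted in this post (a sketch, not the tutorial's exact code):

import numpy
import theano
import theano.tensor as T

class LogisticRegression(object):
    """Multi-class logistic regression (softmax) classifier."""

    def __init__(self, input, n_in, n_out):
        # zero-initialized weights and biases, as shown above
        self.W = theano.shared(
            value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
            name='W', borrow=True)
        self.b = theano.shared(
            value=numpy.zeros((n_out,), dtype=theano.config.floatX),
            name='b', borrow=True)
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        self.params = [self.W, self.b]
        self.input = input

    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])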
We instantiate this class as follows:
# generate symbolic variables for input (x and y represent a
# minibatch)
x = T.matrix('x') # data, presented as rasterized images
y = T.ivector('y') # labels, presented as 1D vector of [int] labels
# construct the logistic regression class
# Each MNIST image has size 28*28
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
Note that the arguments passed in the last line above go to LogisticRegression's __init__(self, input, n_in, n_out), which acts much like a constructor.
Define the loss function:
cost = classifier.negative_log_likelihood(y)
The loss implicitly depends on the input x here, because classifier was built from the symbolic variable x.
Learning the Model
Theano does not require us to derive the gradients of the loss with respect to the parameters by hand; T.grad returns them directly. (I am not entirely sure how Theano computes derivatives internally; roughly, it differentiates the symbolic expression graph by the chain rule. Something to study later.)
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
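As a quick standalone illustration (a toy of mine, not tutorial code), T.grad differentiates any scalar symbolic expression:

import theano
import theano.tensor as T

a = T.dscalar('a')
g = T.grad(a ** 2, wrt=a)     # symbolic derivative: 2*a
f = theano.function([a], g)
print(f(3.0))                 # prints 6.0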
train_model is a Theano function that implements one step of gradient descent; the training loop calls it repeatedly (a sketch of the loop appears below):
# allocate a symbolic variable for the minibatch index
index = T.lscalar()

# specify how to update the parameters of the model as a list of
# (variable, update expression) pairs.
updates = [(classifier.W, classifier.W - learning_rate * g_W),
           (classifier.b, classifier.b - learning_rate * g_b)]

# compiling a Theano function `train_model` that returns the cost, but at
# the same time updates the parameters of the model based on the rules
# defined in `updates`
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
Clearly, updates describes how the parameters W and b are updated: updates is a list of (variable, update expression) pairs, while givens is a dictionary that substitutes the minibatch slices for the symbolic variables x and y.
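A minimal sketch of the training loop that calls train_model repeatedly (simplified; the tutorial's full loop adds early stopping, and batch_size / n_epochs are assumed defined as in the tutorial):

n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
for epoch in range(n_epochs):
    for minibatch_index in range(n_train_batches):
        minibatch_avg_cost = train_model(minibatch_index)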
Testing the model
The LogisticRegression class also defines a function that returns the error rate of the predictions on a minibatch:
def errors(self, y):
    """Return a float representing the number of errors in the minibatch
    over the total number of examples of the minibatch; zero-one
    loss over the size of the minibatch

    :type y: theano.tensor.TensorType
    :param y: corresponds to a vector that gives for each example the
              correct label
    """
    # check if y has same dimension of y_pred
    if y.ndim != self.y_pred.ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', self.y_pred.type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(self.y_pred, y))
    else:
        raise NotImplementedError()
test_model and validate_model differ only in the dataset they read from; validate_model is used for early stopping.
# compiling a Theano function that computes the mistakes that are made by
# the model on a minibatch
test_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size],
        y: test_set_y[index * batch_size: (index + 1) * batch_size]
    }
)

validate_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: valid_set_x[index * batch_size: (index + 1) * batch_size],
        y: valid_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
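The per-minibatch errors are then averaged over the whole set; a short sketch (assuming n_valid_batches is defined as in the tutorial):

validation_losses = [validate_model(i) for i in range(n_valid_batches)]
this_validation_loss = numpy.mean(validation_losses)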
Later, the code uses cPickle in two places to save and load the best model:
# save the best model (open the file in binary mode for pickling)
with open('best_model.pkl', 'wb') as f:
    cPickle.dump(classifier, f)

# load the saved model
classifier = cPickle.load(open('best_model.pkl', 'rb'))
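The loaded model can then be compiled into a prediction function, along the lines of the tutorial's predict() (a sketch; it assumes the class stored its symbolic input as self.input and that test_set_x is a shared variable):

predict_model = theano.function(
    inputs=[classifier.input],
    outputs=classifier.y_pred)

test_values = test_set_x.get_value()
predicted_values = predict_model(test_values[:10])  # labels of the first 10 images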