Softmax Regression是逻辑回归在多分类问题上的推广,主要用于处理多分类问题,其中任意两个类别之间都是线性可分的。
假设有$k$个类别,每个类别的参数向量为${\theta}_j $,那么对于每个样本,其所属类别的概率为:
\[P({{y}_{i}}|X,{{\theta }_{j}})=\frac{{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}\]
相比如逻辑回归的损失函数,Softmax的损失函数引入了指示函数,其损失函数为:
$J\left( \theta \right)=-\frac{1}{m}\left[ \sum\limits_{i=1}^{m}{\sum\limits_{j=1}^{k}{I\left\{ {{y}_{i}}=j \right\}\log \frac{{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}}} \right]$
该损失函数的意义是对每一个样本判断其属于哪个类别,并进行相应计算。对该损失函数,可以使用梯度下降法求解,梯度计算过程如下:
${{\nabla }_{{{\theta }_{j}}}}J(\theta )=-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ {{\nabla }_{{{\theta }_{j}}}}\sum\limits_{l=1}^{k}{I\{{{y}_{i}}=j\}\log \frac{{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}} \right]}$
$ =-\frac{1}{m}\sum\limits_{i=1}^{m}{[I\{{{y}_{i}}=j\}\frac{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}{{{e}^{{{\theta }_{j}}X}}}\cdot \frac{{{e}^{{{\theta }_{j}}X}}\cdot X\cdot \sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}-{{e}^{{{\theta }_{j}}X}}\cdot {{e}^{{{\theta }_{j}}X}}\cdot X}{{{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}^{2}}}]}$
$ =-\frac{1}{m}\sum\limits_{i=1}^{m}{I\{{{y}_{i}}=j\}\frac{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}-{{e}^{{{\theta }_{j}}X}}}{\sum\limits_{l=1}^{k}{{{e}^{{{\theta }_{l}}X}}}}}\cdot X $
$=-\frac{1}{m}\sum\limits_{i=1}^{m}{\left[ (I\{{{y}_{i}}=j\}-P({{y}_{i}}=j||X,{{\theta }_{j}}))\cdot X \right]} $
对每个类别,分别求其${\theta}_j$的梯度并计算,Python代码如下:
# -*- coding: utf-8 -*- """ Created on Sun Jan 28 15:32:44 2018 @author: zhang """ import numpy as np from sklearn.datasets import load_digits from sklearn.cross_validation import train_test_split from sklearn import preprocessing def load_data(): digits = load_digits() data = digits.data label = digits.target return np.mat(data), label def gradient_descent(train_x, train_y, k, maxCycle, alpha): # k 为类别数 numSamples, numFeatures = np.shape(train_x) weights = np.mat(np.ones((numFeatures, k))) for i in range(maxCycle): value = np.exp(train_x * weights) rowsum = value.sum(axis = 1) # 横向求和 rowsum = rowsum.repeat(k, axis = 1) # 横向复制扩展 err = - value / rowsum #计算出每个样本属于每个类别的概率 for j in range(numSamples): err[j, train_y[j]] += 1 weights = weights + (alpha / numSamples) * (train_x.T * err) return weights def test_model(test_x, test_y, weights): results = test_x * weights predict_y = results.argmax(axis = 1) count = 0 for i in range(np.shape(test_y)[0]): if predict_y[i,] == test_y[i,]: count += 1 return count / len(test_y), predict_y if __name__ == "__main__": data, label = load_data() # data = preprocessing.minmax_scale(data, axis = 0) # 数据处理之后识别率降低了 train_x, test_x, train_y, test_y = train_test_split(data, label, test_size = 0.25, random_state=33) k = len(np.unique(label)) weights = gradient_descent(train_x, train_y, k, 800, 0.01) accuracy, predict_y = test_model(test_x, test_y, weights) print("Accuracy:", accuracy)