Machine Learning - Andrew Ng - Week 3-2: Logistic Regression Model

Posted by 架构师易筋



Coursera course link

Because the Coursera version of the course also includes quizzes and forums, the notes from here on are based on Coursera (2021-05-22):
https://www.coursera.org/learn/machine-learning/home/welcome

Logistic Regression Model

1. Cost Function - Logistic Regression

How do we choose a cost function for fitting θ?
If the classification cost function reuses the squared-error form from linear regression, the sigmoid hypothesis makes J(θ) non-convex, so gradient descent is not guaranteed to reach the global minimum.
[Figure: how the logistic regression cost curves evolve]
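For context (these are the standard definitions from the course, restated here because the figures are not reproduced): the hypothesis is the sigmoid of a linear combination, and plugging it into the squared-error cost is what produces the non-convex surface:

  h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}, \qquad
  J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 \quad \text{(non-convex)}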

The logistic regression cost for the case y = 1 is:

Cost(hθ(x), y) = −log(hθ(x))   if y = 1

Cost = 0 when y = 1 and hθ(x) = 1, but as hθ(x) → 0, Cost → ∞: a confident wrong prediction is penalized with an arbitrarily large cost.
The cost for the case y = 0 is:

Cost(hθ(x), y) = −log(1 − hθ(x))   if y = 0

Symmetrically, Cost = 0 when hθ(x) = 0, and Cost → ∞ as hθ(x) → 1.

2. Simplified Cost Function and Gradient Descent - Logistic Regression

Because y is always either 0 or 1, the two cases of the classification cost function can be combined into a single expression.
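For reference, since the slide is not reproduced here, the combined per-example cost and the resulting cost function from the lecture are:

  \mathrm{Cost}(h_\theta(x), y) = -y \log\bigl(h_\theta(x)\bigr) - (1 - y) \log\bigl(1 - h_\theta(x)\bigr)

  J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[\, y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]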
Finding the optimal parameters means finding the θ that minimizes J(θ).
The gradient descent update for classification has exactly the same form as for linear regression; the only difference is the hypothesis hθ(x(i)), which is now the sigmoid rather than a linear function. The update rule is shown below.
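The update, applied simultaneously to every θj (standard from the course):

  \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}, \qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}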
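A minimal Octave sketch of this batch gradient descent loop (my own illustration, not code from the course; it assumes X is an m×(n+1) design matrix whose first column is all ones and y is an m×1 vector of 0/1 labels):

function theta = gradientDescentLogistic(X, y, theta, alpha, num_iters)
  % Batch gradient descent for logistic regression (illustrative sketch).
  % X: m x (n+1) design matrix with a leading column of ones; y: m x 1 labels in {0,1}.
  m = length(y);
  for iter = 1:num_iters
    h = 1 ./ (1 + exp(-X * theta));   % sigmoid hypothesis h_theta(x) for all examples
    grad = (1 / m) * X' * (h - y);    % gradient of J(theta), one entry per theta_j
    theta = theta - alpha * grad;     % simultaneous update of all parameters
  end
end

For example, theta = gradientDescentLogistic(X, y, zeros(size(X, 2), 1), 0.01, 400); runs 400 iterations with a hand-picked learning rate of 0.01.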

3. Advanced Optimization - Logistic Regression

The conventional approach is to optimize with gradient descent.
More advanced optimization algorithms:

  • Conjugate gradient
  • BFGS
  • L-BFGS

Advantages:

  • No need to manually pick the learning rate α
  • Often converge faster than gradient descent

Disadvantage: they are more complex.
Solving a minimization problem with Octave's fminunc works as follows.
Implementing the steps above in Octave: costFunction.m implements the cost function, returning both the cost jVal and the gradient vector:

function [jVal, gradient] = costFunction(theta)
  % Example objective: J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2,
  % whose minimum is at theta = [5; 5].
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;

  % Gradient: partial derivatives of J with respect to each theta.
  gradient = zeros(2, 1);
  gradient(1) = 2 * (theta(1) - 5);
  gradient(2) = 2 * (theta(2) - 5);
end

In Octave, cd to the directory containing costFunction.m, then run:

>> options = optimset('GradObj', 'on', 'MaxIter', 100);
>> initialTheta = zeros(2,1)
initialTheta =

   0
   0

>> [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options)
optTheta =

   5.0000
   5.0000

functionVal = 7.8886e-31
exitFlag = 1

fminunc finds optTheta = [5; 5], the analytic minimum; functionVal is essentially zero, and exitFlag = 1 means the algorithm converged to a solution point. The built-in documentation explains fminunc in detail:

>> help fminunc
'fminunc' is a function from the file /Applications/Octave-6.2.0.app/Contents/Resources/usr/Cellar/octave-octave-app@6.2.0/6.2.0/share/octave/6.2.0/m/optimization/fminunc.m

 -- fminunc (FCN, X0)
 -- fminunc (FCN, X0, OPTIONS)
 -- [X, FVAL, INFO, OUTPUT, GRAD, HESS] = fminunc (FCN, ...)
     Solve an unconstrained optimization problem defined by the function
     FCN.

     'fminunc' attempts to determine a vector X such that 'FCN (X)' is a
     local minimum.

     FUN is a function handle, inline function, or string containing the
     name of the function to evaluate.  FCN should accept a vector
     (array) defining the unknown variables, and return the objective
     function value, optionally with gradient.

     X0 determines a starting guess.  The shape of X0 is preserved in
     all calls to FCN, but otherwise is treated as a column vector.

     OPTIONS is a structure specifying additional parameters which
     control the algorithm.  Currently, 'fminunc' recognizes these
     options: "AutoScaling", "FinDiffType", "FunValCheck", "GradObj",
     "MaxFunEvals", "MaxIter", "OutputFcn", "TolFun", "TolX",
     "TypicalX".

     If "AutoScaling" is "on", the variables will be automatically
     scaled according to the column norms of the (estimated) Jacobian.
     As a result, "TolFun" becomes scaling-independent.  By default,
     this option is "off" because it may sometimes deliver unexpected
     (though mathematically correct) results.

     If "GradObj" is "on", it specifies that FCN--when called with two
     output arguments--also returns the Jacobian matrix of partial first
     derivatives at the requested point.

     "MaxFunEvals" proscribes the maximum number of function evaluations
     before optimization is halted.  The default value is '100 *
     number_of_variables', i.e., '100 * length (X0)'.  The value must be
     a positive integer.

     "MaxIter" proscribes the maximum number of algorithm iterations
     before optimization is halted.  The default value is 400.  The
     value must be a positive integer.

     "TolX" specifies the termination tolerance for the unknown
     variables X, while "TolFun" is a tolerance for the objective
     function value FVAL.  The default is '1e-6' for both options.

     For a description of the other options, see 'optimset'.

     On return, X is the location of the minimum and FVAL contains the
     value of the objective function at X.

     INFO may be one of the following values:

     1
          Converged to a solution point.  Relative gradient error is
          less than specified by 'TolFun'.

     2
          Last relative step size was less than 'TolX'.

     3
          Last relative change in function value was less than 'TolFun'.

     0
          Iteration limit exceeded--either maximum number of algorithm
          iterations 'MaxIter' or maximum number of function evaluations
          'MaxFunEvals'.

     -1
          Algorithm terminated by 'OutputFcn'.

     -3
          The trust region radius became excessively small.

     Optionally, 'fminunc' can return a structure with convergence
     statistics (OUTPUT), the output gradient (GRAD) at the solution X,
     and approximate Hessian (HESS) at the solution X.

     Application Notes: If the objective function is a single nonlinear
     equation of one variable then using 'fminbnd' is usually a better
     choice.

     The algorithm used by 'fminunc' is a gradient search which depends
     on the objective function being differentiable.  If the function
     has discontinuities it may be better to use a derivative-free
     algorithm such as 'fminsearch'.

     See also: fminbnd, fminsearch, optimset.

Additional help for built-in functions and operators is
available in the online version of the manual.  Use the command
'doc <topic>' to search the manual index.

Help and information about Octave is also available on the WWW
at https://www.octave.org and via the help@octave.org
mailing list.

To use fminunc for logistic regression, costFunction must compute the cost J(θ) derived above together with the gradient vector whose j-th entry is (1/m) Σᵢ (hθ(x(i)) − y(i)) xj(i).
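A minimal Octave sketch of such a cost function (my own illustration, not the course's official solution; it assumes the same X and y conventions as in the gradient descent sketch above):

function [jVal, gradient] = logisticCostFunction(theta, X, y)
  % Cost and gradient for logistic regression.
  % X: m x (n+1) design matrix with a leading column of ones; y: m x 1 labels in {0,1}.
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                            % sigmoid hypothesis
  jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));   % J(theta)
  gradient = (1 / m) * X' * (h - y);                         % partial derivatives, (n+1) x 1
end

Because fminunc passes only theta to the callback, X and y are captured with an anonymous function:

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, J, exitFlag] = fminunc(@(t) logisticCostFunction(t, X, y), initialTheta, options);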
