Machine Learning - Andrew Ng - week3-2 Logistic Regression Model
Posted by 架构师易筋
Coursera course link (because the Coursera version of the course also includes quizzes and forums, the notes that follow are based on Coursera, 2021-05-22):
https://www.coursera.org/learn/machine-learning/home/welcome
Logistic Regression Model
1. Cost Function - Logistic Regression
How the cost function is used to choose θ.
If the cost function for classification is built from the linear-regression squared-error form, J(θ) has no single lowest point (it is non-convex), so gradient descent is not guaranteed to reach the global minimum.
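Concretely, the form being ruled out is the linear-regression squared-error cost applied to the sigmoid hypothesis, which is non-convex in θ:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2, \qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}$$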
How the shape of the logistic regression cost function evolves (figure).
The logistic regression cost when y = 1 is:

$$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x)) \quad \text{if } y = 1$$

If y = 1 and hθ(x) = 1, then Cost = 0; but as hθ(x) → 0, Cost → ∞.
The logistic regression cost when y = 0 is:

$$\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x)) \quad \text{if } y = 0$$

Symmetrically, if y = 0 and hθ(x) = 0, then Cost = 0, while as hθ(x) → 1, Cost → ∞.
2. Simplified Cost Function and Gradient Descent - Logistic Regression
The two branches of the classification cost function can be combined into a single expression. Minimizing the cost function then means finding the θ that minimizes J(θ).
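Written out, the combined per-example cost and the overall J(θ) are:

$$\mathrm{Cost}(h_\theta(x), y) = -\,y \log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$$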
The gradient descent update for classification has exactly the same form as for linear regression; the only difference is the hypothesis hθ(x^(i)), which is now the sigmoid function.
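For reference, the simultaneous update for every parameter θ_j is:

$$\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}$$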
3. Advanced Optimization - Logistic Regression
The conventional approach is to minimize J(θ) with gradient descent.
More advanced optimization algorithms:
- Conjugate gradient
- BFGS
- L-BFGS
Advantages:
- No need to manually choose the learning rate α
- Often faster than gradient descent
Disadvantage: more complex
In Octave these algorithms are available through the built-in function fminunc. The steps for solving the classification problem this way are implemented below in Octave.
costFunction.m
Implement the cost function (jVal) and the gradient in costFunction:
function [jVal, gradient] = costFunction(theta)
  % Example objective: J(theta) = (theta(1) - 5)^2 + (theta(2) - 5)^2,
  % which has its minimum at theta = [5; 5].
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  % Partial derivatives of J with respect to theta(1) and theta(2)
  gradient = zeros(2, 1);
  gradient(1) = 2 * (theta(1) - 5);
  gradient(2) = 2 * (theta(2) - 5);
end
In Octave, cd into the directory containing the file above, then run:
>> options = optimset('GradObj', 'on', 'MaxIter', 100);
>> initialTheta = zeros(2,1)
initialTheta =
0
0
>> [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options)
optTheta =
5.0000
5.0000
functionVal = 7.8886e-31
exitFlag = 1
Here optTheta converges to (5, 5), functionVal is essentially 0 (7.8886e-31), and exitFlag = 1 means fminunc converged to a solution. The fminunc function itself is documented as follows:
>> help fminunc
'fminunc' is a function from the file /Applications/Octave-6.2.0.app/Contents/Resources/usr/Cellar/octave-octave-app@6.2.0/6.2.0/share/octave/6.2.0/m/optimization/fminunc.m
-- fminunc (FCN, X0)
-- fminunc (FCN, X0, OPTIONS)
-- [X, FVAL, INFO, OUTPUT, GRAD, HESS] = fminunc (FCN, ...)
Solve an unconstrained optimization problem defined by the function
FCN.
'fminunc' attempts to determine a vector X such that 'FCN (X)' is a
local minimum.
FUN is a function handle, inline function, or string containing the
name of the function to evaluate. FCN should accept a vector
(array) defining the unknown variables, and return the objective
function value, optionally with gradient.
X0 determines a starting guess. The shape of X0 is preserved in
all calls to FCN, but otherwise is treated as a column vector.
OPTIONS is a structure specifying additional parameters which
control the algorithm. Currently, 'fminunc' recognizes these
options: "AutoScaling", "FinDiffType", "FunValCheck", "GradObj",
"MaxFunEvals", "MaxIter", "OutputFcn", "TolFun", "TolX",
"TypicalX".
If "AutoScaling" is "on", the variables will be automatically
scaled according to the column norms of the (estimated) Jacobian.
As a result, "TolFun" becomes scaling-independent. By default,
this option is "off" because it may sometimes deliver unexpected
(though mathematically correct) results.
If "GradObj" is "on", it specifies that FCN--when called with two
output arguments--also returns the Jacobian matrix of partial first
derivatives at the requested point.
"MaxFunEvals" proscribes the maximum number of function evaluations
before optimization is halted. The default value is '100 *
number_of_variables', i.e., '100 * length (X0)'. The value must be
a positive integer.
"MaxIter" proscribes the maximum number of algorithm iterations
before optimization is halted. The default value is 400. The
value must be a positive integer.
"TolX" specifies the termination tolerance for the unknown
variables X, while "TolFun" is a tolerance for the objective
function value FVAL. The default is '1e-6' for both options.
For a description of the other options, see 'optimset'.
On return, X is the location of the minimum and FVAL contains the
value of the objective function at X.
INFO may be one of the following values:
1
Converged to a solution point. Relative gradient error is
less than specified by 'TolFun'.
2
Last relative step size was less than 'TolX'.
3
Last relative change in function value was less than 'TolFun'.
0
Iteration limit exceeded--either maximum number of algorithm
iterations 'MaxIter' or maximum number of function evaluations
'MaxFunEvals'.
-1
Algorithm terminated by 'OutputFcn'.
-3
The trust region radius became excessively small.
Optionally, 'fminunc' can return a structure with convergence
statistics (OUTPUT), the output gradient (GRAD) at the solution X,
and approximate Hessian (HESS) at the solution X.
Application Notes: If the objective function is a single nonlinear
equation of one variable then using 'fminbnd' is usually a better
choice.
The algorithm used by 'fminunc' is a gradient search which depends
on the objective function being differentiable. If the function
has discontinuities it may be better to use a derivative-free
algorithm such as 'fminsearch'.
See also: fminbnd, fminsearch, optimset.
Additional help for built-in functions and operators is
available in the online version of the manual. Use the command
'doc <topic>' to search the manual index.
Help and information about Octave is also available on the WWW
at https://www.octave.org and via the help@octave.org
mailing list.
The formulas for implementing the logistic regression cost function (costFunction) and its gradient are as follows:
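Below is a minimal Octave sketch of such a costFunction for logistic regression, in the form fminunc expects. It assumes X is the m×(n+1) design matrix with a leading column of ones, y is the m×1 vector of 0/1 labels, and theta is the (n+1)×1 parameter vector; these variable names are illustrative, not from the original post.

function [jVal, gradient] = costFunction(theta, X, y)
  % Sketch: logistic regression cost and gradient for fminunc.
  % Assumes X is m x (n+1) with a leading column of ones, y is m x 1 (0/1).
  m = length(y);                    % number of training examples
  h = 1 ./ (1 + exp(-X * theta));   % sigmoid hypothesis h_theta(x)

  % J(theta) = -(1/m) * sum( y.*log(h) + (1-y).*log(1-h) )
  jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));

  % gradient(j) = (1/m) * sum( (h_theta(x^(i)) - y^(i)) * x_j^(i) )
  gradient = (1 / m) * (X' * (h - y));
end

It can then be handed to fminunc through an anonymous function, for example:

>> options = optimset('GradObj', 'on', 'MaxIter', 400);
>> initialTheta = zeros(size(X, 2), 1);
>> [optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(t, X, y), initialTheta, options);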
That concludes the main content of Machine Learning - Andrew Ng - week3-2 Logistic Regression Model. Related posts in this series:
Machine Learning - Andrew Ng - week3-4 Solve Overfitting
Machine Learning - Andrew Ng - Programming Assignment Tips for Week3
Machine Learning - Andrew Ng - Week1 Summary: Introduction
Machine Learning - Andrew Ng - Week11 Summary: Photo OCR