Andrew Ng Machine Learning Notes

Posted by 未来可期, 2018


Error analysis

Methods to address overfitting

  • get more training examples
  • try a smaller set of features
  • try increasing $\lambda$

Methods to address underfitting

  • try getting additional features
  • try adding polynomial features
  • try decreasing $\lambda$ (a sketch of the $\lambda$ trade-off follows this list)
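
As a concrete illustration of both lists, here is a minimal sketch of how the regularization strength trades off over- and underfitting, using scikit-learn's ridge regression (whose `alpha` plays the role of $\lambda$); the synthetic dataset and the $\lambda$ values are illustrative assumptions, not from the course.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Made-up 1-D data: y = sin(x) + noise, fit with a deliberately flexible model.
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 60).reshape(-1, 1)
y = np.sin(x).ravel() + 0.1 * rng.standard_normal(60)

X = PolynomialFeatures(degree=10).fit_transform(x)
X_tr, X_cv, y_tr, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

# Small lambda -> overfitting (high train score, low CV score);
# very large lambda -> underfitting (both scores low).
for lam in [1e-4, 1e-2, 1.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_tr, y_tr)
    print(f"lambda={lam:>7}: train R^2={model.score(X_tr, y_tr):.3f}, "
          f"cv R^2={model.score(X_cv, y_cv):.3f}")
```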

Recommended approach

  • Start with a simple algorithm that you can implement quickly; implement it and test it on your cross-validation data.
  • Plot learning curves to decide whether more data, more features, etc. are likely to help (see the sketch below).
  • Error analysis: check whether you spot any systematic trend in the types of examples the algorithm makes errors on.
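
A minimal sketch of plotting learning curves with scikit-learn's `learning_curve` helper; the dataset and classifier are illustrative assumptions, and any estimator works the same way.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

sizes, train_scores, cv_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

# High bias: both curves converge to a mediocre score (more data won't help).
# High variance: a persistent gap between the curves (more data should help).
plt.plot(sizes, train_scores.mean(axis=1), label="train")
plt.plot(sizes, cv_scores.mean(axis=1), label="cross-validation")
plt.xlabel("training set size"); plt.ylabel("accuracy"); plt.legend(); plt.show()
```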

Error metrics for skewed classes

|             | Actual: 1      | Actual: 0      |
| ----------- | -------------- | -------------- |
| Predicted 1 | True Positive  | False Positive |
| Predicted 0 | False Negative | True Negative  |

$\text{precision}=\frac{TP}{TP+FP}$

$\text{recall}=\frac{TP}{TP+FN}$

$F_1\text{ score}=2\frac{PR}{P+R}$, where $P$ is precision and $R$ is recall
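
A small helper that computes these metrics directly from confusion-matrix counts; the counts below are hypothetical, chosen to mimic a skewed-class problem.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: many false positives, as often happens with rare classes.
p, r, f1 = precision_recall_f1(tp=85, fp=890, fn=15)
print(f"precision={p:.3f}  recall={r:.3f}  F1={f1:.3f}")
```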

Data for machine learning

Getting more data is likely to help when the following conditions hold:

  • The features $x \in \mathbb{R}^{n+1}$ contain sufficient information to predict $y$ accurately.
  • The learning algorithm has many parameters, e.g. logistic or linear regression with many features, or a neural network with many hidden units.

Support Vector Machine

Logistic regression cost function:

$\min_\theta \frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\left(-\log h_\theta(x^{(i)})\right)+\left(1-y^{(i)}\right)\left(-\log\left(1-h_\theta(x^{(i)})\right)\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
SVM hypothesis:

$\min_\theta\, C\sum_{i=1}^{m}\left[y^{(i)}\,\text{cost}_1\!\left(\theta^T x^{(i)}\right)+\left(1-y^{(i)}\right)\,\text{cost}_0\!\left(\theta^T x^{(i)}\right)\right]+\frac{1}{2}\sum_{j=1}^{n}\theta_j^2$

$h_\theta(x)=\begin{cases}1 & \text{if } \theta^T x \ge 0\\ 0 & \text{otherwise}\end{cases}$
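
A sketch of the objective above in NumPy, taking the usual piecewise-linear (hinge-style) choice for $\text{cost}_1$ and $\text{cost}_0$; the course only requires them to be straight-line approximations of the logistic losses, so this particular form is an assumption.

```python
import numpy as np

def cost1(z):
    """Cost when y = 1: zero once theta^T x >= 1, linear below."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Cost when y = 0: zero once theta^T x <= -1, linear above."""
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """The SVM objective above. X's first column is assumed to be all ones,
    so theta[0] is the intercept and is left out of the regularization term."""
    z = X @ theta
    return C * np.sum(y * cost1(z) + (1 - y) * cost0(z)) + 0.5 * np.sum(theta[1:] ** 2)
```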

SVM parameters

$C=\frac{1}{\lambda}$

  • large $C$: lower bias, higher variance (prone to overfitting)
  • small $C$: higher bias, lower variance (prone to underfitting)

$\sigma^2$ (the bandwidth of the Gaussian kernel)

  • large $\sigma^2$: the features $f_i$ vary more smoothly; higher bias, lower variance
  • small $\sigma^2$: the features $f_i$ vary less smoothly; lower bias, higher variance (see the sketch below for how $C$ and $\sigma^2$ interact)
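
A minimal sketch of how $C$ and $\sigma^2$ behave in practice, using scikit-learn's `SVC`, whose RBF kernel is parameterized as $\gamma = \frac{1}{2\sigma^2}$ (so small $\gamma$ corresponds to large $\sigma^2$); the dataset and grid values are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for C in [0.01, 1.0, 100.0]:
    for gamma in [0.01, 1.0, 100.0]:  # small gamma ~ large sigma^2: smoother, higher bias
        score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
        print(f"C={C:>6}, gamma={gamma:>6}: cv accuracy={score:.3f}")
```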

Kernel function

  • no kernel (a linear kernel)

  • Gaussian kernel:
    $f=\exp\!\left(-\frac{\lVert x_1-x_2\rVert^2}{2\sigma^2}\right)$

  • polynomial kernel: $k(x,l)=\left(x^T l+\text{constant}\right)^{\text{degree}}$ (see the sketch below)
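
A direct NumPy translation of the two kernels, plus the course's construction of kernel features $f_i$ as similarities to landmarks (here the training examples themselves); the example points are made up.

```python
import numpy as np

def gaussian_kernel(x, l, sigma2=1.0):
    """Similarity f between example x and landmark l."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma2))

def polynomial_kernel(x, l, constant=1.0, degree=2):
    return (x @ l + constant) ** degree

# Kernel features f_i: similarity of each example to each landmark,
# with the landmarks chosen as the training examples themselves.
X = np.array([[1.0, 2.0], [2.0, 1.0], [0.0, 0.0]])  # made-up points
F = np.array([[gaussian_kernel(x, l) for l in X] for x in X])
print(F)  # 3x3 similarity matrix; the diagonal is 1
```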

Multi-class classification

Train $K$ SVMs, one to distinguish $y=i$ from the rest; for a new input $x$, pick the class $i$ whose classifier gives the largest $(\theta^{(i)})^T x$.
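
A one-vs-rest sketch written out by hand (scikit-learn's `SVC` already handles multi-class internally; this just makes the mechanism explicit). The dataset choice is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# One binary SVM per class: y == k vs. the rest.
models = [LinearSVC(C=1.0, max_iter=10000).fit(X, (y == k).astype(int))
          for k in classes]

# Predict the class whose classifier gives the largest decision value (theta^T x).
scores = np.column_stack([m.decision_function(X) for m in models])
pred = classes[np.argmax(scores, axis=1)]
print(f"training accuracy: {np.mean(pred == y):.3f}")
```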

Logistic regression vs SVMs

Let $n$ = number of features (so $x \in \mathbb{R}^{n+1}$) and $m$ = number of training examples.

  • $n$ is large relative to $m$: use logistic regression, or an SVM without a kernel
  • $n$ is small and $m$ is intermediate: use an SVM with a Gaussian kernel
  • $n$ is small and $m$ is large: create or add more features, then use logistic regression or an SVM without a kernel

K-means

Input