Andrew Ng Machine Learning Notes
Posted by 未来可期-2018
Error analysis
Methods to solve overfitting
- more training examples
- try smaller sets of features
- try increasing $\lambda$
Methods to solve underfitting
- getting additional features
- try adding polynomial features
- try decreasing $\lambda$
Recommended approach
- start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data
- plot learning curves to decide if more data, more features, etc. are likely to help (a sketch follows this list).
- Error analysis: See if you spot any systematic trend in what type of examples it is making errors on
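A minimal learning-curve sketch in Python (NumPy/Matplotlib; the `fit` and `error` callables are placeholders for whatever model and cost you are using, not part of the original notes):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_learning_curves(fit, error, X_train, y_train, X_cv, y_cv, steps=20):
    """Train on growing subsets of the data and plot train vs. cross-validation error."""
    sizes = np.linspace(10, len(X_train), steps).astype(int)
    train_err, cv_err = [], []
    for m in sizes:
        model = fit(X_train[:m], y_train[:m])          # train on the first m examples
        train_err.append(error(model, X_train[:m], y_train[:m]))
        cv_err.append(error(model, X_cv, y_cv))        # always evaluate on the full CV set
    plt.plot(sizes, train_err, label="train error")
    plt.plot(sizes, cv_err, label="cross-validation error")
    plt.xlabel("training set size m")
    plt.ylabel("error")
    plt.legend()
    plt.show()
```

A large gap between the two curves suggests high variance (more data may help); two high, close curves suggest high bias (more features may help).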
Error metrics for skewed classes
|  | Actual 1 | Actual 0 |
|---|---|---|
| Predicted 1 | True Positive | False Positive |
| Predicted 0 | False Negative | True Negative |
$\text{precision} = \frac{TP}{TP+FP}$

$\text{recall} = \frac{TP}{TP+FN}$

$F_1 = 2\frac{PR}{P+R}$, where $P$ is precision and $R$ is recall
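As a concrete illustration, a minimal Python sketch computing the three metrics from raw confusion-matrix counts (the function and argument names are mine):

```python
def skewed_class_metrics(tp, fp, fn):
    """Precision, recall and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of the examples predicted positive, how many really are
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 85 true positives, 10 false positives, 15 false negatives
print(skewed_class_metrics(85, 10, 15))   # -> (≈0.895, 0.85, ≈0.872)
```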
Data for machine learning
More data is likely to help when the following conditions hold:
- The features $x \in \mathbb{R}^{n+1}$ contain sufficient information to predict $y$ accurately.
- A learning algorithm with many parameters is used, such as logistic regression or linear regression with many features, or a neural network with many hidden units.
Support Vector Machine
Logistic regression cost function
$$\min_\theta \frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\left(-\log h_\theta(x^{(i)})\right)+(1-y^{(i)})\left(-\log\left(1-h_\theta(x^{(i)})\right)\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
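A minimal NumPy sketch of this cost (vectorized; variable names are mine, and $\theta_0$ is excluded from the regularization sum since $j$ starts at 1):

```python
import numpy as np

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost; X has a leading column of ones."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))              # h_theta(x) = sigmoid(theta^T x)
    cross_entropy = -y * np.log(h) - (1 - y) * np.log(1 - h)
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)    # skip theta_0
    return cross_entropy.mean() + reg
```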
SVM hypothesis
$$\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\mathrm{cost}_1(\theta^T x^{(i)})+(1-y^{(i)})\mathrm{cost}_0(\theta^T x^{(i)})\right]+\frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
$$h_\theta(x)=\begin{cases}1 & \text{if } \theta^T x \ge 0\\ 0 & \text{otherwise}\end{cases}$$
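A sketch of the same objective and hypothesis in NumPy, using the common piecewise-linear choices $\mathrm{cost}_1(z)=\max(0, 1-z)$ and $\mathrm{cost}_0(z)=\max(0, 1+z)$ (these explicit forms are an assumption; the notes only name the two cost functions):

```python
import numpy as np

def svm_cost(theta, X, y, C):
    """Linear (no-kernel) SVM objective with hinge-style cost_1 / cost_0."""
    z = X @ theta                               # theta^T x^(i) for every example
    cost1 = np.maximum(0, 1 - z)                # penalizes z < 1 when y = 1
    cost0 = np.maximum(0, 1 + z)                # penalizes z > -1 when y = 0
    data_term = C * np.sum(y * cost1 + (1 - y) * cost0)
    reg_term = 0.5 * np.sum(theta[1:] ** 2)     # regularize theta_1 .. theta_n
    return data_term + reg_term

def svm_predict(theta, X):
    """h_theta(x) = 1 when theta^T x >= 0, otherwise 0."""
    return (X @ theta >= 0).astype(int)
```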
SVM parameters
$C = \frac{1}{\lambda}$

- large $C$: lower bias, higher variance
- small $C$: higher bias, lower variance

$\sigma^2$

- large $\sigma^2$: features $f_i$ vary more smoothly; higher bias, lower variance
- small $\sigma^2$: features $f_i$ vary less smoothly; lower bias, higher variance
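The same two knobs appear in common SVM libraries. As a hedged scikit-learn sketch (scikit-learn is not part of the original notes), `C` maps directly and `gamma` plays the role of $\frac{1}{2\sigma^2}$ for the RBF (Gaussian) kernel:

```python
from sklearn.svm import SVC

# Large C (= 1/lambda): weaker regularization -> lower bias, higher variance.
# Large sigma^2 (i.e. small gamma): features vary more smoothly -> higher bias, lower variance.
sigma_sq = 2.0
clf = SVC(kernel="rbf", C=1.0, gamma=1.0 / (2.0 * sigma_sq))
# clf.fit(X_train, y_train); clf.predict(X_cv)   # X_train, y_train, X_cv are placeholders
```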
Kernel function
- no kernel (linear kernel)
- Gaussian kernel: $f = e^{-\frac{\| x_1 - x_2 \|^2}{2\sigma^2}}$
- polynomial kernel: $k(x, l) = (x^T l + \mathrm{constant})^{\mathrm{degree}}$
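A minimal NumPy sketch of the two kernels listed above (function and argument names are mine):

```python
import numpy as np

def gaussian_kernel(x, l, sigma_sq):
    """Similarity between example x and landmark l; 1 when x == l, approaches 0 far away."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma_sq))

def polynomial_kernel(x, l, constant=1.0, degree=2):
    """(x^T l + constant)^degree."""
    return (x @ l + constant) ** degree
```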
Multi-class classification
Train $K$ SVMs, one to distinguish $y = i$ from the rest.
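A sketch of this one-vs-rest scheme, assuming a hypothetical binary trainer `fit_svm(X, y_binary)` that returns a scoring function:

```python
import numpy as np

def train_one_vs_rest(fit_svm, X, y, K):
    """Train K binary SVMs; classifier i separates y == i from the rest."""
    return [fit_svm(X, (y == i).astype(int)) for i in range(K)]

def predict_one_vs_rest(models, x):
    """Pick the class whose classifier gives the largest score for x."""
    scores = [model(x) for model in models]
    return int(np.argmax(scores))
```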
Logistic regression vs SVMs
$n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples

- If $n$ is large relative to $m$: use logistic regression or an SVM without a kernel.
- If $n$ is small and $m$ is intermediate: use an SVM with a Gaussian kernel.
- If $n$ is small and $m$ is large: create or add more features, then use logistic regression or an SVM without a kernel.
K-means
Input
- K (number of clusters)
- training set $\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}$
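A minimal NumPy sketch of how these two inputs are consumed by K-means (random initialization, cluster assignment, centroid move; empty-cluster handling is omitted):

```python
import numpy as np

def k_means(X, K, iterations=10):
    """X is the m x n training set {x^(1), ..., x^(m)}; K is the number of clusters."""
    centroids = X[np.random.choice(len(X), K, replace=False)]   # random initialization
    for _ in range(iterations):
        # cluster assignment step: index of the closest centroid for every example
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = np.argmin(dists, axis=1)
        # move-centroid step: mean of the points assigned to each cluster
        centroids = np.array([X[c == k].mean(axis=0) for k in range(K)])
    return c, centroids
```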