Andrew Ng Machine Learning Notes
Posted by 未来可期-2018
Error analysis
Methods to solve overfitting
- more training examples
- try smaller sets of features
- try increasing $\lambda$
Methods to solve underfitting
- getting additional features
- try adding polynomial features
- try decreasing $\lambda$
Recommended approach
- start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data
- plot learning curves to decide if more data, more features, etc. are likely to help (a sketch follows this list).
- Error analysis: See if you spot any systematic trend in what type of examples it is making errors on
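A minimal learning-curve sketch in Python (NumPy/Matplotlib; the `fit` and `error` callables are placeholders for whatever model and cost you are using, not part of the original notes):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_learning_curves(fit, error, X_train, y_train, X_cv, y_cv, steps=20):
    """Train on growing subsets of the data and plot train vs. cross-validation error."""
    sizes = np.linspace(10, len(X_train), steps).astype(int)
    train_err, cv_err = [], []
    for m in sizes:
        model = fit(X_train[:m], y_train[:m])          # train on the first m examples
        train_err.append(error(model, X_train[:m], y_train[:m]))
        cv_err.append(error(model, X_cv, y_cv))        # always evaluate on the full CV set
    plt.plot(sizes, train_err, label="train error")
    plt.plot(sizes, cv_err, label="cross-validation error")
    plt.xlabel("training set size m")
    plt.ylabel("error")
    plt.legend()
    plt.show()
```

A large gap between the two curves suggests high variance (more data may help); two high, close curves suggest high bias (more features may help).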
Error metrics for skewed classes
|  | Actual 1 | Actual 0 |
|---|---|---|
| Predicted 1 | True Positive | False Positive |
| Predicted 0 | False Negative | True Negative |
$\text{precision} = \frac{TP}{TP+FP}$

$\text{recall} = \frac{TP}{TP+FN}$

$F_1 = 2\frac{PR}{P+R}$, where $P$ is precision and $R$ is recall
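As a concrete illustration, a minimal Python sketch computing the three metrics from raw confusion-matrix counts (the function and argument names are mine):

```python
def skewed_class_metrics(tp, fp, fn):
    """Precision, recall and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of the examples predicted positive, how many really are
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 85 true positives, 10 false positives, 15 false negatives
print(skewed_class_metrics(85, 10, 15))   # -> (≈0.895, 0.85, ≈0.872)
```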
Data for machine learning
More data is likely to help when the following conditions hold:
- The features $x \in \mathbb{R}^{n+1}$ contain sufficient information to predict $y$ accurately.
- A learning algorithm with many parameters is used, such as logistic regression or linear regression with many features, or a neural network with many hidden units.
Support Vector Machine
Logistic regression cost function
$$\min_\theta \frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\left(-\log h_\theta(x^{(i)})\right)+(1-y^{(i)})\left(-\log\left(1-h_\theta(x^{(i)})\right)\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
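A minimal NumPy sketch of this cost (vectorized; variable names are mine, and $\theta_0$ is excluded from the regularization sum since $j$ starts at 1):

```python
import numpy as np

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost; X has a leading column of ones."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))              # h_theta(x) = sigmoid(theta^T x)
    cross_entropy = -y * np.log(h) - (1 - y) * np.log(1 - h)
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)    # skip theta_0
    return cross_entropy.mean() + reg
```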
SVM hypothesis
$$\min_\theta C\sum_{i=1}^{m}\left[y^{(i)}\mathrm{cost}_1(\theta^T x^{(i)})+(1-y^{(i)})\mathrm{cost}_0(\theta^T x^{(i)})\right]+\frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
$$h_\theta(x)=\begin{cases}1 & \text{if } \theta^T x \ge 0\\ 0 & \text{otherwise}\end{cases}$$
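A sketch of the same objective and hypothesis in NumPy, using the common piecewise-linear choices $\mathrm{cost}_1(z)=\max(0, 1-z)$ and $\mathrm{cost}_0(z)=\max(0, 1+z)$ (these explicit forms are an assumption; the notes only name the two cost functions):

```python
import numpy as np

def svm_cost(theta, X, y, C):
    """Linear (no-kernel) SVM objective with hinge-style cost_1 / cost_0."""
    z = X @ theta                               # theta^T x^(i) for every example
    cost1 = np.maximum(0, 1 - z)                # penalizes z < 1 when y = 1
    cost0 = np.maximum(0, 1 + z)                # penalizes z > -1 when y = 0
    data_term = C * np.sum(y * cost1 + (1 - y) * cost0)
    reg_term = 0.5 * np.sum(theta[1:] ** 2)     # regularize theta_1 .. theta_n
    return data_term + reg_term

def svm_predict(theta, X):
    """h_theta(x) = 1 when theta^T x >= 0, otherwise 0."""
    return (X @ theta >= 0).astype(int)
```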
SVM parameters
$C = \frac{1}{\lambda}$

- large $C$: lower bias, higher variance
- small $C$: higher bias, lower variance

$\sigma^2$

- large $\sigma^2$: features $f_i$ vary more smoothly; higher bias, lower variance
- small $\sigma^2$: features $f_i$ vary less smoothly; lower bias, higher variance
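The same two knobs appear in common SVM libraries. As a hedged scikit-learn sketch (scikit-learn is not part of the original notes), `C` maps directly and `gamma` plays the role of $\frac{1}{2\sigma^2}$ for the RBF (Gaussian) kernel:

```python
from sklearn.svm import SVC

# Large C (= 1/lambda): weaker regularization -> lower bias, higher variance.
# Large sigma^2 (i.e. small gamma): features vary more smoothly -> higher bias, lower variance.
sigma_sq = 2.0
clf = SVC(kernel="rbf", C=1.0, gamma=1.0 / (2.0 * sigma_sq))
# clf.fit(X_train, y_train); clf.predict(X_cv)   # X_train, y_train, X_cv are placeholders
```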
Kernel function
- no kernel (linear kernel)
- Gaussian kernel: $f = e^{-\frac{\| x_1 - x_2 \|^2}{2\sigma^2}}$
- polynomial kernel: $k(x, l) = (x^T l + \mathrm{constant})^{\mathrm{degree}}$
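A minimal NumPy sketch of the two kernels listed above (function and argument names are mine):

```python
import numpy as np

def gaussian_kernel(x, l, sigma_sq):
    """Similarity between example x and landmark l; 1 when x == l, approaches 0 far away."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma_sq))

def polynomial_kernel(x, l, constant=1.0, degree=2):
    """(x^T l + constant)^degree."""
    return (x @ l + constant) ** degree
```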
Multi-class classification
Train $K$ SVMs, one to distinguish $y = i$ from the rest.
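A sketch of this one-vs-rest scheme, assuming a hypothetical binary trainer `fit_svm(X, y_binary)` that returns a scoring function:

```python
import numpy as np

def train_one_vs_rest(fit_svm, X, y, K):
    """Train K binary SVMs; classifier i separates y == i from the rest."""
    return [fit_svm(X, (y == i).astype(int)) for i in range(K)]

def predict_one_vs_rest(models, x):
    """Pick the class whose classifier gives the largest score for x."""
    scores = [model(x) for model in models]
    return int(np.argmax(scores))
```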
Logistic regression vs SVMs
$n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples

- If $n$ is large relative to $m$: use logistic regression or an SVM without a kernel.
- If $n$ is small and $m$ is intermediate: use an SVM with a Gaussian kernel.
- If $n$ is small and $m$ is large: create or add more features, then use logistic regression or an SVM without a kernel.
K-means
Input
- K (number of clusters)
- training set $\{x^{(1)}, x^{(2)}, \cdots, x^{(m)}\}$
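A minimal NumPy sketch of how these two inputs are consumed by K-means (random initialization, cluster assignment, centroid move; empty-cluster handling is omitted):

```python
import numpy as np

def k_means(X, K, iterations=10):
    """X is the m x n training set {x^(1), ..., x^(m)}; K is the number of clusters."""
    centroids = X[np.random.choice(len(X), K, replace=False)]   # random initialization
    for _ in range(iterations):
        # cluster assignment step: index of the closest centroid for every example
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = np.argmin(dists, axis=1)
        # move-centroid step: mean of the points assigned to each cluster
        centroids = np.array([X[c == k].mean(axis=0) for k in range(K)])
    return c, centroids
```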