MachineLearningOnCoursera
Week Six
F Score
\[F_{1} = \dfrac{2}{\dfrac{1}{P}+\dfrac{1}{R}} = 2\,\dfrac{PR}{P+R}\]
where \(P\) is precision and \(R\) is recall.
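As a quick numerical check, the F score can be computed directly from the counts in a confusion matrix. A minimal Octave sketch (the counts `tp`, `fp`, `fn` below are made-up example values, not from the course):

```octave
tp = 85;   % true positives
fp = 10;   % false positives
fn = 25;   % false negatives

P  = tp / (tp + fp);        % precision
R  = tp / (tp + fn);        % recall
F1 = 2 * P * R / (P + R);   % F score: harmonic mean of P and R

printf("P = %.3f, R = %.3f, F1 = %.3f\n", P, R, F1);
```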
Week Seven
Support Vector Machine
Cost Function
\[\begin{aligned}
&\min_{\theta}\left[-\dfrac{1}{m}\sum_{y_{i}\in Y,\,x_{i}\in X}\left(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{\lambda}{2m}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\right]\\
\Rightarrow\ &\min_{\theta}\left[-\sum_{y_{i}\in Y,\,x_{i}\in X}\left(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{\lambda}{2}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\right]\\
\Rightarrow\ &\min_{\theta}\left[-C\sum_{y_{i}\in Y,\,x_{i}\in X}\left(y_{i}\log h(\theta^{T}x_{i})+(1-y_{i})\log(1-h(\theta^{T}x_{i}))\right)+\dfrac{1}{2}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\right]
\end{aligned}\]
C plays roughly the role of \(\dfrac{1}{\lambda}\).
- Large C:
  - Lower bias, higher variance
- Small C:
  - Higher bias, lower variance
- Large \(\sigma^{2}\): features \(f_{i}\) vary more smoothly (see the kernel sketch below).
  - Higher bias, lower variance
- Small \(\sigma^{2}\): features \(f_{i}\) vary more sharply.
  - Lower bias, higher variance
When C is very large, the optimization reduces to the large-margin problem:
\[\begin{aligned}
\min_{\theta}\ &\dfrac{1}{2}\sum_{\theta_{i}\in\theta}\theta_{i}^{2}\\
\text{s.t.}\ &\theta^{T}x_{i}\geq 1 &&\text{if } y_{i}=1\\
&\theta^{T}x_{i}\leq -1 &&\text{if } y_{i}=0
\end{aligned}\]
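The features \(f_{i}\) above come from the Gaussian kernel, which measures the similarity between an example \(x\) and a landmark \(l^{(i)}\). A minimal Octave sketch (the function name and the idea of using every training example as a landmark are illustrative assumptions):

```octave
function f = gaussianKernelFeatures(x, landmarks, sigma)
  % x: 1 x n example, landmarks: L x n matrix (e.g. the training examples)
  % f: 1 x L similarity features, f_i = exp(-||x - l^(i)||^2 / (2*sigma^2))
  % A smaller sigma makes f_i fall off more sharply with distance.
  diffs = bsxfun(@minus, landmarks, x);            % L x n differences
  f = exp(-sum(diffs .^ 2, 2)' ./ (2 * sigma^2));
end
```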
PS
If the number of features n is large relative to m, use logistic regression or an SVM without a kernel (linear kernel).
If n is small and m is intermediate, use an SVM with a Gaussian kernel.
If n is small and m is large, add more features, then use logistic regression or an SVM without a kernel.
Week Eight
K-means
Cost Function
K-means tries to minimize
\[\min_{c,\,\mu}\ \dfrac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-\mu_{c^{(i)}}\right\|^{2}\]
In the cluster-assignment step, the cost is minimized with the centroids fixed: every \(x\) in the training set is reassigned to its closest centroid. In the move-centroid step, the cost is minimized by varying the centroids: each centroid is moved to the mean of the points assigned to it.
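One pass of those two steps might look like the following Octave sketch (it assumes `X` is the m×n training matrix and `centroids` holds the current K×n centroids; the variable names are not from the course code):

```octave
% Cluster-assignment step: with the centroids fixed, assign every example
% to its closest centroid (minimizes the cost over the assignments c).
m = size(X, 1);
K = size(centroids, 1);
idx = zeros(m, 1);
for i = 1:m
  d = sum((centroids - X(i, :)) .^ 2, 2);  % squared distance to each centroid
  [~, idx(i)] = min(d);
end

% Move-centroid step: with the assignments fixed, move each centroid to the
% mean of the points assigned to it (minimizes the cost over mu).
for k = 1:K
  centroids(k, :) = mean(X(idx == k, :), 1);
end
```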
Initialize
Initialize the centroids randomly: randomly select K samples from the training set and set the centroids to those samples.
It is possible for K-means to fall into a local minimum, so repeat the random initialization until the cost (distortion) is acceptable for your purposes.
K-means always converges, and the cost never increases during training. Using more centroids should decrease the cost; if it does not, K-means has fallen into a local minimum, so reinitialize the centroids until the cost becomes lower.
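A sketch of random initialization with several restarts, keeping the run with the lowest distortion. The helper `runkMeans(X, initial_centroids, max_iters)` and its return values are hypothetical stand-ins for a full K-means loop like the one sketched above:

```octave
K = 5;                 % number of clusters (illustrative choice)
max_iters = 10;
num_restarts = 50;
best_cost = Inf;

for t = 1:num_restarts
  % Pick K distinct training examples as the initial centroids
  perm = randperm(size(X, 1));
  initial_centroids = X(perm(1:K), :);

  % Hypothetical helper: runs K-means, returns centroids, assignments, distortion
  [centroids, idx, cost] = runkMeans(X, initial_centroids, max_iters);

  % Keep the restart with the lowest cost to avoid a bad local minimum
  if cost < best_cost
    best_cost = cost;
    best_centroids = centroids;
    best_idx = idx;
  end
end
```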
PCA (Principal Component Analysis)
Reconstruct x from z so that the following inequality holds (i.e., at least 99% of the variance is retained):
\[1-\dfrac{\dfrac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-x^{(i)}_{\text{approximation}}\right\|^{2}}{\dfrac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^{2}}\geq 0.99\]
PS:
This inequality is equivalent to the condition below:
\[\begin{aligned}
[U, S, V] &= \mathrm{svd}(\Sigma)\\
U_{\text{reduce}} &= U(:,\,1{:}k)\\
z &= U_{\text{reduce}}^{T}\,x\\
x_{\text{approximation}} &= U_{\text{reduce}}\,z\\
S &= \left(\begin{array}{cccc}
s_{11}&0&\cdots&0\\
0&s_{22}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&s_{nn}
\end{array}\right)\\
\dfrac{\sum_{i=1}^{k}s_{ii}}{\sum_{i=1}^{n}s_{ii}} &\geq 0.99
\end{aligned}\]
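A minimal Octave sketch of the whole procedure, assuming `X` is an m×n, feature-normalized data matrix (the variable names mirror the equations above):

```octave
[m, n] = size(X);
Sigma = (X' * X) / m;          % n x n covariance matrix
[U, S, V] = svd(Sigma);        % columns of U are the principal components

% Choose the smallest k that retains at least 99% of the variance
s = diag(S);
k = find(cumsum(s) / sum(s) >= 0.99, 1);

U_reduce = U(:, 1:k);
Z        = X * U_reduce;       % projection: each row is z = U_reduce' * x
X_approx = Z * U_reduce';      % reconstruction: each row is U_reduce * z
```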
Week Nine
Anomaly Detection
Gaussian Distribution
The multivariate Gaussian distribution takes the correlations between different variables into account:
\[p(x)=\dfrac{1}{(2\pi)^{\frac{n}{2}}\left|\Sigma\right|^{\frac{1}{2}}}\,e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}\]
The single-variable (univariate) Gaussian model is a special case of the multivariate Gaussian distribution, in which \(\Sigma\) is diagonal:
\[\Sigma=\left(\begin{array}{cccc}
\sigma_{11}&&&\\
&\sigma_{22}&&\\
&&\ddots&\\
&&&\sigma_{nn}
\end{array}\right)\]
When training the anomaly detection model, we can estimate the parameters by maximum likelihood:
\[\begin{aligned}
\mu &= \dfrac{1}{m}\sum_{i=1}^{m}x^{(i)}\\
\Sigma &= \dfrac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{T}
\end{aligned}\]
When we use the univariate anomaly detection model, the computational cost is much lower than with the multivariate one, but we may need to add new features by hand to separate normal examples from anomalies.
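Putting the pieces together, a sketch of fitting the multivariate Gaussian by maximum likelihood and flagging anomalies. The validation matrix `Xval` and the threshold `epsilon` are assumptions; in practice `epsilon` would be chosen on a labeled cross-validation set (for example, by maximizing the F score):

```octave
% Fit mu and Sigma on the (mostly normal) training set X (m x n)
[m, n] = size(X);
mu    = mean(X, 1)';                       % n x 1 mean vector
Xc    = X - mu';                           % centered training data
Sigma = (Xc' * Xc) / m;                    % n x n covariance matrix

% Multivariate Gaussian density for each row of Xval
Xvc = Xval - mu';
p = (2*pi)^(-n/2) * det(Sigma)^(-1/2) ...
    * exp(-0.5 * sum((Xvc / Sigma) .* Xvc, 2));

epsilon   = 1e-3;                          % assumed threshold
anomalies = find(p < epsilon);             % indices of suspected anomalies
```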
Recommender System
Cost Function
\[J(X,\Theta) = \dfrac{1}{2}\sum_{(i,j):r(i,j)=1}\left((\theta^{(j)})^{T}x^{(i)}-y^{(i,j)}\right)^{2}+\dfrac{\lambda}{2}\left[\sum_{i=1}^{n_{m}}\sum_{k=1}^{n}\left(x_{k}^{(i)}\right)^{2}+\sum_{j=1}^{n_{u}}\sum_{k=1}^{n}\left(\theta_{k}^{(j)}\right)^{2}\right]\]
In vectorized (Octave) form this is `J = (1/2) * sum(sum(((X*Theta' - Y).^2) .* R)) + (lambda/2) * (sum(sum(Theta.^2)) + sum(sum(X.^2)))`.
The gradients with respect to \(X\) and \(\Theta\) are
\[\begin{aligned}
\dfrac{\partial J}{\partial X} &= \left((X\Theta^{T}-Y)\ast R\right)\Theta+\lambda X\\
\dfrac{\partial J}{\partial \Theta} &= \left((X\Theta^{T}-Y)\ast R\right)^{T}X+\lambda\Theta
\end{aligned}\]
where \(\ast\) denotes the element-wise product.
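The vectorized cost and gradients translate almost directly into Octave. A sketch, assuming the usual collaborative-filtering inputs: `X` (n_m × n movie features), `Theta` (n_u × n user parameters), ratings `Y` and the 0/1 indicator `R` (both n_m × n_u), and a regularization parameter `lambda`:

```octave
E = (X * Theta' - Y) .* R;     % prediction errors, kept only where r(i,j) = 1

J = (1/2) * sum(sum(E .^ 2)) ...
    + (lambda/2) * (sum(sum(Theta .^ 2)) + sum(sum(X .^ 2)));

X_grad     = E  * Theta + lambda * X;      % dJ/dX
Theta_grad = E' * X     + lambda * Theta;  % dJ/dTheta
```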