Machine Learning Notes (Washington University) - Clustering Specialization - Week Four


1. Probabilistic clustering model

  • Hard assignments (as in k-means) do not tell the full story; soft assignments capture the uncertainty in cluster membership.
  • k-means only considers the cluster centers, so it handles overlapping clusters, clusters of disparate sizes, and differently shaped clusters poorly.
  • A probabilistic model can learn weights on dimensions.
  • It can even learn cluster-specific weights on dimensions.

 

2. Gaussian distribution

A 1-D Gaussian is fully specified by its mean μ and variance σ².

A 2-D Gaussian is fully specified by its mean vector μ and covariance matrix Σ.
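
To make "fully specified" concrete, here is a minimal sketch that evaluates both densities from nothing but those parameters. The function names are illustrative and not from the course; the 2-D version assumes a symmetric, invertible 2x2 covariance and uses its closed-form inverse.

```python
import math

def gaussian_pdf_1d(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian; cov is a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) via the closed-form 2x2 inverse.
    quad = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Density at the mean of a standard Gaussian, in 1-D and in 2-D.
p1 = gaussian_pdf_1d(0.0, 0.0, 1.0)
p2 = gaussian_pdf_2d((0.0, 0.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]])
```

The off-diagonal entry b is what lets a 2-D Gaussian describe tilted, elongated clusters, which a center-only model such as k-means cannot represent.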


 

Thus, our mixture of Gaussians is defined by the parameters {πk, μk, Σk}: the mixture weight, mean, and covariance of each cluster k.
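
For reference, the standard way to write the density that these parameters define is sketched below. The symbols K (number of clusters) and x_i (observation i) are notation assumed here rather than taken from the notes.

```latex
p(x_i) = \sum_{k=1}^{K} \pi_k \, N(x_i \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```

Each observation is modeled as coming from cluster k with probability πk, and then being drawn from that cluster's Gaussian.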

 

3. EM (Expectation Maximization)

What if we knew the cluster parameters {πk, μk, Σk}?

Then we could compute the responsibilities:

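$$r_{ik} = p(z_i = k \mid \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^{K}, x_i) = \frac{\pi_k \, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(x_i \mid \mu_j, \Sigma_j)}$$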

 

rik is the responsibility cluster k takes for observation i.

p is the probability that observation i is assigned to cluster k, given the model parameters and the observed value xi.

πk is the prior probability of being from cluster k.

N is the Gaussian density of cluster k, evaluated at xi.
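
The responsibility computation can be sketched in a few lines. This is a minimal illustration, assuming 1-D data so that each Σk reduces to a single variance; `normal_pdf` and `e_step` are names chosen here, and the numbers are made up.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Return responsibilities resp[i][k] for each observation i and cluster k."""
    resp = []
    for x in data:
        # Unnormalized responsibility: prior weight times the cluster's density at x.
        scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(scores)
        # Normalize so that the responsibilities for one observation sum to 1.
        resp.append([s / total for s in scores])
    return resp

# Two made-up clusters centered at 0 and 4; the point x = 2 lies halfway between them.
resp = e_step([0.0, 2.0, 4.0], weights=[0.5, 0.5], means=[0.0, 4.0], variances=[1.0, 1.0])
```

The middle point is shared equally between the two clusters, while the outer points belong almost entirely to the nearer one. That split is the uncertainty a hard assignment would discard. For points far from every cluster, `total` can underflow to zero, so a more careful version would work with log densities.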

 

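What if we knew the cluster soft assignments rik?

Then we could compute maximum-likelihood estimates of the parameters, weighting each observation by its responsibility (n is the number of observations and n_k is the soft count of cluster k):

$$n_k = \sum_{i=1}^{n} r_{ik}, \qquad \hat{\pi}_k = \frac{n_k}{n}, \qquad \hat{\mu}_k = \frac{1}{n_k} \sum_{i=1}^{n} r_{ik} \, x_i, \qquad \hat{\Sigma}_k = \frac{1}{n_k} \sum_{i=1}^{n} r_{ik} \, (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T$$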
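
A minimal sketch of this parameter update, again assuming 1-D data so the covariance is a single variance. The function name `m_step` and the toy data are illustrative. Each estimate is the usual sample statistic with every observation weighted by its responsibility.

```python
def m_step(data, resp):
    """Re-estimate (weights, means, variances) from responsibilities resp[i][k]."""
    n, k = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        # Soft count: the effective number of observations assigned to cluster j.
        soft_count = sum(resp[i][j] for i in range(n))
        mean = sum(resp[i][j] * data[i] for i in range(n)) / soft_count
        var = sum(resp[i][j] * (data[i] - mean) ** 2 for i in range(n)) / soft_count
        weights.append(soft_count / n)
        means.append(mean)
        variances.append(var)
    return weights, means, variances

# With hard 0/1 responsibilities the update reduces to per-cluster sample statistics.
data = [1.0, 3.0, 10.0, 14.0]
resp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights, means, variances = m_step(data, resp)
# weights == [0.5, 0.5], means == [2.0, 12.0], variances == [1.0, 4.0]
```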

 

The procedure for the iterative algorithm:

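1. Initialize the parameters {πk, μk, Σk}.

2. Estimate the cluster responsibilities given the current parameter estimates (E-step).

3. Maximize the likelihood given the soft assignments (M-step).

4. Repeat steps 2 and 3 until convergence.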
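
The procedure above can be sketched as one loop. This is an illustrative 1-D version, not the course's implementation: the stopping rule (log-likelihood gain below `tol`), the default `max_iter`, and the toy data are all assumptions made for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_1d(data, weights, means, variances, max_iter=200, tol=1e-8):
    """Run EM on a 1-D Gaussian mixture; return the parameters and log-likelihood trace."""
    weights, means, variances = list(weights), list(means), list(variances)
    n, k = len(data), len(weights)
    trace = []
    for _ in range(max_iter):
        # E-step: responsibilities under the current parameters.
        scores = [[weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
                  for x in data]
        totals = [sum(row) for row in scores]
        resp = [[s / t for s in row] for row, t in zip(scores, totals)]
        # Log-likelihood of the data under the current parameters.
        trace.append(sum(math.log(t) for t in totals))
        # Assumed stopping rule: quit once the log-likelihood stops improving.
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
        # M-step: responsibility-weighted maximum-likelihood updates.
        for j in range(k):
            soft = sum(r[j] for r in resp)
            weights[j] = soft / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / soft
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / soft
    return weights, means, variances, trace

# Made-up data with two well-separated groups, and a deliberately rough initialization.
data = [-0.5, 0.0, 0.3, 0.9, 1.1, 4.2, 4.8, 5.0, 5.5, 6.1]
weights, means, variances, trace = em_1d(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
```

The `trace` list is useful for debugging: the log-likelihood should never decrease from one iteration to the next, so a drop usually indicates a bug in one of the two steps.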

 

Notes:

EM is a coordinate-ascent algorithm

EM converges to a local mode

There are many ways to initialize EM, and the choice matters for both the convergence rate and the quality of the local mode found:

  • Randomly choose k centroids
  • Pick centers sequentially, as in k-means++
  • Initialize from a k-means solution
  • Grow the mixture model by splitting until k clusters are formed
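
As an illustration of the second option, here is a rough sketch of k-means++-style seeding for 1-D data. The helper names are hypothetical, and starting every cluster with equal weight and the overall data variance is an assumption made here, not something the notes prescribe.

```python
import random

def kmeanspp_centers(data, k, seed=0):
    """Pick k initial centers from 1-D data using k-means++-style seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(data)]
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        d2 = [min((x - c) ** 2 for c in centers) for x in data]
        # Sample the next center with probability proportional to that distance.
        centers.append(rng.choices(data, weights=d2, k=1)[0])
    return centers

def init_mixture(data, k, seed=0):
    """Build starting (weights, means, variances) for EM from seeded centers."""
    means = kmeanspp_centers(data, k, seed)
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    # Assumption: equal weights and the overall data variance for every cluster.
    return [1.0 / k] * k, means, [var_all] * k

data = [0.1, 0.4, 5.0, 5.3, 9.8, 10.1]
weights, means, variances = init_mixture(data, 3)
```

Points that coincide with an existing center get zero sampling weight, so the chosen centers are distinct as long as the data contains at least k distinct values.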

Preventing overfitting:

  • Do not let the variances go to zero: add a small amount to the diagonal of each covariance estimate.
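
A minimal sketch of that fix, assuming the covariance is stored as a list of rows. The function name and the value of `eps` are illustrative; in practice the amount is a tuning choice.

```python
def regularize_covariance(cov, eps=1e-6):
    """Return a copy of a square covariance matrix with eps added to its diagonal."""
    return [[value + (eps if i == j else 0.0) for j, value in enumerate(row)]
            for i, row in enumerate(cov)]

# A degenerate estimate: a cluster that collapsed onto a single point has zero covariance.
collapsed = [[0.0, 0.0], [0.0, 0.0]]
fixed = regularize_covariance(collapsed, eps=1e-3)
# The determinant is now positive, so the Gaussian density is well defined again.
det = fixed[0][0] * fixed[1][1] - fixed[0][1] * fixed[1][0]
```

Without this step, a cluster that takes responsibility for a single point can shrink its variance toward zero and drive the likelihood toward infinity, which is the overfitting the note warns about.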

 
