Scikit-Learn's DPGMM fitting: number of components?

Question

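I'm trying to fit a mixture of normals model to some data using scikit-learn's DPGMM algorithm. One of the advantages advertised at [0] is that I don't need to specify the number of components, which is good, because I don't know how many components my data contains. The documentation says that I only need to specify an upper bound. However, it looks very much like that is not the case: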

>>> import numpy
>>> data = numpy.random.normal(loc=0.0, scale=1.0, size=1000)
>>> from sklearn.mixture import DPGMM
>>> d = DPGMM(n_components=5)
>>> d.fit(data.reshape(-1,1))
DPGMM(alpha=1.0, covariance_type='diag', init_params='wmc', min_covar=None,
   n_components=5, n_iter=10, params='wmc', random_state=None, thresh=None,
   tol=0.001, verbose=0)
>>> d.n_components
5
>>> d.means_
array([[-0.02283383],
       [ 0.06259168],
       [ 0.00390097],
       [ 0.02934676],
       [-0.05533165]])

As you can see, the fit reports five components (the upper bound), even for data that is clearly sampled from a single normal distribution.
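To make the observation easier to reproduce, here is a minimal sketch of how one might check how many components are actually used, rather than reading `n_components` (which only stores the upper bound that was passed in). It is written under some assumptions: `DPGMM` was deprecated in scikit-learn 0.18 and removed in 0.20, so the sketch uses `BayesianGaussianMixture` with a Dirichlet process prior instead, and the `max_iter` and `random_state` values are arbitrary choices for illustration. Counting the components that receive at least one point from `predict` is only one possible heuristic.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Same kind of data as above: 1000 draws from a single standard normal.
rng = np.random.RandomState(0)  # fixed seed, chosen only for repeatability
data = rng.normal(loc=0.0, scale=1.0, size=1000).reshape(-1, 1)

# Assumption: BayesianGaussianMixture stands in for the removed DPGMM class.
model = BayesianGaussianMixture(
    n_components=5,  # upper bound on the number of components
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,    # illustrative value, not a recommendation
    random_state=0,
)
model.fit(data)

# n_components is just the constructor argument, so it stays at 5.
print("upper bound:", model.n_components)

# The mixture weights show how much mass each component carries.
print("weights:", np.round(model.weights_, 3))

# Heuristic: count the components that are assigned at least one sample.
labels = model.predict(data)
used = np.unique(labels)
print("components that received points:", used.size)
```

The fitted model always carries `n_components` rows in `means_`, so the array shape by itself says nothing about how many components are effectively in use; the weights and the predicted labels are the places to look.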

Am I doing something wrong? Or have I misunderstood something?

Thanks a lot in advance,

Lukas

[0] http://scikit-learn.org/stable/modules/mixture.html#dpgmm

Answer 1

Another answer