Naive Bayes algorithms in sklearn
Posted by bitcarmanlee
1.Overview
sklearn provides five naive Bayes variants. If you open the source code, the module file begins with the following:
__all__ = ['BernoulliNB', 'GaussianNB', 'MultinomialNB', 'ComplementNB',
           'CategoricalNB']
These five classes are the five algorithms the module offers.
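A quick way to check this list against whatever version is installed is to read `__all__` directly. This is only a minimal sketch: the exact contents depend on the scikit-learn release, and the variable name `exported` is just for illustration.

```python
import sklearn.naive_bayes as nb

# Names the module declares as public; the list may differ between releases.
exported = sorted(nb.__all__)
print(exported)
```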
2.GaussianNB
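As the name suggests, GaussianNB is tied to the Gaussian distribution. If the raw features are continuous and roughly Gaussian, GaussianNB is a good choice. Typical examples are people's height and weight; income is often cited as well, although in practice it tends to be right-skewed.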
class GaussianNB(_BaseNB):
    """
    Gaussian Naive Bayes (GaussianNB)

    Can perform online updates to model parameters via :meth:`partial_fit`.
    For details on algorithm used to update feature means and variance online,
    see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque:

        http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf

    Read more in the :ref:`User Guide <gaussian_naive_bayes>`.

    Parameters
    ----------
    priors : array-like, shape (n_classes,)
        Prior probabilities of the classes. If specified the priors are not
        adjusted according to the data.

    var_smoothing : float, optional (default=1e-9)
        Portion of the largest variance of all features that is added to
        variances for calculation stability.
    ...............
    def __init__(self, priors=None, var_smoothing=1e-9):
        self.priors = priors
        self.var_smoothing = var_smoothing
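GaussianNB takes two parameters, or effectively just one: priors. priors holds the prior probabilities of the classes; if it is not given, the priors are estimated from the data set. The other parameter, var_smoothing, only serves numerical stability: a portion of the largest feature variance is added to every variance.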
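The effect of priors can be sketched as follows. The data is synthetic (two made-up groups of "heights" drawn from normal distributions), and the names `clf_default` and `clf_fixed` are only illustrative. It is an example under those assumptions, not a recipe.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic one-feature data: two classes with well separated means.
X0 = rng.normal(loc=160.0, scale=5.0, size=(50, 1))
X1 = rng.normal(loc=180.0, scale=5.0, size=(50, 1))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# priors=None (the default): class priors are estimated from class frequencies.
clf_default = GaussianNB().fit(X, y)

# Explicit priors: used as given, not adjusted to the data.
clf_fixed = GaussianNB(priors=[0.9, 0.1]).fit(X, y)

print(clf_default.class_prior_)
print(clf_fixed.class_prior_)
print(clf_default.predict(np.array([[150.0], [190.0]])))
```

The fitted priors are exposed as `class_prior_`, so it is easy to confirm which of the two behaviours applied.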
3.MultinomialNB
class MultinomialNB(_BaseDiscreteNB):
    """
    Naive Bayes classifier for multinomial models

    The multinomial Naive Bayes classifier is suitable for classification with
    discrete features (e.g., word counts for text classification). The
    multinomial distribution normally requires integer feature counts. However,
    in practice, fractional counts such as tf-idf may also work.
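MultinomialNB suits discrete features that follow a multinomial distribution, such as word counts in text classification. The docstring also points out that, in practice, fractional features such as tf-idf may work as well.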
    Parameters
    ----------
    alpha : float, optional (default=1.0)
        Additive (Laplace/Lidstone) smoothing parameter
        (0 for no smoothing).

    fit_prior : boolean, optional (default=True)
        Whether to learn class prior probabilities or not.
        If false, a uniform prior will be used.

    class_prior : array-like, size (n_classes,), optional (default=None)
        Prior probabilities of the classes. If specified the priors are not
        adjusted according to the data.
    ......
    def __init__(self, alpha=1.0, fit_prior=True, class_prior=None):
        self.alpha = alpha
        self.fit_prior = fit_prior
        self.class_prior = class_prior
    .......
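The source shows three parameters: alpha, fit_prior and class_prior. alpha is the additive (Laplace/Lidstone) smoothing applied when estimating probabilities. fit_prior controls whether class priors are learned from the data; if it is False, a uniform prior is used. class_prior supplies the priors explicitly; if it is not given, they are estimated from the data set.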
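A small sketch of how the three parameters interact. The count matrix, the four-word vocabulary and the class labels below are invented for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Rows are documents; columns are counts of the hypothetical words
# ["goal", "match", "stock", "market"].
X = np.array([
    [3, 2, 0, 0],
    [2, 1, 0, 1],
    [1, 2, 0, 0],
    [0, 0, 4, 2],
    [0, 1, 2, 3],
])
y = np.array([0, 0, 0, 1, 1])  # 0 = sports, 1 = finance (made-up labels)

# Defaults: Laplace smoothing (alpha=1.0), priors learned from class frequencies.
clf = MultinomialNB(alpha=1.0).fit(X, y)

# fit_prior=False: a uniform prior is used instead.
uniform = MultinomialNB(fit_prior=False).fit(X, y)

# class_prior given: taken as is, not adjusted to the data.
fixed = MultinomialNB(class_prior=[0.2, 0.8]).fit(X, y)

pred = clf.predict(np.array([[2, 1, 0, 0], [0, 0, 1, 2]]))
print(pred)
print(np.exp(clf.class_log_prior_))
print(np.exp(uniform.class_log_prior_))
print(np.exp(fixed.class_log_prior_))
```

The priors are stored in log space as `class_log_prior_`, hence the `np.exp` when printing them.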
4.BernoulliNB
class BernoulliNB(_BaseDiscreteNB):
    """Naive Bayes classifier for multivariate Bernoulli models.

    Like MultinomialNB, this classifier is suitable for discrete data. The
    difference is that while MultinomialNB works with occurrence counts,
    BernoulliNB is designed for binary/boolean features.
    ......
    def __init__(self, alpha=1.0, binarize=.0, fit_prior=True,
                 class_prior=None):
        self.alpha = alpha
        self.binarize = binarize
        self.fit_prior = fit_prior
        self.class_prior = class_prior
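The docstring above makes the difference between BernoulliNB and MultinomialNB clear: MultinomialNB works with occurrence counts such as word frequencies, whereas BernoulliNB works with boolean features, i.e. whether a word occurs at all.
BernoulliNB may work very well on short texts where the keywords are highly discriminative.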
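The binarize parameter in the constructor above is the threshold used to turn features into booleans. The sketch below uses invented counts to suggest that fitting on raw counts with binarize=0.0 gives the same model as fitting on data binarized by hand with binarize=None. The variable names are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up word counts for four short documents and three words.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([0, 0, 1, 1])

# binarize=0.0: every value greater than 0 is mapped to 1 before fitting.
clf_counts = BernoulliNB(binarize=0.0).fit(X_counts, y)

# binarize=None: the input is assumed to be binary already.
X_binary = (X_counts > 0).astype(int)
clf_binary = BernoulliNB(binarize=None).fit(X_binary, y)

same = np.allclose(clf_counts.feature_log_prob_, clf_binary.feature_log_prob_)
print(same)
print(clf_counts.predict(np.array([[5, 0, 0]])))
```

Because only presence matters, a document containing a word five times is treated the same as one containing it once.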
5.ComplementNB
class ComplementNB(_BaseDiscreteNB):
    """The Complement Naive Bayes classifier described in Rennie et al. (2003).

    The Complement Naive Bayes classifier was designed to correct the "severe
    assumptions" made by the standard Multinomial Naive Bayes classifier. It is
    particularly suited for imbalanced data sets.
    ......
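ComplementNB was designed to correct the "severe assumptions" made by the standard MultinomialNB. The last sentence of the docstring spells out its purpose: it is particularly suited to imbalanced data sets.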
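Usage is the same as for MultinomialNB. Here is a minimal sketch on a deliberately imbalanced toy set (six documents of one class, two of the other; counts and labels are made up). It only shows the API and does not demonstrate that ComplementNB is more accurate.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Columns are counts of the hypothetical words
# ["goal", "match", "stock", "market"].
X = np.array([
    [3, 2, 0, 0],
    [2, 1, 0, 1],
    [1, 2, 0, 0],
    [2, 2, 0, 0],
    [3, 1, 0, 1],
    [1, 2, 0, 0],
    [0, 0, 4, 2],
    [0, 1, 2, 3],
])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # class 0 is three times as frequent

# Feature weights for each class are estimated from the counts of all
# *other* classes (the "complement"), which is what gives the model its name.
clf = ComplementNB(alpha=1.0).fit(X, y)

pred = clf.predict(np.array([[2, 1, 0, 0], [0, 0, 1, 2]]))
print(pred)
```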
6.CategoricalNB
class CategoricalNB(_BaseDiscreteNB):
    """Naive Bayes classifier for categorical features

    The categorical Naive Bayes classifier is suitable for classification with
    discrete features that are categorically distributed. The categories of
    each feature are drawn from a categorical distribution.
    ......
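The categorical naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution.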
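CategoricalNB expects each feature to be encoded as non-negative integer category codes, so string categories need encoding first. The sketch below does that with OrdinalEncoder. The weather-style data and the labels are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Two categorical features per sample (made-up data).
raw = np.array([
    ["sunny", "hot"],
    ["sunny", "mild"],
    ["rainy", "mild"],
    ["rainy", "cool"],
    ["overcast", "hot"],
    ["overcast", "cool"],
])
y = np.array([0, 0, 1, 1, 0, 1])  # hypothetical labels

# Map each feature's categories to integer codes 0..n_categories-1.
encoder = OrdinalEncoder()
X = encoder.fit_transform(raw).astype(int)

clf = CategoricalNB(alpha=1.0).fit(X, y)

# New samples must be encoded with the same encoder.
query = encoder.transform(np.array([["sunny", "hot"]])).astype(int)
pred = clf.predict(query)
print(pred)
```

Each feature gets its own categorical distribution per class, which is why `category_count_` is a list with one array per feature.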