sklearn中的逻辑回归

Posted bitcarmanlee

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了sklearn中的逻辑回归相关的知识,希望对你有一定的参考价值。

1.sklearn中与逻辑回归有关的三个类

sklearn中,lr相关的代码在linear_model模块中,查看linear_model的__init__文件,内容如下

__all__ = ['ARDRegression',
           'BayesianRidge',
           'ElasticNet',
           'ElasticNetCV',
           'Hinge',
           'Huber',
           'HuberRegressor',
           'Lars',
           'LarsCV',
           'Lasso',
           'LassoCV',
           'LassoLars',
           'LassoLarsCV',
           'LassoLarsIC',
           'LinearRegression',
           'Log',
           'LogisticRegression',
           'LogisticRegressionCV',
           'ModifiedHuber',
           'MultiTaskElasticNet',
           'MultiTaskElasticNetCV',
           'MultiTaskLasso',
           'MultiTaskLassoCV',
           'OrthogonalMatchingPursuit',
           'OrthogonalMatchingPursuitCV',
           'PassiveAggressiveClassifier',
           'PassiveAggressiveRegressor',
           'Perceptron',
           'Ridge',
           'RidgeCV',
           'RidgeClassifier',
           'RidgeClassifierCV',
           'SGDClassifier',
           'SGDRegressor',
           'SquaredLoss',
           'TheilSenRegressor',
           'enet_path',
           'lars_path',
           'lars_path_gram',
           'lasso_path',
           'logistic_regression_path',
           'orthogonal_mp',
           'orthogonal_mp_gram',
           'ridge_regression',
           'RANSACRegressor']

可以看出来,与lr有关的部分一共有三个类:

LogisticRegression
LogisticRegressionCV
logistic_regression_path

2.logistic_regression_path

@deprecated('logistic_regression_path was deprecated in version 0.21 and '
            'will be removed in version 0.23.0')
def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
                             max_iter=100, tol=1e-4, verbose=0,
                             solver='lbfgs', coef=None,
                             class_weight=None, dual=False, penalty='l2',
                             intercept_scaling=1., multi_class='auto',
                             random_state=None, check_input=True,
                             max_squared_sum=None, sample_weight=None,
                             l1_ratio=None):
    """Compute a Logistic Regression model for a list of regularization
    parameters.

    This is an implementation that uses the result of the previous model
    to speed up computations along the set of solutions, making it faster
    than sequentially calling LogisticRegression for the different parameters.
    Note that there will be no speedup with liblinear solver, since it does
    not handle warm-starting.

    .. deprecated:: 0.21
        ``logistic_regression_path`` was deprecated in version 0.21 and will
        be removed in 0.23.

    Read more in the :ref:`User Guide <logistic_regression>`.

首先可以看到,logistic_regression_path将会在0.23.0版本移出。
由注释说明不难看出,logistic_regression_path主要是基于之前的训练结果用于训练加速。同时特别强调,如果用的是liblinear求解器,logistic_regression_path不能加快速度,因为liblinear不能处理warm-starting场景。

3.LogisticRegression

class LogisticRegression(BaseEstimator, LinearClassifierMixin,
                         SparseCoefMixin):
    """
    Logistic Regression (aka logit, MaxEnt) classifier.

    In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
    scheme if the 'multi_class' option is set to 'ovr', and uses the
    cross-entropy loss if the 'multi_class' option is set to 'multinomial'.
    (Currently the 'multinomial' option is supported only by the 'lbfgs',
    'sag', 'saga' and 'newton-cg' solvers.)

    This class implements regularized logistic regression using the
    'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note
    that regularization is applied by default**. It can handle both dense
    and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit
    floats for optimal performance; any other input format will be converted
    (and copied).

    The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
    with primal formulation, or no regularization. The 'liblinear' solver
    supports both L1 and L2 regularization, with a dual formulation only for
    the L2 penalty. The Elastic-Net regularization is only supported by the
    'saga' solver.

    Read more in the :ref:`User Guide <logistic_regression>`.
...
    def __init__(self, penalty='l2', dual=False, tol=1e-4, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver='lbfgs', max_iter=100,
                 multi_class='auto', verbose=0, warm_start=False, n_jobs=None,
                 l1_ratio=None):

        self.penalty = penalty
        self.dual = dual
        self.tol = tol
        self.C = C
        self.fit_intercept = fit_intercept
        self.intercept_scaling = intercept_scaling
        self.class_weight = class_weight
        self.random_state = random_state
        self.solver = solver
        self.max_iter = max_iter
        self.multi_class = multi_class
        self.verbose = verbose
        self.warm_start = warm_start
        self.n_jobs = n_jobs
        self.l1_ratio = l1_ratio

看看注释里头提到的几个关键点:
1.首先对于多分类,如果multi_class设置的是ovr,那么采用的是one-vs-rest的方式。如果设置的是multinomial,会使用cross-entropy loss。
2.具体的优化求解方法包括’liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’。‘newton-cg’, ‘sag’, and ‘lbfgs’支持L2正则,liblinear支持L1与L2,Elastic-Net正则只能使用’saga’。

关心一下初始化方法中的相关参数:
1.dual:选择目标函数是原始形式还是对偶形式,默认false。
2.tol:停止迭代的标准,默认1e-4。
3.C:正则化系数,默认1.0
4.solver:最优化求解方法,默认lbfgs
5.max_iter: 最大迭代次数,默认100。

4.LogisticRegressionCV

class LogisticRegressionCV(LogisticRegression, BaseEstimator,
                           LinearClassifierMixin):
    """Logistic Regression CV (aka logit, MaxEnt) classifier.

    See glossary entry for :term:`cross-validation estimator`.

    This class implements logistic regression using liblinear, newton-cg, sag
    of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2
    regularization with primal formulation. The liblinear solver supports both
    L1 and L2 regularization, with a dual formulation only for the L2 penalty.
    Elastic-Net penalty is only supported by the saga solver.

    For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter
    is selected by the cross-validator
    :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed
    using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs'
    solvers can warm-start the coefficients (see :term:`Glossary<warm_start>`).

LogisticRegressionCV的用法与LogisticRegression基本相当,不一样在于LogisticRegressionCV通过cross validation,来选择正则化参数C,同时通过l1_ratios来选择l1正则与l2正则的组合。

以上是关于sklearn中的逻辑回归的主要内容,如果未能解决你的问题,请参考以下文章

sklearn 逻辑回归中的特征选择

使用 sklearn 在 python 中执行逻辑回归分析

解释 sklearn 中的逻辑回归特征系数值

Python SKLearn 逻辑回归中的虚拟变量

Python 中的逻辑回归和交叉验证(使用 sklearn)

python sklearn库实现逻辑回归的实例代码