Sklearn rolling out your own estimator, check estimator error

Posted 2023-03-12

【Title】: Sklearn rolling out your own estimator, check estimator error
【Posted】: 2020-09-05 08:15:59
【Question】:

Can someone tell me why I keep getting these errors?

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class AdaBoostClassifier(ClassifierMixin, BaseEstimator):

    def __init__(self, base_estimator=None, n_estimators=50, random_state=None):
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        """
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The training input samples.
        y : array-like, shape (n_samples,)
            The target values. An array of int.
        Returns
        -------
        self : object
            Returns self.
        """
        # Check that X and y have correct shape
        X, y = check_X_y(X, y)
        # Store the classes seen during fit
        self.classes_ = unique_labels(y)

        self.X_ = X
        self.y_ = y

        self.models = []
        self.alphas = []
        n_samples, _ = X.shape
        w = np.ones(n_samples) / n_samples

        for m in range(self.n_estimators):
            clf = DecisionTreeClassifier(max_depth=1)
            clf.fit(X, y, sample_weight=w)
            pred = clf.predict(X)

            error = w.dot(pred != y)
            alpha = 0.5 * (np.log(1 - error) - np.log(error))

            w = w * np.exp(-alpha * y * pred)
            w = w / w.sum()  # normalise to sum to 1

            self.models.append(clf)
            self.alphas.append(alpha)

        # Return the classifier
        return self.models

    def predict(self, X):
        """ A reference implementation of a prediction for a classifier.
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The input samples.
        Returns
        -------
        y : ndarray, shape (n_samples,)
            The label for each sample is the label of the closest sample
            seen during fit.
        """
        # Check that fit has been called
        check_is_fitted(self, ['X_', 'y_'])

        # Input validation
        X = check_array(X)

        n_samples, _ = X.shape
        self.ada = np.zeros(n_samples)
        for alpha, clf in zip(self.alphas, self.models):
            self.ada += alpha * clf.predict(X)
            self.ada = np.sign(self.ada)
        return self.ada

    def score(self, X, y):
        self.pred = self.predict(X)
        self.accuracy = 100 * sum(self.pred == y) / len(y)
        return self.accuracy

check_estimator(AdaBoostClassifier)

Traceback (most recent call last):
  File "C:\Users\Desktop\ada.py", line 98, in <module>
    check_estimator(AdaBoostClassifier)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\estimator_checks.py", line 302, in check_estimator
    check(name, estimator)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\testing.py", line 355, in wrapper
    return fn(*args, **kwargs)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\estimator_checks.py", line 1646, in check_estimators_fit_returns_self
    assert estimator.fit(X, y) is estimator
AssertionError

【Answer 1】:

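I believe your fit method should return self, not self.models. The failing check, check_estimators_fit_returns_self, asserts that estimator.fit(X, y) is estimator, so returning the list of fitted models makes the assertion fail.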
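
To make the fix concrete, here is a minimal sketch of a binary AdaBoost-style estimator whose fit returns self. It is not the asker's code. The class name SketchAdaBoost is hypothetical, and several details go beyond what the answer states and are assumptions of this sketch: the two labels are mapped to -1/+1 so the weight update is meaningful, the weighted error is clipped to keep the logarithms finite, the sign is taken once after all weak learners have voted, and learned attributes carry the trailing underscore that scikit-learn uses for fitted state.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class SketchAdaBoost(ClassifierMixin, BaseEstimator):
    """Hypothetical two-class AdaBoost sketch (illustrative name)."""

    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = unique_labels(y)
        if len(self.classes_) != 2:
            raise ValueError("this sketch only handles two classes")
        # Assumption: encode the two labels as -1/+1 for the weight update.
        y_signed = np.where(y == self.classes_[1], 1, -1)

        self.models_ = []
        self.alphas_ = []
        w = np.full(X.shape[0], 1.0 / X.shape[0])
        for _ in range(self.n_estimators):
            clf = DecisionTreeClassifier(max_depth=1)
            clf.fit(X, y_signed, sample_weight=w)
            pred = clf.predict(X)
            # Assumption: clip the weighted error so both logs stay finite.
            error = np.clip(w.dot(pred != y_signed), 1e-10, 1 - 1e-10)
            alpha = 0.5 * (np.log(1 - error) - np.log(error))
            w = w * np.exp(-alpha * y_signed * pred)
            w = w / w.sum()  # normalise to sum to 1
            self.models_.append(clf)
            self.alphas_.append(alpha)
        # The fix from the answer: return the estimator itself.
        return self

    def predict(self, X):
        check_is_fitted(self, ["models_", "alphas_"])
        X = check_array(X)
        votes = np.zeros(X.shape[0])
        for alpha, clf in zip(self.alphas_, self.models_):
            votes += alpha * clf.predict(X)
        # Take the sign once, after every weak learner has voted.
        return np.where(votes >= 0, self.classes_[1], self.classes_[0])
```

With this shape, `est.fit(X, y) is est` holds, which is the condition the failing check asserts. Note that newer scikit-learn releases expect check_estimator to receive an instance rather than the class, and a two-class-only sketch like this one may still fail other checks in the suite.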
