Sklearn rolling out your own estimator, check estimator error


【Title】: Sklearn rolling out your own estimator, check estimator error 【Posted】: 2020-09-05 08:15:59 【Question】:

Can someone tell me why I keep getting these errors?

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

class AdaBoostClassifier(ClassifierMixin, BaseEstimator):

    def __init__(self, base_estimator = None, n_estimators = 50, random_state = None):
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        """
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The training input samples.
        y : array-like, shape (n_samples,)
            The target values. An array of int.
        Returns
        -------
        self : object
            Returns self.
        """
        # Check that X and y have correct shape
        X, y = check_X_y(X, y)
        # Store the classes seen during fit
        self.classes_ = unique_labels(y)

        self.X_ = X
        self.y_ = y

        self.models = []
        self.alphas = []
        n_samples, _ = X.shape
        w = np.ones(n_samples) / n_samples

        for m in range(self.n_estimators):
            clf = DecisionTreeClassifier(max_depth = 1)
            clf.fit(X,y, sample_weight = w)
            pred = clf.predict(X)

            error = w.dot(pred != y)
            alpha = 0.5*(np.log(1-error)-np.log(error))

            w = w*np.exp(-alpha*y*pred)
            w = w/w.sum() # normalise to sum to 1

            self.models.append(clf)
            self.alphas.append(alpha)

        # Return the classifier
        return self.models

    def predict(self, X):
        """ A reference implementation of a prediction for a classifier.
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The input samples.
        Returns
        -------
        y : ndarray, shape (n_samples,)
            The label for each sample is the label of the closest sample
            seen during fit.
        """
        # Check if fit has been called
        check_is_fitted(self, ['X_', 'y_'])

        # Input validation
        X = check_array(X)

        n_samples, _ = X.shape
        self.ada = np.zeros(n_samples)
        for alpha, clf in zip(self.alphas, self.models):
            self.ada += alpha*clf.predict(X)
            self.ada = np.sign(self.ada)
        return self.ada

    def score(self, X, y):
        self.pred = self.predict(X)
        self.accuracy = 100*sum(self.pred==y)/len(y)
        return self.accuracy

check_estimator(AdaBoostClassifier)

Traceback (most recent call last):
  File "C:\Users\Desktop\ada.py", line 98, in <module>
    check_estimator(AdaBoostClassifier)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\estimator_checks.py", line 302, in check_estimator
    check(name, estimator)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\testing.py", line 355, in wrapper
    return fn(*args, **kwargs)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\estimator_checks.py", line 1646, in check_estimators_fit_returns_self
    assert estimator.fit(X, y) is estimator
AssertionError


【Answer 1】:

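I believe your fit method should return self, not self.models. The failing check, check_estimators_fit_returns_self, asserts that estimator.fit(X, y) is estimator, so returning the list of fitted models fails that identity test.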
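To see the contract in isolation, here is a minimal sketch in plain Python with no scikit-learn dependency. The classes `ReturnsSelf` and `ReturnsState` and the helper `fit_returns_self` are hypothetical names made up for this illustration; they are not the asker's AdaBoost and not part of sklearn. The helper simply repeats the identity test that appears in the traceback.

```python
class ReturnsSelf:
    """Toy majority-class estimator (hypothetical, for illustration only)."""

    def fit(self, X, y):
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        # Learned state is stored on the instance, not returned.
        self.majority_ = max(counts, key=counts.get)
        return self  # the estimator itself, so calls can be chained

    def predict(self, X):
        return [self.majority_ for _ in X]


class ReturnsState(ReturnsSelf):
    """Same estimator, but fit returns the learned state, as in the question."""

    def fit(self, X, y):
        super().fit(X, y)
        return self.majority_


def fit_returns_self(estimator, X, y):
    # Same identity test as the assertion shown in the traceback.
    return estimator.fit(X, y) is estimator


X = [[0], [1], [2]]
y = [1, 1, -1]
good = fit_returns_self(ReturnsSelf(), X, y)   # passes the identity test
bad = fit_returns_self(ReturnsState(), X, y)   # fails it
chained = ReturnsSelf().fit(X, y).predict([[5]])
```

Applied to the code in the question, the same idea would mean ending fit with return self and leaving the fitted trees on self.models, where predict already reads them.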
