fit vs fit_transform in pipeline

Posted 2023-03-12


Asked: 2019-06-08 01:24:41

Question:

On this page: https://www.kaggle.com/baghern/a-deep-dive-into-sklearn-pipelines

it calls fit_transform to transform the data as follows:

from sklearn.pipeline import FeatureUnion, Pipeline

# text, length, words, etc. are the feature sub-pipelines defined on the Kaggle page

feats = FeatureUnion([('text', text), 
                      ('length', length),
                      ('words', words),
                      ('words_not_stopword', words_not_stopword),
                      ('avg_word_length', avg_word_length),
                      ('commas', commas)])

feature_processing = Pipeline([('feats', feats)])
feature_processing.fit_transform(X_train)

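When training with this feature processing, it only calls fit and then predict:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('features', feats),
    ('classifier', RandomForestClassifier(random_state=42)),
])

# Fit the whole pipeline on the raw training data
pipeline.fit(X_train, y_train)

preds = pipeline.predict(X_test)
np.mean(preds == y_test)  # accuracy on the test set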

The question is: in the second case, does fit also transform X_train (as transform would), given that fit_transform is not called here?


Answer 1:

The sklearn Pipeline has some nice features. It performs multiple tasks in a very clean way: we define our features, their transformations, and the classifier we want to run, all in one object.

In this first step

pipeline = Pipeline([
    ('features', feats),
    ('classifier', RandomForestClassifier(random_state=42)),
])

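you have defined the name of the feature step and its transformation (contained in feats), and in the second step you have defined the name of the classifier and the classifier itself.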
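As a rough illustration of how those (name, estimator) pairs can be looked up afterwards, here is a minimal sketch. The two scalers inside `feats` are hypothetical stand-ins for the text features from the Kaggle page, which are not reproduced here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical stand-in for the `feats` FeatureUnion from the question
feats = FeatureUnion([
    ("standard", StandardScaler()),
    ("minmax", MinMaxScaler()),
])

pipeline = Pipeline([
    ("features", feats),
    ("classifier", RandomForestClassifier(random_state=42)),
])

# Each step is a (name, estimator) pair, in the order it will be applied
step_names = [name for name, _ in pipeline.steps]
print(step_names)

# Steps can also be retrieved by the name given in the Pipeline definition
classifier = pipeline.named_steps["classifier"]
print(type(classifier).__name__)
```

The names are only labels; they matter mainly when inspecting a step or setting its parameters later.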

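Now, when you call pipeline.fit, it first fits the features step and transforms the data with it, then fits the classifier on the transformed features. In other words, fit on the pipeline runs fit_transform on every intermediate step and fit on the final estimator, so you do not need to call fit_transform yourself. Likewise, pipeline.predict applies transform (without refitting) to X_test before calling the classifier's predict. For more, you can check here.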
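One way to convince yourself of this is to compare the pipeline against the same steps done by hand. The sketch below is only an illustration under assumptions: it uses synthetic data from `make_classification` and a `StandardScaler` in place of the `feats` FeatureUnion from the question, since the original text features are not available here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption: stands in for the Kaggle text data)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Version 1: let the pipeline handle everything with a single fit call
pipeline = Pipeline([
    ("features", StandardScaler()),  # stand-in for `feats`
    ("classifier", RandomForestClassifier(random_state=42)),
])
pipeline.fit(X_train, y_train)
pipeline_preds = pipeline.predict(X_test)

# Version 2: do the same steps by hand
scaler = StandardScaler()
X_train_t = scaler.fit_transform(X_train)   # fit + transform on training data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train_t, y_train)                 # classifier sees transformed features
X_test_t = scaler.transform(X_test)         # transform only, no refit on test data
manual_preds = clf.predict(X_test_t)

# Both routes should produce identical predictions
same = np.array_equal(pipeline_preds, manual_preds)
print(same)
```

If both versions agree, the pipeline's fit did transform X_train internally before training the classifier, which is the behaviour described above.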
