Sklearn - FeatureUnion - Transformer: TypeError: fit_transform() takes 2 positional arguments but 3 were given

Asked: 2017-05-07 12:25:57

Question:

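I apologize in advance for the large block of code. It is the most concise way I could provide a reproducible working example.

In the code, I am trying to use FeatureUnion to transform two columns of a dataframe. One column is text data, hence the TfidfVectorizer; the other is a column of lists of tags, for which I want to use MultiLabelBinarizer.

The ItemSelector transformer is used to select the right column from the dataframe.

Why am I getting TypeError: fit_transform() takes 2 positional arguments but 3 were given?

What do I need to change in the code to make this example run properly?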

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import SGDClassifier

import pandas as pd
import numpy as np

d = {'label': ['Help', 'Help', 'Other', 'Sale/Coupon', 'Other', 'Help', 'Help',
           'Other', 'Sale/Coupon', 'Other', 'Help', 'Help', 'Other',
           'Sale/Coupon', 'Other', 'Help', 'Help', 'Other', 'Sale/Coupon',
           'Other', 'Help', 'Help', 'Other', 'Sale/Coupon', 'Other'],
     'multilabels': ["['Samples']", "['Deck']", "['Deck', 'Deck Over', 'Stain']",
                     "['Coupons']", "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']"],
     'response': ['this is some text', 'this is some more text',
                  'and here is some more', 'and some more',
                  'and here we go some more yay done', 'this is some text',
                  'this is some more text', 'and here is some more',
                  'and some more', 'and here we go some more yay done',
                  'this is some text', 'this is some more text',
                  'and here is some more', 'and some more',
                  'and here we go some more yay done', 'this is some text',
                  'this is some more text', 'and here is some more',
                  'and some more', 'and here we go some more yay done',
                  'this is some text', 'this is some more text',
                  'and here is some more', 'and some more',
                  'and here we go some more yay done']}

class ItemSelector(BaseEstimator, TransformerMixin):
  def __init__(self, key):
    self.key = key

  def fit(self, X, y=None):
    return self

  def transform(self, df):
    return df[self.key]

feature_union = FeatureUnion(
  transformer_list=[
    ('step1', Pipeline([
      ('selector', ItemSelector(key='response')),
      ('tfidf', TfidfVectorizer()),
      ])),
    ('step2', Pipeline([
      ('selector', ItemSelector(key='multilabels')),
      ('multilabel', MultiLabelBinarizer())
      ]))
    ])

pipeline = OneVsRestClassifier(
  Pipeline([('union', feature_union),('sgd', SGDClassifier())])
  )

# Empty parameter grid: a single fit per fold with the default settings.
grid = GridSearchCV(pipeline, {}, verbose=5)

df = pd.DataFrame(d, columns=['response', 'multilabels', 'label'])
X = df[['response', 'multilabels']]
y = df['label']
grid.fit(X, y)

Here is the full error:

Traceback (most recent call last):
  File "C:/Users/owner/Documents/my files/Account Tracking/Client/Foresee Analysis/SOQuestion.py", line 72, in <module>
    grid.fit(X, y)
  File "C:\Python34\lib\site-packages\sklearn\model_selection\_search.py", line 945, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "C:\Python34\lib\site-packages\sklearn\model_selection\_search.py", line 564, in _fit
    for parameters in parameter_iterable
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\model_selection\_validation.py", line 238, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 216, in fit
    for i, column in enumerate(columns))
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\multiclass.py", line 80, in _fit_binary
    estimator.fit(X, y)
  File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 268, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 234, in _fit
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 734, in fit_transform
    for name, trans, weight in self._iter())
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 577, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "C:\Python34\lib\site-packages\sklearn\pipeline.py", line 303, in fit_transform
    return last_step.fit_transform(Xt, y, **fit_params)
TypeError: fit_transform() takes 2 positional arguments but 3 were given

Note: I have looked at "_transform() takes 2 positional arguments but 3 were given", but it still doesn't make sense to me.

Answer 1:

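Got it. I made another transformer to handle the multilabel binarization. This is more of a workaround than a solution, because the binarization happens inside a custom transformer rather than as a step of the pipeline itself.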
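
As for why the original code fails, my reading of the traceback (not something stated in the original post) is that Pipeline calls fit_transform(X, y) on each step, while MultiLabelBinarizer.fit_transform is written for target labels and accepts only a single argument besides self. A minimal sketch that compares the two signatures and reproduces the TypeError outside any pipeline; the `tags` sample data is made up for illustration:

```python
import inspect

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer

# Collect the parameter names of both fit_transform methods.
tfidf_params = list(inspect.signature(TfidfVectorizer.fit_transform).parameters)
mlb_params = list(inspect.signature(MultiLabelBinarizer.fit_transform).parameters)

print(tfidf_params)  # has a data argument plus an optional y
print(mlb_params)    # has only one argument besides self

# Illustrative data (an assumption, not taken from the question).
tags = [['Samples'], ['Deck', 'Stain']]

raised = False
try:
    # Roughly what a Pipeline step call looks like: data first, then y.
    MultiLabelBinarizer().fit_transform(tags, None)
except TypeError as exc:
    raised = True
    print(exc)
```

If that reading is right, any step placed inside a Pipeline or FeatureUnion needs fit, transform and fit_transform methods that accept both X and y, which is what the wrapper transformer in the code below provides.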

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import SGDClassifier

import pandas as pd
import numpy as np

d = {'label': ['Help', 'Help', 'Other', 'Sale/Coupon', 'Other', 'Help', 'Help',
           'Other', 'Sale/Coupon', 'Other', 'Help', 'Help', 'Other',
           'Sale/Coupon', 'Other', 'Help', 'Help', 'Other', 'Sale/Coupon',
           'Other', 'Help', 'Help', 'Other', 'Sale/Coupon', 'Other'],
     'multilabels': ["['Samples']", "['Deck']", "['Deck', 'Deck Over', 'Stain']",
                     "['Coupons']", "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']", "['Samples']", "['Deck']",
                     "['Deck', 'Deck Over', 'Stain']", "['Coupons']",
                     "['Bathroom']"],
     'response': ['this is some text', 'this is some more text',
                  'and here is some more', 'and some more',
                  'and here we go some more yay done', 'this is some text',
                  'this is some more text', 'and here is some more',
                  'and some more', 'and here we go some more yay done',
                  'this is some text', 'this is some more text',
                  'and here is some more', 'and some more',
                  'and here we go some more yay done', 'this is some text',
                  'this is some more text', 'and here is some more',
                  'and some more', 'and here we go some more yay done',
                  'this is some text', 'this is some more text',
                  'and here is some more', 'and some more',
                  'and here we go some more yay done']}

class ItemSelector(BaseEstimator, TransformerMixin):
  def __init__(self, column):
    self.column = column

  def fit(self, X, y=None, **fit_params):
    return self

  def transform(self, X, y=None, **fit_params):
    return X[self.column]

class MultiLabelTransformer(BaseEstimator, TransformerMixin):

  def __init__(self, column):
    self.column = column

  def fit(self, X, y=None):
    return self

  def transform(self, X):
    mlb = MultiLabelBinarizer()
    return mlb.fit_transform(X[self.column])

pipeline = OneVsRestClassifier(
  Pipeline([
  ('union', FeatureUnion(
    transformer_list=[
      ('step1', Pipeline([
        ('selector', ItemSelector(column='response')),
        ('tfidf', TfidfVectorizer())
        ])),
      ('step2', Pipeline([
        ('selector', MultiLabelTransformer(column='multilabels'))
        ]))
      ])),
  ('sgd', SGDClassifier())
  ])
  )

# Empty parameter grid: a single fit per fold with the default settings.
grid = GridSearchCV(pipeline, {}, verbose=5)

df = pd.DataFrame(d, columns=['response', 'multilabels', 'label'])
# Parse the stringified lists (e.g. "['Deck']") into real Python lists.
df['multilabels'] = df['multilabels'].apply(lambda s: eval(s))
X = df[['response', 'multilabels']]
y = df['label']
grid.fit(X, y)
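
One caveat about the workaround above: MultiLabelTransformer creates and fits a fresh MultiLabelBinarizer on every transform call, so the output columns depend on whichever rows are passed in, and a test fold with a different set of tags may produce a different number of columns than the training fold. A possible variant, sketched here under the hypothetical name FittedMultiLabelTransformer (it is not part of the original answer), learns the tag vocabulary in fit and reuses it in transform. The small `train` and `test` frames are made-up data:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import MultiLabelBinarizer


class FittedMultiLabelTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical variant: binarizes one column of tag lists."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        # Learn the tag vocabulary from the training rows only.
        self.mlb_ = MultiLabelBinarizer()
        self.mlb_.fit(X[self.column])
        return self

    def transform(self, X):
        # Reuse the learned vocabulary so the column layout stays fixed.
        return self.mlb_.transform(X[self.column])


# Made-up data: the test frame contains fewer distinct tags than the training frame.
train = pd.DataFrame({'multilabels': [['Samples'], ['Deck', 'Stain'], ['Coupons']]})
test = pd.DataFrame({'multilabels': [['Deck'], ['Samples']]})

transformer = FittedMultiLabelTransformer(column='multilabels').fit(train)
train_out = transformer.transform(train)
test_out = transformer.transform(test)

print(train_out.shape, test_out.shape)  # (3, 4) (2, 4)
```

Both outputs have four columns because the vocabulary comes from the training frame alone. Separately, since the strings in the multilabels column are plain list literals, ast.literal_eval from the standard library would be a safer drop-in for eval when parsing them.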
