Error when imputing using .fit() from SimpleImputer

【Posted】2020-07-15 09:03:36 【Question】:

I am working through the book Python for Data Science For Dummies. I am currently learning how to impute missing data values using pandas. Here is my code:

import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

imp = SimpleImputer(missing_values='NaN',
              strategy='mean')
# creates imputer to replace missing values.
# missing_values parameter defines what we are looking out for to impute.
# strategy parameter implies with what value you want to replace the missing value.
# strategy can be either: mean, median, most_frequent

imp.fit([[1, 2, 3, 4, 5, 6, 7]])
'''
Before imputing, we must provide stats for the imputer to use by calling fit(). 
'''

s = [[1, 2, 3, np.NaN, 5, 6, None]]


print(imp.transform(s))
x = pd.Series(imp.transform(s).tolist()[0])  # .transform() fills in the missing values in s
# we want to display the result as a series. 
# from the imputer we want to transform our imputer output to a list using .tolist()
# after that we want to transform the list into a series by enclosing it within .Series()
print(x)

However, when I run the code, it raises an error at the imp.fit() line:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-3b624663cf89> in <module>
     15 # strategy can be either: mean, median, most_frequent
     16 
---> 17 imp.fit([[1, 2, 3, 4, 5, 6, 7]])
     18 '''
     19 Before imputing, we must provide stats for the imputer to use by calling fit().

/Applications/anaconda3/lib/python3.7/site-packages/sklearn/impute/_base.py in fit(self, X, y)
    266         self : SimpleImputer
    267         """
--> 268         X = self._validate_input(X)
    269         super()._fit_indicator(X)
    270 

/Applications/anaconda3/lib/python3.7/site-packages/sklearn/impute/_base.py in _validate_input(self, X)
    242                 raise ve
    243 
--> 244         _check_inputs_dtype(X, self.missing_values)
    245         if X.dtype.kind not in ("i", "u", "f", "O"):
    246             raise ValueError("SimpleImputer does not support data with dtype "

/Applications/anaconda3/lib/python3.7/site-packages/sklearn/impute/_base.py in _check_inputs_dtype(X, missing_values)
     26                          " both numerical. Got X.dtype={} and "
     27                          " type(missing_values)={}."
---> 28                          .format(X.dtype, type(missing_values)))
     29 
     30 

ValueError: 'X' and 'missing_values' types are expected to be both numerical. Got X.dtype=float64 and  type(missing_values)=<class 'str'>.

Any help with this would be much appreciated!

Also, I hope you are coping well with the COVID-19 situation, wherever you are!

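【Answer 1】:

Your missing_values argument is the string 'NaN'. As the error message says, X and missing_values are expected to both be numerical, and here X is float64 while missing_values is a str. Pass the float NaN instead: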

missing_values = np.nan
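
For reference, here is a minimal sketch of the question's code with that one change applied. It assumes both gaps in s are written as np.nan (the question used None for the second one), and the variable name filled is introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Use the float np.nan, not the string 'NaN', as the missing-value marker.
imp = SimpleImputer(missing_values=np.nan, strategy='mean')

# fit() learns one mean per column; with a single training row,
# each column's mean is simply that row's value.
imp.fit([[1, 2, 3, 4, 5, 6, 7]])

# Assumption: both gaps are written as np.nan in this sketch.
s = [[1, 2, 3, np.nan, 5, 6, np.nan]]

filled = imp.transform(s)   # 2-D array of shape (1, 7)
x = pd.Series(filled[0])    # the only row, as a Series
print(x.tolist())           # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

Because the imputer was fitted on a single row, the "mean" it substitutes for columns 4 and 7 is just the fitted value in that column (4 and 7). With more training rows, each gap would be filled with the mean of its column across those rows.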
