One Hot Encoding: ValueError: could not convert string to float: 'Yes'
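I am trying to apply OneHotEncoder to categorical values.
However, it fails with the error below. What could be wrong? Please help; any comments are welcome.
Here is the code snippet: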
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
#X = X.reshape(len(X[:, 0]), 7)
print(X.shape)
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
print(X.shape)
print(X)
The output of the code is shown below. The problem appears to be the array format.
(17, 7)
[[2 0 0 'Offline' 'Low' 'Cold' 'No']
[0 0 0 'Offline' 'High' 'Cold' 'No']
[3 0 1 'Online' 'High' 'Cold' 'Yes']
[2 0 1 'Offline' 'Low' 'Hot' 'Yes']
[2 0 1 'Offline' 'High' 'Hot' 'Yes']
[2 0 0 'Online' 'High' 'Cold' 'Yes']
[2 1 1 'Offline' 'Low' 'Hot' 'No']
[2 1 0 'Offline' 'Low' 'Cold' 'No']
[0 1 0 'Online' 'Low' 'Cold' 'Yes']
[3 1 1 'Online' 'Low' 'Hot' 'Yes']
[1 1 0 'Offline' 'Low' 'Hot' 'No']
[2 1 1 'Offline' 'Low' 'Hot' 'Yes']
[3 1 1 'Online' 'High' 'Hot' 'Yes']
[2 1 0 'Online' 'High' 'Hot' 'No']
[2 2 2 'Offline' 'Low' 'Hot' 'Yes']
[2 2 1 'Offline' 'Low' 'Cold' 'No']
[1 2 0 'Offline' 'High' 'Cold' 'Yes']]
(17, 7)
['Low' 'Low' 'High' 'High' 'High' 'Low' 'Low' 'Low' 'Low' 'High' 'Low'
'High' 'High' 'High' 'High' 'Low' 'Low']
(17, 7)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-42-84bec98371d4> in <module>()
28 print(X.shape)
29 onehotencoder = OneHotEncoder(categorical_features = [0])
---> 30 X = onehotencoder.fit_transform(X).toarray()
31 print(X.shape)
32 print(X)
C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in fit_transform(self, X, y)
2017 """
2018 return _transform_selected(X, self._fit_transform,
-> 2019 self.categorical_features, copy=True)
2020
2021 def _transform(self, X):
C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\preprocessing\data.py in _transform_selected(X, transform, selected, copy)
1807 X : array or sparse matrix, shape=(n_samples, n_features_new)
1808 """
-> 1809 X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
1810
1811 if isinstance(selected, six.string_types) and selected == "all":
C:\Users\patilsi\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
ValueError: could not convert string to float: 'Yes'
Answer
You should apply OneHotEncoder only to the columns you want to encode:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
onehotencoder = OneHotEncoder()
# OneHotEncoder expects a 2D array, so select each column with a list index
X_0 = onehotencoder.fit_transform(X[:, [0]]).toarray()
X_1 = onehotencoder.fit_transform(X[:, [1]]).toarray()
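This returns two matrices with the same number of rows as X; the number of columns in each equals the number of distinct values in X[:, 0] or X[:, 1], respectively.
After that you are free to merge the matrices or do anything else with them. If you want to know which output columns correspond to which category, you can inspect onehotencoder.feature_indices_ (in scikit-learn 0.20 and later, the fitted categories are exposed as onehotencoder.categories_).
Note, however, that because the same encoder object is reused, fitting it on column 1 discards what it learned for column 0.
I hope this helps.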
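One way to avoid losing the fitted information for the first column is to keep a separate encoder per column. The following is a minimal sketch, assuming scikit-learn 0.20 or later (for categories_); col0, col1, enc0 and enc1 are made-up names and values standing in for the question's X[:, 0] and X[:, 1], not the asker's actual data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up integer-coded columns standing in for X[:, 0] and X[:, 1].
col0 = np.array([[2], [0], [3], [2], [1]])
col1 = np.array([[0], [0], [1], [1], [2]])

# One encoder per column, so fitting the second does not overwrite the first.
enc0 = OneHotEncoder()
enc1 = OneHotEncoder()
X_0 = enc0.fit_transform(col0).toarray()
X_1 = enc1.fit_transform(col1).toarray()

# Merge the two one-hot blocks side by side.
X_encoded = np.hstack((X_0, X_1))
print(X_encoded.shape)

# Each encoder remembers which category every output column stands for.
print(enc0.categories_, enc1.categories_)
```

Because both encoders stay fitted, either one can later be applied to new rows with transform().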
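Another answer
Even if you specify categorical_features = [0], OneHotEncoder still validates all of the data (every column) for compatibility with scikit-learn, so it raises an error when other columns contain string data.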
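The failure can be reproduced with NumPy alone, since the last frame of the traceback is a plain np.array(..., dtype=...) cast. This is only an illustrative sketch: the two rows are a made-up subset in the style of the question's data, and casting to np.float64 is an assumption about what the validation step effectively does.

```python
import numpy as np

# A small mixed-type array shaped like the rows in the question (made-up subset).
X = np.array([[2, 0, 'Offline', 'Yes'],
              [0, 1, 'Online', 'No']], dtype=object)

# Roughly what the validation step does: cast the whole array to float.
message = None
try:
    np.array(X, dtype=np.float64)
except ValueError as exc:
    message = str(exc)
    print(message)

# Selecting only the numeric column first makes the same cast succeed.
first_col = np.array(X[:, [0]], dtype=np.float64)
print(first_col.shape)
```

The cast fails on the first string it meets, regardless of which column was requested for encoding.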
What you can do instead is pass only the data you want to dummy-encode:
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
print(X.shape)
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
print(X)
print(X.shape)
print(y)
#X = X.reshape(len(X[:, 0]), 7)
print(X.shape)
onehotencoder = OneHotEncoder()
categorical_features = [0]
# Send only the first column to onehotencoder
X_oneHotEncoded = onehotencoder.fit_transform(X[:, categorical_features]).toarray()
# Combine the two arrays back together
X_final = np.hstack((X_oneHotEncoded, X[:,1:]))
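In scikit-learn 0.20 and later, the same "encode one column, keep the rest" idea can also be expressed with ColumnTransformer. This is a sketch of an alternative, not the approach used in the answer above; the three rows are made up in the layout of the question's array, and sparse_threshold=0 is chosen here only to force a dense result.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up rows in the same layout as the question's array.
X = np.array([[2, 0, 0, 'Offline', 'Low', 'Cold', 'No'],
              [0, 0, 0, 'Offline', 'High', 'Cold', 'No'],
              [3, 0, 1, 'Online', 'High', 'Cold', 'Yes']], dtype=object)

# One-hot encode column 0 only; pass the remaining columns through untouched.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_final = ct.fit_transform(X)
print(X_final.shape)
```

The one-hot columns come first in the result, followed by the untouched columns in their original order.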