Value error with LabelEncoder and OneHotEncoder
Posted: 2017-12-10 20:14:26
I'm trying to convert a categorical string column into several binary dummy-variable columns, but I get a ValueError.
Here is the code:
import sys, os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from dateutil import parser
import math
import traceback
import logging
datasetMod = pd.read_csv('data.csv')
X = datasetMod.iloc[:, 3:6].values
y = datasetMod.iloc[:, 1].values
print(X[:, 0])
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
try:
    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
    onehotencoder = OneHotEncoder(categorical_features = [0])
    X = onehotencoder.fit_transform(X).toarray()
except Exception as e:
    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
    print(exc_type, fname, exc_tb.tb_lineno)
This is the error:
<class 'ValueError'> multipleLinearRegression.py 23
The print statement for that column outputs:
['Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Weekend' 'Workday' 'Workday' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday' 'Workday'
'Workday' 'Workday' 'Workday' 'Workday' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend' 'Weekend'
'Weekend' 'Weekend' 'Weekend' 'Weekend']
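The strings themselves look fine: there are no spaces in the middle and nothing that looks like a number. So I don't understand why I get a ValueError saying it can't convert a string to float.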
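One plausible cause, hedged because data.csv is not shown: in the scikit-learn versions that still accepted categorical_features, OneHotEncoder converted the whole of X to floats before encoding the selected columns, so any other column of X that still holds strings fails even though column 0 has already been label-encoded. The sketch below reproduces that failure with NumPy alone; the "Sunny"/"Rainy" column and the helper to_float are hypothetical stand-ins, not part of the poster's code.

```python
import numpy as np

# Hypothetical stand-in for X = datasetMod.iloc[:, 3:6].values:
# column 0 is the day type from the question, column 1 is an ASSUMED
# second string column, column 2 is numeric. Mixed columns give an
# object array, just as .values does on a mixed DataFrame.
X = np.array([
    ["Workday", "Sunny", 12.5],
    ["Weekend", "Rainy", 8.0],
    ["Workday", "Rainy", 10.0],
], dtype=object)

# LabelEncoder-style step on column 0 only: strings -> integer codes
_, codes = np.unique(X[:, 0], return_inverse=True)
X[:, 0] = codes

def to_float(a):
    """Cast to float64; return the error text instead if the cast fails."""
    try:
        return a.astype(np.float64)
    except ValueError as exc:
        return str(exc)

print(to_float(X))             # error text: column 1 still holds strings
print(to_float(X[:, [0, 2]]))  # works: only numeric columns remain
```

If this is the cause, label-encoding every string column of X before the one-hot step (as the final code in this thread does) avoids the error.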
Any help would be greatly appreciated.
Update
The OneHotEncoder now works somewhat, but the end result has dtype object when it should be float64:
labelencoder_X = LabelEncoder()
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
onehotencoder.fit(X[:, 1])
onehotencoder.fit(X[:, 2])
onehotencoder.fit(X[:, 3])
onehotencoder.transform(X[:, 1])
onehotencoder.transform(X[:, 2])
onehotencoder.transform(X[:, 3])
X = onehotencoder.toArray()
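In the update above, each fit() call replaces what the previous fit() learned, the matrices returned by transform() are discarded, and toarray() is a method of the sparse matrix that transform() returns, not of the encoder. A minimal sketch of per-column encoding that keeps each result, assuming scikit-learn 0.20 or newer (where OneHotEncoder accepts string columns directly, so no LabelEncoder step is needed) and hypothetical sample data:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample: three categorical string columns, dtype=object as from .values
X = np.array([
    ["Workday", "Sunny", "Morning"],
    ["Weekend", "Rainy", "Evening"],
    ["Workday", "Rainy", "Morning"],
], dtype=object)

blocks = []
for col in range(X.shape[1]):
    enc = OneHotEncoder()  # one encoder per column, so no fit overwrites another
    # X[:, [col]] keeps the input 2-D, shape (n_samples, 1); the result is sparse
    sparse_block = enc.fit_transform(X[:, [col]])
    blocks.append(sparse_block.toarray())  # toarray() is called on the result

X_encoded = np.hstack(blocks)  # float64, one dummy column per category
print(X_encoded.dtype, X_encoded.shape)
```

A single OneHotEncoder fitted on all three columns at once produces the same columns in the same order; the loop only mirrors the per-column structure of the update.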
Update 2
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
X[:, 1] = onehotencoder.fit_transform(X[:, 1]).toarray()
X[:, 2] = onehotencoder.fit_transform(X[:, 2]).toarray()
X[:, 3] = onehotencoder.fit_transform(X[:, 3]).toarray()
print(X.dtype) #object
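The print still shows object because X came from .values on a DataFrame with mixed column types, so X is an object array, and assigning new values into a slice of an existing NumPy array never changes that array's dtype. A minimal NumPy sketch with hypothetical data, assuming every column is numeric once encoded so that an explicit astype() cast succeeds:

```python
import numpy as np

# Hypothetical object array, like .values on a DataFrame with mixed columns
X = np.array([["Workday", 3, 1.5],
              ["Weekend", 7, 2.0]], dtype=object)

# Replace the string column with integer codes (what LabelEncoder does)
_, codes = np.unique(X[:, 0], return_inverse=True)
X[:, 0] = codes
print(X.dtype)  # still object: slice assignment keeps the array's dtype

# Once every cell is numeric, cast explicitly to get a float64 array
X_float = X.astype(np.float64)
print(X_float.dtype)
```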
Final code
Since categorical_features already specifies the column indices, I can call fit_transform() on the whole matrix X. Thanks to @mkos for the patience!
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
X = onehotencoder.fit_transform(X)
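The categorical_features parameter used here was deprecated in scikit-learn 0.20 and removed in 0.22, so this final code only runs on older versions. On 0.20 or newer, a ColumnTransformer can select the columns to encode instead. A minimal sketch, assuming hypothetical data in which column 0 is numeric and columns 1 to 3 hold category strings; the names ct and X_encoded are illustrative:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is numeric, columns 1-3 are categorical strings
X = np.array([
    [12.5, "Workday", "Sunny", "Morning"],
    [8.0,  "Weekend", "Rainy", "Evening"],
    [10.0, "Workday", "Rainy", "Morning"],
], dtype=object)

ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), [1, 2, 3])],  # plays the role of categorical_features=[1, 2, 3]
    remainder="passthrough",  # keep column 0; it is appended after the one-hot columns
    sparse_threshold=0,       # always return a dense array
)
# The passed-through column keeps its object dtype, so the stacked result is
# object too; cast explicitly (assumes the remaining columns are numeric).
X_encoded = ct.fit_transform(X).astype(np.float64)
print(X_encoded.shape, X_encoded.dtype)
```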
Comments on the question:
Try le.fit(X[:, 0].unique()) and then le.transform(X[:, 0]).
I split them up as you suggested; it's onehotencoder.fit(X[:, 0].unique()) that causes the error.
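Actually, if I remove unique() it works, just with a deprecation warning: "Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample."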
By le I meant LabelEncoder; that one needs unique(), OneHotEncoder doesn't.
That gives a ValueError again.
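NumPy arrays have no .unique() method (that is a pandas Series method), so on the array X the suggestion above would be written with np.unique(). Fitting LabelEncoder on the unique values or on the full column learns the same classes_, because LabelEncoder stores the sorted distinct labels either way. OneHotEncoder, by contrast, expects 2-D input of shape (n_samples, n_features), which is what the reshape(-1, 1) warning asks for; recent scikit-learn versions reject 1-D input with a ValueError rather than a warning. A sketch assuming scikit-learn 0.20 or newer and a hypothetical column:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

col = np.array(["Workday", "Workday", "Weekend", "Workday"], dtype=object)

# Fitting on the unique values or on the whole column learns the same classes_
le_unique = LabelEncoder().fit(np.unique(col))
le_full = LabelEncoder().fit(col)
codes = le_unique.transform(col)
print(le_unique.classes_, le_full.classes_, codes)

# OneHotEncoder wants 2-D input: reshape the single feature to (n_samples, 1)
onehot = OneHotEncoder().fit_transform(col.reshape(-1, 1)).toarray()
print(onehot)
```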
Answer 1:
This should do the trick:
onehotencoder = OneHotEncoder(categorical_features = [1,2,3])
X = onehotencoder.fit_transform(X)
You can print it with:
print(X.toarray())
Keeping X as a sparse matrix isn't bad, since it saves memory. If you want to look at it, convert it to a regular np.array with toarray().
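To illustrate the point about sparse storage: the encoder's output is a SciPy sparse matrix, whose toarray() method (lowercase; there is no toArray()) returns a dense NumPy array, and its dtype is float64 by default for OneHotEncoder. A minimal sketch that builds such a matrix with SciPy directly from hypothetical one-hot rows, not the poster's data:

```python
import numpy as np
from scipy import sparse

# Hypothetical one-hot block: 4 rows, 2 categories, a single 1.0 per row
dense = np.array([[0, 1], [0, 1], [1, 0], [0, 1]], dtype=np.float64)
X = sparse.coo_matrix(dense)

print(sparse.issparse(X), X.dtype, X.nnz)  # only the 4 non-zeros are stored
X_dense = X.toarray()                      # regular np.ndarray for inspection
print(type(X_dense).__name__, X_dense.dtype)
```

The saving grows with the number of categories: each row stores one non-zero per encoded column no matter how many dummy columns that column expands into.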
Comments:
Cheers. How do I make it float64? It still defaults to object (I mean X).
Yes, it's still type object. Hmm, if I don't add .toarray() at the end of onehotencoder.fit_transform(), my values become: "...' with 146 stored elements in COOrdinate format>"
If I may ask, how exactly are you checking the type?
The variable explorer in Spyder.