Getting "ValueError: Expected 2D array, got 1D array instead" error in python-sklearn [duplicate]

Posted 2023-03-12

Originally published: 2021-09-19 01:27:29

Question:

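Please help me. I cannot resolve an error I am getting. I am new to machine learning in Python. Any advice would be appreciated.

Below is the code I wrote to predict which mode of transport a company's employees are likely to prefer, based on their gender, qualifications, and license: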

Gender = preprocessing.LabelEncoder().fit_transform(df.loc[:,'Gender'])
Engineer = preprocessing.LabelEncoder().fit_transform(df.loc[:,'Engineer'])
MBA = preprocessing.LabelEncoder().fit_transform(df.loc[:,'MBA'])
License = preprocessing.LabelEncoder().fit_transform(df.loc[:,'license'])
Transport = preprocessing.LabelEncoder().fit_transform(df.loc[:,'Transport'])
x,y = Gender.reshape(-1,1), Transport
print("\n\nGender:", Gender, "\n\nEngineer:", Engineer, "\n\nMBA:", MBA, "\n\nLicense:", license, "\n\nTransport:", Transport)
model = GaussianNB().fit(x,y)
a1 = input("\n\n Choose Gender : Male:1 or Female:0 = ")
b1 = input("\n\n Are you an Engineer? : Yes:1 or No:0 = ")
c1 = input("\n\n Have you done MBA? : Yes:1 or No:0 = ")
d1 = input("\n\n Do you have license? : Yes:1 or No:0 = ")

#store the output in y_pred

y_pred = model = model.predict([int(a1),int(b1),int(c1),int(d1)])

#for loop to predict customizable output
if y_pred == [1]:
    print("\n\n You prefer Public Transport")
else:
    print("\n\n You prefer Private Transport")

This is the error I get at the final stage:

ValueError                                Traceback (most recent call last)
<ipython-input-104-a14f86182731> in <module>
      6 #store the output in y_pred
      7 
----> 8 y_pred = model = model.predict([int(a1),int(b1),int(c1),int(d1)])
      9 
     10 #for loop to predict customizable output

~\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in predict(self, X)
     63             Predicted target values for X
     64         """
---> 65         jll = self._joint_log_likelihood(X)
     66         return self.classes_[np.argmax(jll, axis=1)]
     67 

~\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in _joint_log_likelihood(self, X)
    428         check_is_fitted(self, "classes_")
    429 
--> 430         X = check_array(X)
    431         joint_log_likelihood = []
    432         for i in range(np.size(self.classes_)):

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    519                     "Reshape your data either using array.reshape(-1, 1) if "
    520                     "your data has a single feature or array.reshape(1, -1) "
--> 521                     "if it contains a single sample.".format(array))
    522 
    523         # in the future np.flexible dtypes will be handled like object dtypes

ValueError: Expected 2D array, got 1D array instead:
array=[1 1 0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Below is the structure of my dataset:

<class 'pandas.core.frame.DataFrame'>
    Int64Index: 444 entries, 28 to 39
    Data columns (total 8 columns):
    Gender       444 non-null object
    Engineer     444 non-null int64
    MBA          444 non-null int64
    Work Exp     444 non-null int64
    Salary       444 non-null float64
    Distance     444 non-null float64
    license      444 non-null int64
    Transport    444 non-null object
    dtypes: float64(2), int64(4), object(2)
    memory usage: 31.2+ KB


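Answer 1:

The error message is quite verbose and tells you that you supplied a 1D array where a 2D array was expected:

Expected 2D array, got 1D array instead

The stack trace points to this line: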

y_pred = model = model.predict([int(a1),int(b1),int(c1),int(d1)])

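It also tells you how to fix the problem:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Since you are trying to predict a single sample, you should use the latter: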

import numpy as np


y_pred = model.predict(np.array([int(a1),int(b1),int(c1),int(d1)]).reshape(1, -1))

Note that I removed the double assignment y_pred = model = ..., which serves no purpose and would overwrite model with the predictions.
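To make the difference between the two reshapes concrete, here is a small sketch that uses only NumPy. The variable names (sample, as_row, as_column, as_nested) are illustrative and do not come from the question.

```python
import numpy as np

# One sample with four feature values, like the input in the question.
sample = np.array([1, 1, 0, 1])

# reshape(1, -1): one row (one sample) with four columns (four features).
as_row = sample.reshape(1, -1)

# reshape(-1, 1): four rows (four samples) with one column (one feature).
as_column = sample.reshape(-1, 1)

# A nested list produces the same 2D layout as reshape(1, -1).
as_nested = np.array([[1, 1, 0, 1]])

print(sample.shape)     # (4,)
print(as_row.shape)     # (1, 4)
print(as_column.shape)  # (4, 1)
```

Passing a nested list directly, as in model.predict([[int(a1), int(b1), int(c1), int(d1)]]), should work as well, since scikit-learn converts it to the same 2D array.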

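Additional note

Unrelated to this particular error, but probably not what you want: you fit the model on the Gender feature only. See these lines: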

x,y = Gender.reshape(-1,1), Transport
...
model = GaussianNB().fit(x,y)

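This will break your code, because you fit the model on a single feature and then try to predict a sample with four features. You should fix this as well. A solution could look like this: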

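from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Encode all four features together so X has shape (n_samples, 4)
X = OrdinalEncoder().fit_transform(df.loc[:,['Gender', 'Engineer', 'MBA', 'license']])
# Encode the target labels as integers
y = LabelEncoder().fit_transform(df.loc[:,'Transport'])

model = GaussianNB()
model.fit(X, y)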

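Note that I used OrdinalEncoder for the features, because LabelEncoder is intended only for encoding the target y (compare the documentation).

Comments:

Awesome! Thank you so much for the guidance. It did solve my problem. Thanks again for such a wonderful explanation.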
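Putting both fixes together, the following is a minimal end-to-end sketch. It assumes a small made-up DataFrame in place of the real dataset, which is not available here. The column names match the question, but the rows, the label strings "Private Transport" and "Public Transport", and the names feature_encoder, target_encoder, and new_sample are hypothetical.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy data standing in for the question's 444-row dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Engineer": [1, 0, 1, 1, 0, 0],
    "MBA": [0, 1, 0, 0, 1, 1],
    "license": [1, 0, 1, 0, 1, 0],
    "Transport": ["Private Transport", "Public Transport", "Private Transport",
                  "Public Transport", "Private Transport", "Public Transport"],
})
feature_columns = ["Gender", "Engineer", "MBA", "license"]

# Keep the fitted encoders so new input is encoded the same way as the training data.
feature_encoder = OrdinalEncoder()
X = feature_encoder.fit_transform(df[feature_columns])

target_encoder = LabelEncoder()
y = target_encoder.fit_transform(df["Transport"])

model = GaussianNB().fit(X, y)

# A single new sample as a one-row DataFrame, so it is already 2D: shape (1, 4).
new_sample = pd.DataFrame(
    {"Gender": ["Male"], "Engineer": [1], "MBA": [0], "license": [1]}
)
X_new = feature_encoder.transform(new_sample)

# Map the predicted integer back to the original label instead of comparing with [1].
y_pred = model.predict(X_new)
label = target_encoder.inverse_transform(y_pred)[0]
print(label)
```

Using inverse_transform avoids having to guess which integer the encoder assigned to which transport label.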
