ValueError: X 必须是 NumPy 数组
Posted
技术标签:
【中文标题】ValueError: X 必须是 NumPy 数组【英文标题】:ValueError: X must be a NumPy array 【发布时间】:2019-11-29 17:55:54 【问题描述】:我是 python 和机器学习的新手。尝试实现 (decision_regions) 绘图时出现错误。 我不确定我是否理解这个问题,所以我真的需要帮助解决这个问题。
我认为问题是因为目标是字符串,也许我不确定。但是我不知道如何解决这个问题,我需要帮助来解决这个问题
# import arff data using panda
data = arff.loadarff('Run1/Tr.arff')
df = pd.DataFrame(data[0])
data =pd.DataFrame(df)
data = data.loc[:,'ATT1':'ATT576']
target = df['Class']
target=target.astype(str)
#split the data into training and testing
data_train, data_test, target_train, target_test = train_test_split(data, target,test_size=0.30, random_state=0)
model1 = DecisionTreeClassifier(criterion='entropy', max_depth=1)
num_est = [1, 2, 3, 10]
label = ['AdaBoost (n_est=1)', 'AdaBoost (n_est=2)', 'AdaBoost (n_est=3)', 'AdaBoost (n_est=20)']
fig = plt.figure(figsize=(10,8))
gs = gridspec.GridSpec(2,2)
grid = itertools.product([0,1],repeat=2)
for n_est, label, grd in zip(num_est, label, grid):
boosting = AdaBoostClassifier(base_estimator=model1,n_estimators=n_est) boosting.fit(data_train,target_train)
ax = plt.subplot(gs[grd[0], grd[1]])
fig = plot_decision_regions(data_train , target_train, clf=boosting, legend=2)
plt.title(label)
plt.show();
------------------------------------------------------------------ ValueError Traceback (most recent call
> last) <ipython-input-18-646828965d5c> in <module>
> 7 boosting.fit(data_train,target_train)
> 8 ax = plt.subplot(gs[grd[0], grd[1]])
> ----> 9 fig = plot_decision_regions(data_train , target_train, clf=boosting, legend=2) # clf cannot be change because it's a
> parameter
> 10 plt.title(label)
> 11
>
> /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mlxtend/plotting/decision_regions.py
> in plot_decision_regions(X, y, clf, feature_index,
> filler_feature_values, filler_feature_ranges, ax, X_highlight, res,
> legend, hide_spines, markers, colors, scatter_kwargs, contourf_kwargs,
> scatter_highlight_kwargs)
> 127 """
> 128
> --> 129 check_Xy(X, y, y_int=True) # Validate X and y arrays
> 130 dim = X.shape[1]
> 131
>
> /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mlxtend/utils/checking.py
> in check_Xy(X, y, y_int)
> 14 # check types
> 15 if not isinstance(X, np.ndarray):
> ---> 16 raise ValueError('X must be a NumPy array. Found %s' % type(X))
> 17 if not isinstance(y, np.ndarray):
> 18 raise ValueError('y must be a NumPy array. Found %s' % type(y))
>
> ValueError: X must be a NumPy array. Found <class
> 'pandas.core.frame.DataFrame'>`enter code here`
【问题讨论】:
【参考方案1】:我使用了另一个类似的数据集。在您的代码中,您尝试使用“plot_decision_regions”无法实现的更多 tan 2 功能进行绘图,您必须使用给定链接Plotting decision boundary for High Dimension Data 中讨论的不同方法。但是如果你只想使用两个功能,那么你可以使用下面的代码。
from scipy.io import arff
import pandas as pd
import itertools
from matplotlib import gridspec
from mlxtend.plotting import plot_decision_regions
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from matplotlib import pyplot as plt
data = arff.loadarff('TR.arff')
data = pd.DataFrame(data[0])
df = data.loc[:,['att1','att2','class']]
for col_name in df.columns:
if(df[col_name].dtype == 'object'):
df[col_name]= df[col_name].astype('category')
df[col_name] = df[col_name].cat.codes
target = df['class']
df=df.drop(['class'],axis=1)
data_train, data_test, target_train, target_test = train_test_split(df, target,test_size=0.30, random_state=0)
model1 = DecisionTreeClassifier(criterion='entropy', max_depth=1)
num_est = [1, 2, 3, 10]
label = ['AdaBoost (n_est=1)', 'AdaBoost (n_est=2)', 'AdaBoost (n_est=3)', 'AdaBoost (n_est=20)']
fig = plt.figure(figsize=(10,8))
gs = gridspec.GridSpec(2,2)
grid = itertools.product([0,1],repeat=2)
for n_est, label, grd in zip(num_est, label, grid):
boosting = AdaBoostClassifier(base_estimator=model1,n_estimators=n_est)
boosting.fit(data_train,target_train)
ax = plt.subplot(gs[grd[0], grd[1]])
fig = plot_decision_regions(data_train.values , target_train.values, clf=boosting, legend=2)
plt.title(label)
plt.show();
【讨论】:
【参考方案2】:将您的数据转换为数组,然后将其传递给函数。
numpy_matrix = data.as_matrix()
【讨论】:
我试过了,但是 y 出现了一个新问题 ...>> y 必须是 NumPy 数组。找到以上是关于ValueError: X 必须是 NumPy 数组的主要内容,如果未能解决你的问题,请参考以下文章
Numpy hstack - “ValueError:所有输入数组必须具有相同的维数” - 但它们确实如此
连接两个 NumPy 数组给出“ValueError:所有输入数组必须具有相同的维数”
Numpy加载CSV - ValueError:无法将字符串转换为float
如何修复'ValueError:shapes(1,3)和(1,1)未对齐:3(dim 1)!= 1(dim 0)'numpy中的错误