ValueError: X must be a NumPy array


【Title】: ValueError: X must be a NumPy array 【Posted】: 2019-11-29 17:55:54 【Question】:

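I am new to Python and machine learning. I get an error when I try to draw a decision-region plot with plot_decision_regions. I am not sure I understand the problem, so I would really appreciate help with it.

I think the cause may be that the target is a string, but I am not sure, and I do not know how to fix it.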

# import ARFF data using pandas
from scipy.io import arff
import pandas as pd
import itertools
from matplotlib import gridspec
from matplotlib import pyplot as plt
from mlxtend.plotting import plot_decision_regions
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

data = arff.loadarff('Run1/Tr.arff')
df = pd.DataFrame(data[0])
data = pd.DataFrame(df)
data = data.loc[:, 'ATT1':'ATT576']
target = df['Class']
target = target.astype(str)

# split the data into training and testing
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30, random_state=0)

model1 = DecisionTreeClassifier(criterion='entropy', max_depth=1)

num_est = [1, 2, 3, 10]
label = ['AdaBoost (n_est=1)', 'AdaBoost (n_est=2)', 'AdaBoost (n_est=3)', 'AdaBoost (n_est=10)']

fig = plt.figure(figsize=(10, 8))
gs = gridspec.GridSpec(2, 2)
grid = itertools.product([0, 1], repeat=2)

for n_est, label, grd in zip(num_est, label, grid):
    boosting = AdaBoostClassifier(base_estimator=model1, n_estimators=n_est)
    boosting.fit(data_train, target_train)
    ax = plt.subplot(gs[grd[0], grd[1]])
    # this is the call that raises the ValueError: a DataFrame is passed as X
    fig = plot_decision_regions(data_train, target_train, clf=boosting, legend=2)
    plt.title(label)

plt.show()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-646828965d5c> in <module>
      7     boosting.fit(data_train,target_train)
      8     ax = plt.subplot(gs[grd[0], grd[1]])
----> 9     fig = plot_decision_regions(data_train , target_train, clf=boosting, legend=2)  # clf cannot be change because it's a parameter
     10     plt.title(label)
     11

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mlxtend/plotting/decision_regions.py in plot_decision_regions(X, y, clf, feature_index, filler_feature_values, filler_feature_ranges, ax, X_highlight, res, legend, hide_spines, markers, colors, scatter_kwargs, contourf_kwargs, scatter_highlight_kwargs)
    127     """
    128
--> 129     check_Xy(X, y, y_int=True)  # Validate X and y arrays
    130     dim = X.shape[1]
    131

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mlxtend/utils/checking.py in check_Xy(X, y, y_int)
     14     # check types
     15     if not isinstance(X, np.ndarray):
---> 16         raise ValueError('X must be a NumPy array. Found %s' % type(X))
     17     if not isinstance(y, np.ndarray):
     18         raise ValueError('y must be a NumPy array. Found %s' % type(y))

ValueError: X must be a NumPy array. Found <class 'pandas.core.frame.DataFrame'>

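【Question comments】:

【Answer 1】:

I used another, similar dataset. In your code you are trying to plot more than two features, which plot_decision_regions cannot draw directly (it plots at most two feature axes; any remaining features have to be fixed through filler_feature_values). For that case you need a different approach, as discussed in the linked question "Plotting decision boundary for High Dimension Data". If you only want to use two features, you can use the code below.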

from scipy.io import arff
import pandas as pd
import itertools
from matplotlib import gridspec
from mlxtend.plotting import plot_decision_regions
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from matplotlib import pyplot as plt

data = arff.loadarff('TR.arff')
data = pd.DataFrame(data[0])
# keep two features plus the class column; copy so the assignments below
# do not trigger SettingWithCopyWarning
df = data.loc[:, ['att1', 'att2', 'class']].copy()

# encode object (string/bytes) columns as integer category codes
for col_name in df.columns:
    if df[col_name].dtype == 'object':
        df[col_name] = df[col_name].astype('category').cat.codes

target = df['class']
df = df.drop(['class'], axis=1)
data_train, data_test, target_train, target_test = train_test_split(df, target, test_size=0.30, random_state=0)

model1 = DecisionTreeClassifier(criterion='entropy', max_depth=1)
num_est = [1, 2, 3, 10]
labels = ['AdaBoost (n_est=1)', 'AdaBoost (n_est=2)', 'AdaBoost (n_est=3)', 'AdaBoost (n_est=10)']

fig = plt.figure(figsize=(10, 8))
gs = gridspec.GridSpec(2, 2)
grid = itertools.product([0, 1], repeat=2)

for n_est, label, grd in zip(num_est, labels, grid):
    # note: scikit-learn 1.2+ renames base_estimator to estimator
    boosting = AdaBoostClassifier(base_estimator=model1, n_estimators=n_est)
    # fit and plot on NumPy arrays (.values), not on the DataFrame/Series
    boosting.fit(data_train.values, target_train.values)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(data_train.values, target_train.values, clf=boosting, legend=2)
    plt.title(label)

plt.show()
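
For the case with more than two features, one common workaround is to project the data down to two dimensions before plotting. The sketch below is only an illustration of that idea using a plain NumPy PCA; it is not taken from the linked question, the helper name project_to_2d is hypothetical, and the random matrix merely stands in for the 576 attribute columns. The classifier would then have to be trained on the projected data as well.

```python
import numpy as np


def project_to_2d(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T


# Random placeholder data with 576 columns (an assumption, not the real dataset).
rng = np.random.default_rng(0)
X_high = rng.normal(size=(50, 576))

X_2d = project_to_2d(X_high)
print(X_2d.shape)  # (50, 2)
```

The resulting X_2d is already a NumPy array with exactly two columns, so it has the form that plot_decision_regions expects for X.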

【Comments】:

【Answer 2】:

Convert your data to a NumPy array, then pass it to the function.

numpy_matrix = data.to_numpy()  # as_matrix() was removed in pandas 1.0; data.values also works
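
A minimal sketch of that conversion, assuming a pandas version that provides DataFrame.to_numpy() (0.24 or later; as_matrix() was removed in pandas 1.0). The small DataFrame is made up and only stands in for the feature table from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the feature table from the question.
data = pd.DataFrame({"ATT1": [0.1, 0.4, 0.9], "ATT2": [1.0, 0.2, 0.5]})

# Same kind of isinstance check as in the traceback: a DataFrame fails it.
print(isinstance(data, np.ndarray))  # False

# to_numpy() returns a plain ndarray; data.values gives the same result.
X = data.to_numpy()
print(isinstance(X, np.ndarray), X.shape)  # True (3, 2)
```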

【Comments】:

I tried that, but now a new problem appears for y: "y must be a NumPy array. Found"

Convert all of your test and training data to arrays.
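
The follow-up error for y can be handled the same way, with one extra step: the traceback shows the check being called with y_int=True, so y most likely also has to hold integers rather than strings. The sketch below uses made-up labels and pd.factorize as one possible way to encode them; it is an illustration, not the only option.

```python
import numpy as np
import pandas as pd

# Made-up string labels, similar to target.astype(str) in the question.
target = pd.Series(["A", "B", "A", "C"], name="Class")

# pd.factorize maps each distinct label to an integer code (0, 1, 2, ...)
# in order of first appearance and also returns the distinct labels.
codes, classes = pd.factorize(target)
y = np.asarray(codes, dtype=int)

print(y)              # [0 1 0 2]
print(list(classes))  # ['A', 'B', 'C']
```

The classifier should then be fitted on the same integer y that is passed to the plotting function, so that its predictions and the plotted labels agree.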
