Found input variables with inconsistent numbers of samples: [164, 41]

Posted: 2021-05-06 08:38:15

Question:

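I am trying to build a predictive model with a random forest that predicts CarName, using gas, rear, and two as features.

CarName is a categorical variable; the rest are numeric. I get the error below when I run the following code. Can anyone help me fix it? Thanks in advance. Here is my code.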

Snippets...

from sklearn.model_selection import train_test_split
X=df6[['gas','rear','two']] #these are all in int form
y=df6[['CarName']].values.reshape(-1,1) # this is in object form
X_train,X_test,y_test,y_train=train_test_split(X,y,test_size=0.2)


from sklearn.ensemble import RandomForestClassifier
clf=RandomForestClassifier(n_estimators=100)
clf.fit(X_train,y_train)

This raises the following error.

 ValueError                                Traceback (most recent call last)
<ipython-input-54-4c45187c84b2> in <module>
      1 from sklearn.ensemble import RandomForestClassifier
      2 clf=RandomForestClassifier(n_estimators=100)
----> 3 clf.fit(X_train,y_train)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
    302                 "sparse multilabel-indicator for y is not supported."
    303             )
--> 304         X, y = self._validate_data(X, y, multi_output=True,
    305                                    accept_sparse="csc", dtype=DTYPE)
    306         if sample_weight is not None:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    431                 y = check_array(y, **check_y_params)
    432             else:
--> 433                 X, y = check_X_y(X, y, **check_params)
    434             out = X, y
    435 

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    829         y = y.astype(np.float64)
    830 
--> 831     check_consistent_length(X, y)
    832 
    833     return X, y

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    260     uniques = np.unique(lengths)
    261     if len(uniques) > 1:
--> 262         raise ValueError("Found input variables with inconsistent numbers of"
    263                          " samples: %r" % [int(l) for l in lengths])
    264 

ValueError: Found input variables with inconsistent numbers of samples: [164, 41]

The shapes of my training arrays:

X_train.shape,y_train.shape
    Out[53]:
    ((164, 3), (41, 1)) # I guess this is what causes the error, but I can't work out how to fix it

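Comments:

As the error says, you need a one-to-one correspondence between data points and labels. You can't have 41 labels for 164 data points.

@Ananda How do I do that? Could you show me an example?

datascience.stackexchange.com/questions/20199/…

Answer 1: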

The error comes from this line:

X_train,X_test,y_test,y_train=train_test_split(X,y,test_size=0.2)

train_test_split returns its outputs in this order, and they are assigned to your variables by position:

X_train,X_test,y_train,y_test

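That is, y_train comes before y_test. Your code unpacks them in the opposite order, so the variable named y_train actually holds the 41 test labels while X_train has 164 rows, hence the shape mismatch. Swap the two names and it will work.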
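For illustration, here is a minimal sketch of the corrected split. It does not use the asker's df6. The 205-row synthetic array and the three label strings are stand-ins assumed for this example, chosen so that the split sizes match the 164 and 41 in the error message. The label array is also kept one-dimensional instead of using reshape(-1, 1), which is the shape scikit-learn classifiers generally expect for a single target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df6 (assumption): 205 rows, three integer features
# in place of gas/rear/two, and a string label in place of CarName.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(205, 3))
y = rng.choice(["audi", "bmw", "toyota"], size=205)  # 1-D label array

# Unpack in the order train_test_split returns:
# X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Features and labels now have the same number of rows in each split.
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)  # no inconsistent-length error
print(clf.predict(X_test[:5]))
```

Printing the shapes right after the split is a cheap way to catch this kind of mix-up before calling fit.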
