training.test.split returns empty training set

Posted: 2020-07-29 08:27:51

【Question】:

After splitting my data into training and test sets and fitting a regression, I get this error:

> ValueError                                Traceback (most recent call last)
> <ipython-input-32-26b4f0d4f5a4> in <module>()
>      1 Lin = LinearRegression()
> ----> 2 Lin.fit(training_x,training_y)
>
> C:\Users\sayaji\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
>     510         n_jobs_ = self.n_jobs
>     511         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
> --> 512                          y_numeric=True, multi_output=True)
>     513
>     514         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:
>
> C:\Users\sayaji\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
>     519     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
>     520                     ensure_2d, allow_nd, ensure_min_samples,
> --> 521                     ensure_min_features, warn_on_dtype, estimator)
>     522     if multi_output:
>     523         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
>
> C:\Users\sayaji\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
>     414                              " minimum of %d is required%s."
>     415                              % (n_samples, shape_repr, ensure_min_samples,
> --> 416                                 context))
>     417
>     418     if ensure_min_features > 0 and array.ndim == 2:
>
> ValueError: Found array with 0 sample(s) (shape=(0, 50)) while a minimum of 1 is required.

Here is my full code, in case you need it:

real_x = data["R&D Spend"].values
real_y = data["State"].values
real_x = real_x.reshape(1,-1)
real_y = real_y.reshape(1,-1)
training_x,testing_x,training_y,testing_y = train_test_split(real_x,real_y,test_size=0.3,random_state=0)
Lin = LinearRegression()
Lin.fit(training_x,training_y)
real_x.shape
(1, 50)
training_x.shape
(0,50)
training_y.shape
(0,50)

I think this is what causes the error: training_x and training_y should not have 0 samples. Is there something I am doing wrong?

【Answer 1】:

After reproducing your setup and running it, I get a more explicit error:

ValueError: With n_samples=1, test_size=0.3 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

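This means train_test_split splits the data along the first dimension, which has size 1 here because real_x has shape (1, 50). The test set takes that single sample, so the training set is left with a first dimension of 0.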
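The arithmetic behind that message can be sketched in a few lines. This is only an illustration, assuming a fractional test_size is rounded up and the training set receives whatever remains when train_size is None; the helper name split_sizes is hypothetical and is not part of scikit-learn.

```python
from math import ceil


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size and no train_size.

    Hypothetical helper: assumes the test count is rounded up and the
    training set gets the remaining samples.
    """
    n_test = ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(1, 0.3))   # (0, 1): the only sample goes to the test set
print(split_sizes(50, 0.3))  # (35, 15)
```

With a single row there is nothing left over for training, which matches the shape (0, 50) reported in the question.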

So simply reshape the inputs to train_test_split so that 50 is the first dimension. That fixes it.

For example, real_x.reshape(50, 1), and the same for real_y.
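To see the difference between the two reshapes, here is a small NumPy sketch. The array values is a made-up stand-in for data["R&D Spend"].values, since the original data is not available.

```python
import numpy as np

# Hypothetical stand-in for data["R&D Spend"].values: 50 numbers in a 1-D array.
values = np.arange(50, dtype=float)

as_one_row = values.reshape(1, -1)     # what the question does
as_one_column = values.reshape(-1, 1)  # one row per sample, one column per feature

print(as_one_row.shape)     # (1, 50): 1 sample with 50 features
print(as_one_column.shape)  # (50, 1): 50 samples with 1 feature
```

Passing -1 lets NumPy infer that dimension, so reshape(-1, 1) gives the same result as reshape(50, 1) without hard-coding the number of rows.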

【Answer 2】:

When only one feature is involved, use reshape(-1, 1). The data preparation should look like this:

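from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# data is the DataFrame from the question.
# Reshape each 1-D array into a single column: one row per sample.
real_x = data["R&D Spend"].values
real_x = real_x.reshape(-1,1)

real_y = data["State"].values
real_y = real_y.reshape(-1,1)

training_x,testing_x,training_y,testing_y = train_test_split(real_x,real_y,test_size=0.3,random_state=0)

Lin = LinearRegression().fit(training_x,training_y)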

Also, you generally want your code to be readable, so leave blank lines between unrelated variables or function calls.
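To check the fix end to end without the original file, the following sketch runs the same steps on synthetic data. Everything about the data is an assumption: the arrays feature and target are randomly generated placeholders, not the columns from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the original data: 50 samples, one numeric feature.
rng = np.random.default_rng(0)
feature = rng.uniform(0.0, 100.0, size=50)
target = 3.0 * feature + rng.normal(0.0, 1.0, size=50)

# One row per sample, one column per feature.
real_x = feature.reshape(-1, 1)
real_y = target.reshape(-1, 1)

training_x, testing_x, training_y, testing_y = train_test_split(
    real_x, real_y, test_size=0.3, random_state=0
)

Lin = LinearRegression().fit(training_x, training_y)

print(training_x.shape, testing_x.shape)  # (35, 1) (15, 1)
print(Lin.score(testing_x, testing_y))    # R^2 on the held-out rows
```

With 50 rows in the first dimension, the split leaves 35 samples for training and 15 for testing, and fit no longer sees an empty array.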
