training.test.split returns empty training set
【Title】: training.test.split returns empty training set 【Posted】: 2020-07-29 08:27:51 【Question】: After assigning my training and test sets and applying regression, I get an error:
> ValueError Traceback (most recent call last)
> <ipython-input-32-26b4f0d4f5a4> in <module>()
> 1 Lin = LinearRegression()
> ----> 2 Lin.fit(training_x,training_y)
> C:\Users\sayaji\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
> 510 n_jobs_ = self.n_jobs
> 511 X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
> --> 512 y_numeric=True, multi_output=True)
> 513
> 514 if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:
> C:\Users\sayaji\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
> 519 X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
> 520 ensure_2d, allow_nd, ensure_min_samples,
> --> 521 ensure_min_features, warn_on_dtype, estimator)
> 522 if multi_output:
> 523 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
> C:\Users\sayaji\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
> 414 " minimum of %d is required%s."
> 415 % (n_samples, shape_repr, ensure_min_samples,
> --> 416 context))
> 417
> 418 if ensure_min_features > 0 and array.ndim == 2:
> ValueError: Found array with 0 sample(s) (shape=(0, 50)) while a minimum of 1 is required.
Here is my full code, in case you need it:
real_x = data["R&D Spend"].values
real_y = data["State"].values
real_x = real_x.reshape(1,-1)
real_y = real_y.reshape(1,-1)
training_x,testing_x,training_y,testing_y = train_test_split(real_x,real_y,test_size=0.3,random_state=0)
Lin = LinearRegression()
Lin.fit(training_x,training_y)
real_x.shape
(1, 50)
training_x.shape
(0,50)
training_y.shape
(0,50)
I think this is what causes the error: training_x and training_y should not have 0 samples. Am I doing something wrong?
【Answer 1】: After reproducing your setup and running it, I get a more explicit error:
ValueError: With n_samples=1, test_size=0.3 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
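This means the split is performed along the first dimension, which here has size 1. With a single sample and test_size=0.3, that one sample goes to the test set, so the first dimension of the training set ends up being 0.
So just reshape the inputs to train_test_split so that 50 is the first dimension. That fixes it.
For example, use real_x.reshape(50,1), and do the same for real_y.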
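To make this concrete, here is a minimal sketch that uses a synthetic array of 50 values in place of the asker's column. The array contents and the names `values`, `row_shaped`, and `column_shaped` are assumptions for illustration only. It shows that train_test_split counts samples along the first axis, so a (1, 50) array is treated as one sample while a (50, 1) array is treated as 50.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one column of 50 values (assumption, not the asker's data).
values = np.arange(50, dtype=float)

row_shaped = values.reshape(1, -1)     # shape (1, 50): 1 sample with 50 features
column_shaped = values.reshape(-1, 1)  # shape (50, 1): 50 samples with 1 feature

# Splitting the row-shaped array leaves nothing for training.
try:
    train_bad, test_bad = train_test_split(row_shaped, test_size=0.3, random_state=0)
    # Older scikit-learn versions return an empty training set instead of raising.
    print("train shape:", train_bad.shape)
except ValueError as err:
    # Newer versions refuse to produce an empty training set.
    print("ValueError:", err)

# Splitting the column-shaped array works: 35 training rows and 15 test rows.
train_ok, test_ok = train_test_split(column_shaped, test_size=0.3, random_state=0)
print(train_ok.shape, test_ok.shape)  # (35, 1) (15, 1)
```

Whether the first call raises or silently returns an empty array depends on the installed scikit-learn version, which is why both outcomes are handled.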
【Answer 2】: When only one feature is involved, use reshape(-1,1). The data preparation should look like this:
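from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# reshape(-1,1) gives shape (n_samples, 1): one row per sample, one column
real_x = data["R&D Spend"].values
real_x = real_x.reshape(-1,1)
real_y = data["State"].values
real_y = real_y.reshape(-1,1)

training_x,testing_x,training_y,testing_y = train_test_split(real_x,real_y,test_size=0.3,random_state=0)
Lin = LinearRegression().fit(training_x,training_y)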
Also, you generally want your code to be readable, so leave some whitespace between unrelated variables or functions.
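As a rough end-to-end sketch of the reshape(-1,1) pattern, the example below swaps in synthetic numeric data for `data`, since the asker's file is not available. The generated `feature` and `target` arrays and the slope and intercept values are assumptions made up for illustration; both columns are assumed to be numeric.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the asker's columns (assumption for illustration).
rng = np.random.default_rng(0)
feature = rng.uniform(0.0, 100.0, size=50)
target = 3.0 * feature + 5.0  # exact linear relation, no noise

# One row per sample, one column per feature.
real_x = feature.reshape(-1, 1)  # shape (50, 1)
real_y = target.reshape(-1, 1)   # shape (50, 1)

training_x, testing_x, training_y, testing_y = train_test_split(
    real_x, real_y, test_size=0.3, random_state=0
)

Lin = LinearRegression().fit(training_x, training_y)

print(training_x.shape, testing_x.shape)  # (35, 1) (15, 1)
print(Lin.coef_, Lin.intercept_)          # close to slope 3 and intercept 5
```

Because the synthetic target is an exact linear function of the feature, the fitted coefficients recover the chosen slope and intercept up to floating-point error, which gives a quick check that the shapes were set up correctly.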