Dimensions problem linear regression Python scikit learn

Posted 2023-03-12


[Title] Dimensions problem linear regression Python scikit learn [Posted] 2019-04-19 14:19:27 [Question]:

I am implementing a function in which I have to perform a linear regression using scikit-learn.

Here is what I have when running it on an example:

X_train.shape=(34,3)
X_test.shape=(12,3)
Y_train.shape=(34,1)
Y_test.shape=(12,1)

Then

lm.fit(X_train,Y_train)
Y_pred = lm.predict(X_test)

But Python tells me there is an error on this line:

 dico['R2 value']=lm.score(Y_test, Y_pred)

What Python tells me:

 ValueError: shapes (12,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)

Thanks in advance for any help you can give me :)

Alex


[Answer 1]:

To use lm.score(), you need to pass X_test and Y_test, not Y_test and Y_pred.

dico['R2 value']=lm.score(X_test, Y_test)
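As a rough illustration of the fix, here is a minimal sketch that rebuilds the shapes from the question with synthetic data. The random data, the `true_coef` values and the `wrong_call_failed` flag are assumptions made for this example only; they are not from the original post. In the scikit-learn version used in the question, the wrong call presumably fails because the model tries to multiply the (12,1) array by its (3,1) coefficient matrix. Newer versions raise a ValueError about the number of features instead, so the exact message may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data with the same shapes as in the question
X_train = rng.normal(size=(34, 3))
X_test = rng.normal(size=(12, 3))
true_coef = np.array([[1.0], [-2.0], [0.5]])  # hypothetical coefficients
Y_train = X_train @ true_coef + 0.1 * rng.normal(size=(34, 1))
Y_test = X_test @ true_coef + 0.1 * rng.normal(size=(12, 1))

lm = LinearRegression()
lm.fit(X_train, Y_train)
Y_pred = lm.predict(X_test)  # shape (12, 1)

# Wrong: score() treats its first argument as the feature matrix, and
# Y_test has 1 column while the model was fitted on 3 features.
try:
    lm.score(Y_test, Y_pred)
    wrong_call_failed = False
except ValueError as exc:
    wrong_call_failed = True
    print("ValueError:", exc)

# Right: pass the test features and the true targets
dico = {}
dico['R2 value'] = lm.score(X_test, Y_test)
print(dico)
```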

See the documentation:

score(X, y, sample_weight=None)

X : array-like, shape = (n_samples, n_features) Test samples. 
    For some estimators this may be a precomputed kernel matrix instead, 
    shape = (n_samples, n_samples_fitted), where n_samples_fitted is the 
    number of samples used in the fitting for the estimator.

y : array-like, shape = (n_samples) or (n_samples, n_outputs) True values for X.

sample_weight : array-like, shape = [n_samples], optional Sample weights.

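You are trying to use the score method as a metric function, which is wrong. The score() method on any estimator computes the predictions itself and then passes them to the appropriate metric scorer.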

If you want to use Y_test and Y_pred yourself, you can do this:

from sklearn.metrics import r2_score
dico['R2 value'] = r2_score(Y_test, Y_pred)
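For a regressor such as LinearRegression, both routes should give the same number, since score(X, y) predicts from X and then computes R² against y. A small sketch on synthetic data (the data and the names `via_score` and `via_metric` are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)

# Synthetic data with the shapes from the question
X_train = rng.normal(size=(34, 3))
X_test = rng.normal(size=(12, 3))
Y_train = X_train.sum(axis=1, keepdims=True) + rng.normal(size=(34, 1))
Y_test = X_test.sum(axis=1, keepdims=True) + rng.normal(size=(12, 1))

lm = LinearRegression().fit(X_train, Y_train)
Y_pred = lm.predict(X_test)

# score() takes features and true targets and predicts internally
via_score = lm.score(X_test, Y_test)

# r2_score() is a plain metric: true targets first, predictions second
via_metric = r2_score(Y_test, Y_pred)

print(via_score, via_metric)
```

Note the argument order of r2_score: true values first, predictions second. Swapping them generally changes the result.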

[Comments]:

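Thank you very much for your help! It seems I was a bit confused :) But now I don't understand why the R2 score is really low (0.11) when the dataset I'm using is iris...

@Alex Iris is a classification dataset, and you are using a regression model (linear regression with R-squared), so it won't work properly. Use a model with Classifier in its name.

Hmm, I don't see why, because I only kept the irises of type setosa so that the regression would make sense. My features are SepalLengthCm, SepalWidthCm and PetalLengthCm, and I want to predict PetalWidthCm. So why isn't linear regression legitimate?

@Alex OK, in that case regression makes sense. But you need to consider whether predicting petal width from the other features really makes sense. Regression only performs well when the dependent variable (petal width in this case) actually depends on the other variables. I don't think it does.

One last question: if there are also some nominal/ordinal features, can I still use LinearRegression from sklearn? (Obviously I would encode them before running the regression.)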
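The setup described in these comments (setosa only, predicting petal width from the other three measurements) can be tried with the copy of iris that ships with scikit-learn. This is only a sketch: the asker's CSV, split and preprocessing are unknown, so the `test_size` and `random_state` values here are assumptions and the printed R² is not expected to match the 0.11 reported above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = load_iris()

# Class 0 in load_iris() is setosa (50 rows)
setosa = iris.data[iris.target == 0]

X = setosa[:, :3]  # sepal length, sepal width, petal length (cm)
y = setosa[:, 3]   # petal width (cm)

# Assumed split; the original post does not say how the data was split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

lm = LinearRegression().fit(X_train, y_train)
r2 = lm.score(X_test, y_test)  # features first, true targets second
print("R2 on held-out setosa samples:", r2)
```

With so few test samples, the score can change noticeably from one split to another, so trying several `random_state` values gives a better picture than a single number.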
