Python Sklearn Linear Regression Yields Incorrect Coefficient Values

Question (posted 2021-06-19 11:04:08):

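I'm trying to find the slope and y-intercept coefficients of a linear equation. I created a test domain and range to make sure the numbers I get back are correct. The equation should be y = 2x + 1, but the model reports a slope of 24 and a y-intercept of 40.3125. The model accurately predicts every value I give it, but I'd like to know how to get the correct coefficient values.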

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

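# Synthetic data generated from y = 2x + 1
X = np.arange(0, 40)
y = (2 * X) + 1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=0)
# LinearRegression expects a 2-D feature array, so wrap each value in a list
X_train = [[i] for i in X_train]
X_test = [[i] for i in X_test]

# Standardize the feature, fitting the scaler on the training data only
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)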

regr = linear_model.LinearRegression()

regr.fit(X_train, y_train)

y_pred = regr.predict(X_test)

print('Coefficients: \n', regr.coef_)
print('Y-intercept: \n', regr.intercept_)
print('Mean squared error: %.2f'
      % mean_squared_error(y_test, y_pred))
print('Coefficient of determination: %.2f'
      % r2_score(y_test, y_pred))

plt.scatter(X_test, y_test,  color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
print(X_test)

plt.xticks()
plt.yticks()

plt.show()


Answer 1:

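This happens because you scaled your training and test data. Even though you generated y as a linear function of X, standardization (subtracting the mean and dividing by the standard deviation) puts X_train and X_test on a different scale. The model is therefore fitted on z = (x - μ) / σ, where μ and σ are the mean and standard deviation of X_train. Since y = 2x + 1 = 2σz + (2μ + 1), it reports a slope of 2σ and an intercept of 2μ + 1 instead of 2 and 1.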
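If you want to keep the scaling but still read off the coefficients in the original units of x, you can invert the transformation by hand. The following is a minimal sketch, assuming a single feature and fitting on the full range 0..39 (no train/test split) for simplicity; it relies on the `mean_` and `scale_` attributes that a fitted `StandardScaler` exposes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the question: y = 2x + 1
X = np.arange(0, 40).reshape(-1, 1)
y = 2 * X.ravel() + 1

# Fit on the standardized feature z = (x - mean) / std
sc = StandardScaler()
Z = sc.fit_transform(X)
regr = LinearRegression().fit(Z, y)

# Map the coefficients back to the original units of x:
#   y = w*z + b = (w / std) * x + (b - w * mean / std)
slope = regr.coef_[0] / sc.scale_[0]
intercept = regr.intercept_ - regr.coef_[0] * sc.mean_[0] / sc.scale_[0]

print("scaled coef:", regr.coef_[0], "scaled intercept:", regr.intercept_)
print("slope:", slope, "intercept:", intercept)
```

The scaled coefficient comes out as 2 times the (population) standard deviation of x and the scaled intercept as the mean of y, while the back-transformed slope and intercept are 2 and 1, up to floating-point error.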

If we run your code but omit the lines where you scale the data, we get the expected result:

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

X = np.arange(0, 40)
y = (2 * X) + 1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=0)
X_train = [[i] for i in X_train]
X_test = [[i] for i in X_test]

# Skip the scaling of X_train and X_test
#sc = StandardScaler()
#X_train = sc.fit_transform(X_train)
#X_test = sc.transform(X_test)

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

y_pred = regr.predict(X_test)

print('Coefficients: \n', regr.coef_)
# Output:
# Coefficients:
#  [2.]
print('Y-intercept: \n', regr.intercept_)
# Output:
# Y-intercept:
#  1.0
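This also explains why the scaled model "accurately predicts every value": standardization only reparameterizes the same straight line, so predictions are unchanged as long as the test data goes through the same transform. A minimal sketch, assuming you are happy to wrap the scaler and regressor in a scikit-learn `Pipeline` (built here with `make_pipeline`, which names its steps after the lowercased class names):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split as in the question, with X already 2-D
X = np.arange(0, 40).reshape(-1, 1)
y = 2 * X.ravel() + 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model on raw X: coef_ and intercept_ are the slope and intercept directly
plain = LinearRegression().fit(X_train, y_train)

# Model on standardized X: the pipeline applies the scaler before fitting
# and again (with the training mean/std) before predicting
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)

# Both models predict the same values; only the coefficients differ
same_predictions = np.allclose(plain.predict(X_test), scaled.predict(X_test))
print("plain coef:", plain.coef_, "intercept:", plain.intercept_)
print("scaled coef:", scaled.named_steps["linearregression"].coef_)
print("same predictions:", same_predictions)
```

With a single noiseless feature, scaling buys nothing here; it mainly matters for regularized models or when features have very different ranges, and in that case the coefficients have to be interpreted in standardized units.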

