Linear Regression: The Coefficient of Determination

Posted by guoxiang


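1. Sum of Squares Due to Error (SSE)
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
For the i-th observation, the difference between the observed value $y_i$ and the estimated value $\hat{y}_i$ is called the i-th residual. SSE is the sum of the squared residuals over all observations.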
2. Total Sum of Squares (SST)
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

3. Sum of Squares Due to Regression (SSR)
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

 

From the definitions above, the three quantities are related as follows:

$$SST = SSR + SSE$$
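The relationship between the three sums of squares can be checked numerically. The sketch below is only an illustration: the data points are invented, and the line is fitted with the textbook least-squares formulas for a single independent variable.

```python
# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least-squares slope and intercept for y_hat = b0 + b1 * x.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error
sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression

print(f"SSE={sse:.4f}  SSR={ssr:.4f}  SST={sst:.4f}")
print("SSR + SSE equals SST up to rounding:", abs(ssr + sse - sst) < 1e-9)
```

The identity holds here because the line was fitted by least squares with an intercept; for arbitrary fitted values it generally does not.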

 

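Coefficient of determination: assessing how well the regression equation fits

The coefficient of determination gives the proportion of the variation in the dependent variable that the regression equation explains through the independent variable. For example, a value of 0.9 means that 90% of the variation in the dependent variable is explained. It is used to judge the goodness of fit.
$$R^2 = \frac{SSR}{SST}$$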
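As a rough sketch of how the coefficient of determination could be computed by hand, the function below fits a least-squares line and returns the ratio of the regression sum of squares to the total sum of squares, together with the equivalent form 1 - SSE/SST. The function name `r_squared` and the sample data are hypothetical, chosen only for this example.

```python
def r_squared(x, y):
    """Fit y_hat = b0 + b1 * x by least squares and return
    (SSR / SST, 1 - SSE / SST). Both forms agree for a linear fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    return ssr / sst, 1 - sse / sst


# Hypothetical sample: independent variable vs. dependent variable.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [50.0, 57.0, 61.0, 70.0, 74.0]

r2_from_ssr, r2_from_sse = r_squared(x, y)
print(f"R^2 = {r2_from_ssr:.4f}, i.e. about {r2_from_ssr:.0%} of the variation is explained")
```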

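Correlation coefficient: measures how strong the linear relationship between the dependent variable and the independent variable is, that is, how closely the dependent variable follows a straight-line function of the independent variable as the latter changes. (How much the dependent variable changes per unit change in the independent variable is given by the slope, not by the correlation.)

Its sign shows whether the correlation is positive or negative.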

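For simple linear regression with one independent variable, the sample correlation coefficient can be obtained from the coefficient of determination:

$$r_{xy} = (\text{sign of } b_1)\,\sqrt{R^2}$$

where $b_1$ is the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$.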
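The link between the correlation coefficient and the coefficient of determination can be illustrated with a small script. This is a sketch under stated assumptions: it covers only simple linear regression with one independent variable, the data are invented, and the values are chosen with a downward trend so that the sign matters.

```python
import math

# Invented data with a downward trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.8, 8.1, 6.2, 3.9, 2.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is SST

# Pearson sample correlation coefficient, computed directly.
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares slope and R^2 = SSR / SST.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
r2 = ssr / s_yy

# Same value recovered from R^2 by attaching the sign of the slope.
r_from_r2 = math.copysign(math.sqrt(r2), b1)

print(f"r = {r:.4f}, sign(b1) * sqrt(R^2) = {r_from_r2:.4f}")
```

Note that R^2 alone cannot tell a positive relationship from a negative one; the sign has to come from the slope.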

Reference: http://blog.csdn.net/ytdxyhz/article/details/51730995

 

Note that this coefficient of determination cannot be used to measure the goodness of fit of a nonlinear regression.

Why Is It Impossible to Calculate a Valid R-squared for Nonlinear Regression?

R-squared is based on the underlying assumption that you are fitting a linear model. If you aren’t fitting a linear model, you shouldn’t use it. The reason why is actually very easy to understand.

For linear models, the sums of the squared errors always add up in a specific manner: SS Regression + SS Error = SS Total.

This seems quite logical. The variance that the regression model accounts for plus the error variance adds up to equal the total variance. Further, R-squared equals SS Regression / SS Total, which mathematically must produce a value between 0 and 100%.

In nonlinear regression, SS Regression + SS Error do not equal SS Total! This completely invalidates R-squared for nonlinear models, and it no longer has to be between 0 and 100%.
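A small experiment can make this failure concrete. The sketch below is an illustration only: the data are invented, the one-parameter model y = exp(b * x) is a hypothetical choice, and the grid search over b is a crude stand-in for a real nonlinear least-squares solver.

```python
import math

# Invented data; the model below cannot match the value at x = 0.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0, 3.5, 5.0, 7.0, 12.0]
y_bar = sum(y) / len(y)


def predict(b):
    """Hypothetical nonlinear model: y_hat = exp(b * x)."""
    return [math.exp(b * xi) for xi in x]


def sse_for(b):
    """Sum of squared errors for a given parameter value b."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, predict(b)))


# Crude least-squares fit: try b = 0.000, 0.001, ..., 2.000 and keep the best.
b_best = min((k / 1000 for k in range(2001)), key=sse_for)
y_hat = predict(b_best)

sse = sse_for(b_best)
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

print(f"b = {b_best:.3f}")
print(f"SSR + SSE = {ssr + sse:.2f}, SST = {sst:.2f}")
print(f"SSR / SST = {ssr / sst:.2f}, 1 - SSE / SST = {1 - sse / sst:.2f}")
```

With this data the two sides of the decomposition differ noticeably, and SSR / SST comes out above 1, so the two usual formulas for R-squared no longer agree.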

Reference: http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression
