Regularization: Solving the Overfitting Problem

Posted 2021-01-15 qkloveslife


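Linear regression example

If

\[ h_\theta(x) = \theta_0 + \theta_1 x \]

the curve obtained by linear regression might look like the figure below.

[Figure: straight-line fit to the data]

Here the curve fits the data poorly. This is called "underfit", and the model is said to have "high bias".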

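If

\[ h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 \]

the curve obtained by linear regression might look like the figure below.

[Figure: quadratic fit to the data]

Here the curve fits the data reasonably well, which can be called "just right".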

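If

\[ h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 \]

the curve obtained by linear regression might look like the figure below.

[Figure: fourth-degree polynomial fit to the data]

Here the curve fits the existing data very well, but its predictions for new data are unreasonable. This is called "overfit", and the model is said to have "high variance".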
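The three cases above can be reproduced with a small experiment. The sketch below is only an illustration: the five data points are made up, and `np.polyfit` stands in for whatever linear-regression routine is actually used. It fits polynomials of degree 1, 2 and 4 and prints the training error of each.

```python
import numpy as np

# Hypothetical toy data (not from the original post): five points that
# rise quickly and then flatten out.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.6, 3.4, 3.9, 4.1])

def training_mse(degree):
    """Least-squares polynomial fit of the given degree; returns the training MSE."""
    coeffs = np.polyfit(x, y, degree)
    residuals = np.polyval(coeffs, x) - y
    return float(np.mean(residuals ** 2))

# Degree 1 = underfit case, degree 2 = "just right", degree 4 = overfit case.
mse_by_degree = {d: training_mse(d) for d in (1, 2, 4)}
for d, mse in mse_by_degree.items():
    print(f"degree {d}: training MSE = {mse:.6f}")
```

With five points, the degree-4 polynomial has five free parameters and passes through every point, so its training error is essentially zero. That is exactly why a low training error alone says nothing about how well the curve predicts new data.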

When does overfitting occur?

If we have too many features, the learned hypothesis may fit the training set very well (J(θ) ≈ 0) but fail to generalize to new examples (for instance, when predicting prices on new examples).

Now a logistic regression example

If

\[
\begin{array}{l}
h_\theta(x) = g\left( \theta_0 + \theta_1 x_1 + \theta_2 x_2 \right) \\
\left( g = \text{sigmoid function} \right)
\end{array}
\]

[Figure: linear decision boundary]

this is a case of "underfit".

If

\[ h_\theta(x) = g\left( \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2 \right) \]

[Figure: quadratic decision boundary]

this is a case of "just right".

If

\[ h_\theta(x) = g\left( \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^2 x_2 + \theta_4 x_1^2 x_2^2 + \theta_5 x_1^2 x_2^3 + \ldots \right) \]

[Figure: highly contorted decision boundary]

this is a case of "overfit".
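To see how the extra polynomial terms bend the decision boundary, here is a minimal sketch of the "just right" hypothesis. The helper names and the parameter values are assumptions chosen for illustration (they make the boundary the unit circle); nothing here is fitted to data.

```python
import numpy as np

def sigmoid(z):
    """The function g in the hypotheses above."""
    return 1.0 / (1.0 + np.exp(-z))

def quadratic_features(x1, x2):
    """Feature vector of the 'just right' hypothesis: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

def hypothesis(theta, features):
    """h_theta(x) = g(theta^T features)."""
    return sigmoid(theta @ features)

# Hypothetical parameters: theta^T features = -1 + x1^2 + x2^2, so the
# decision boundary h = 0.5 is the circle x1^2 + x2^2 = 1.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0, 0.0])

inside = hypothesis(theta, quadratic_features(0.0, 0.0))       # centre of the circle
on_boundary = hypothesis(theta, quadratic_features(1.0, 0.0))  # point on the circle
outside = hypothesis(theta, quadratic_features(2.0, 0.0))      # well outside

print(inside, on_boundary, outside)
```

With only x1 and x2 as features the boundary can only be a straight line. Adding ever higher-order terms lets it twist around individual training points, which is the overfit case.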

How do we address overfitting?

Option 1:

Reduce the number of features.

Manually select which features to keep.
Use a model selection algorithm.

Drawback: dropping features means dropping information, and the model suffers if useful information is thrown away.

Option 2:

Regularization

Keep all the features, but reduce the magnitude/values of the parameters θ_j.

Advantage: works well when we have a lot of features, each of which contributes a bit to predicting y.

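Implementing regularization

Compare the following two cases.

[Figure: quadratic fit to the data]

\[ h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 \]

[Figure: fourth-degree polynomial fit to the data]

\[ h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 \]

Suppose we make θ₃ and θ₄ very small. Then the x³ and x⁴ terms contribute very little, and the overfitted hypothesis ends up close to the well-fitted quadratic one.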

Suppose, to begin with, that we use the following cost function:

\[ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left( x^{(i)} \right) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2 \]

The goal of linear regression is then

\[ \min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left( x^{(i)} \right) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2 \]

Because the θ₃² and θ₄² terms carry a large weight (1000), minimizing this function forces θ₃ and θ₄ to be very small, which is exactly the shrinking effect we wanted.

In regularization in general, we do not know in advance which parameters should be made smaller, so we shrink all of them.
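The effect of the two penalty terms can be checked numerically. The sketch below assumes made-up data and solves the minimization in closed form rather than by gradient descent: setting the gradient of the cost above to zero gives the linear system (XᵀX + 2·m·1000·D)θ = Xᵀy, where D is a diagonal matrix selecting θ₃ and θ₄. The function name `minimize_cost` is hypothetical.

```python
import numpy as np

# Hypothetical toy data (not from the original post).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 2.6, 3.4, 3.9, 4.1, 4.2])
m = len(x)

# Design matrix with columns 1, x, x^2, x^3, x^4.
X = np.vander(x, N=5, increasing=True)

# D picks out theta_3 and theta_4, the only penalized parameters.
D = np.diag([0.0, 0.0, 0.0, 1.0, 1.0])

def minimize_cost(penalty):
    """Minimize (1/2m)*sum(residual^2) + penalty*theta_3^2 + penalty*theta_4^2.

    The zero-gradient condition is (X^T X + 2*m*penalty*D) theta = X^T y.
    """
    return np.linalg.solve(X.T @ X + 2 * m * penalty * D, X.T @ y)

theta_plain = minimize_cost(0.0)         # ordinary least squares
theta_penalized = minimize_cost(1000.0)  # the cost function with the 1000 weights

print("no penalty:   theta_3, theta_4 =", theta_plain[3], theta_plain[4])
print("penalty 1000: theta_3, theta_4 =", theta_penalized[3], theta_penalized[4])
```

The penalized fit should show θ₃ and θ₄ much closer to zero, leaving a curve that is dominated by the constant, linear and quadratic terms.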

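With the regularization term added, the cost function becomes

\[ J(\theta) = \frac{1}{2m} \left[ \underbrace{\sum_{i=1}^{m} \left( h_\theta\left( x^{(i)} \right) - y^{(i)} \right)^2}_{\text{part 1}} + \underbrace{\lambda \sum_{j=1}^{n} \theta_j^2}_{\text{part 2}} \right] \]

Here λ is called the regularization parameter. The function has two goals: part 1 makes h(x) as close to y as possible, and part 2 keeps every θ_j as small as possible (in other words, keeps the hypothesis as simple as possible). Note that the sum in part 2 starts at j = 1, so θ₀ is not penalized.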
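As a minimal sketch of the formula above, the cost and its gradient can be written in a few lines of NumPy. The function names are hypothetical, and `X` is assumed to be a design matrix whose first column is all ones, so that `theta[0]` plays the role of θ₀.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [sum of squared errors + lam * sum_{j>=1} theta_j^2]."""
    m = len(y)
    residuals = X @ theta - y
    part1 = np.sum(residuals ** 2)
    part2 = lam * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return (part1 + part2) / (2 * m)

def regularized_gradient(theta, X, y, lam):
    """Gradient of regularized_cost with respect to theta."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m
    grad[1:] += (lam / m) * theta[1:]     # extra term only for j >= 1
    return grad

# Tiny hand-checkable example: theta fits y exactly, so part 1 is zero and
# J = lam * theta_1^2 / (2m) = 3 * 1 / 6 = 0.5.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])
lam = 3.0

cost = regularized_cost(theta, X, y, lam)
grad = regularized_gradient(theta, X, y, lam)
print(cost, grad)
```

The gradient is what a gradient-descent update would use. Compared with unregularized linear regression, the only change is the extra (λ/m)·θ_j term for j ≥ 1.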

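Why does keeping all the parameters small bring the fitted curve closer to the right shape?

Implementing it yourself gives a fairly intuitive feel for this.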

Choosing the size of λ

For

\[ h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 \]

if λ is chosen too large, all of the parameters θ₁, θ₂, θ₃, θ₄ are driven close to zero, which gives

\[ h_\theta(x) \approx \theta_0 \]

and the model underfits.

If λ is chosen too small, the regularization has no real effect, which gives

\[ h_\theta(x) \approx \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 \]

with essentially unconstrained parameters, so the overfitting problem is not solved.
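Both extremes are easy to observe in a small experiment. The sketch below uses made-up data and the closed-form minimizer of the regularized cost function, (XᵀX + λD)θ = Xᵀy, where D is the identity matrix with its first entry set to zero so that θ₀ stays unpenalized. The function names and the two λ values are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical toy data (not from the original post).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 2.6, 3.4, 3.9, 4.1, 4.2])

X = np.vander(x, N=5, increasing=True)   # columns 1, x, x^2, x^3, x^4
D = np.diag([0.0, 1.0, 1.0, 1.0, 1.0])   # theta_0 is not penalized

def fit(lam):
    """Closed-form minimizer of the regularized cost: (X^T X + lam*D) theta = X^T y."""
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

def training_mse(theta):
    return float(np.mean((X @ theta - y) ** 2))

theta_small = fit(1e-6)   # lambda too small: almost no regularization
theta_large = fit(1e10)   # lambda too large: theta_1..theta_4 forced toward zero

print("small lambda:", theta_small, "MSE", training_mse(theta_small))
print("large lambda:", theta_large, "MSE", training_mse(theta_large))
```

With the very large λ the fitted hypothesis collapses to roughly the constant θ₀ ≈ mean(y), the underfit case. With the tiny λ the parameters are essentially those of the unregularized fourth-degree fit. In practice λ is picked somewhere in between, for example by comparing the error on held-out data for several candidate values.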
