04 Regularization


Regularization

for Linear Regression and Logistic Regression

Definitions

  1. Under-fitting (high bias)
  2. Over-fitting (high variance): the model has too many features and fails to generalize to new examples.

Addressing over-fitting

  1. Reduce the number of features.
    • Manually select which features to keep.
    • Use a model selection algorithm.
  2. Regularization
    • Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\).
    • Works well when we have a lot of features, each of which contributes a bit to predicting \(y\).

Regularized Cost Function

  • \[\min_\theta\ \dfrac{1}{2m}\ \left[ \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^n \theta_j^2 \right]\]
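A minimal NumPy sketch of this cost, assuming a design matrix `X` whose first column is all ones and a parameter vector `theta` whose bias term `theta[0]` is not penalized (the names `X`, `y`, `theta`, `lam` are illustrative, not from the original notes):

```python
import numpy as np

def regularized_linear_cost(theta, X, y, lam):
    """Regularized cost for linear regression.

    X     : (m, n+1) design matrix, first column all ones (bias).
    theta : (n+1,) parameter vector; theta[0] is not penalized.
    lam   : regularization parameter lambda.
    """
    m = len(y)
    residual = X @ theta - y                   # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)     # skip theta_0
    return (residual @ residual + penalty) / (2 * m)
```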

Regularized Linear Regression

  1. Gradient Descent (see the NumPy sketch after this list)
    \[
    \begin{align*}
    & \text{Repeat}\ \lbrace \newline
    & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \newline
    & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & j \in \lbrace 1,2,\dots,n\rbrace \newline
    & \rbrace
    \end{align*}
    \]

    • Equivalently,
      \[
      \theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m} \sum_{i=1}^m\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
      \]
  2. Normal Equation
    \[
    \begin{align*}
    & \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline
    & \text{where}\ \ L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \end{bmatrix}
    \end{align*}
    \]

  • Even if \(X^TX\) is non-invertible, \(X^TX + \lambda \cdot L\) will be invertible (for \(\lambda > 0\)).
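A sketch of both solvers under the same conventions (the function and variable names are my own; `X` again carries a leading column of ones and `theta[0]` is left unpenalized):

```python
import numpy as np

def gradient_descent_reg(X, y, alpha, lam, num_iters):
    """Batch gradient descent for regularized linear regression."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        error = X @ theta - y                 # h_theta(x) - y for every example
        grad = (X.T @ error) / m              # unregularized gradient
        grad[1:] += (lam / m) * theta[1:]     # add (lambda/m) * theta_j for j >= 1
        theta -= alpha * grad
    return theta

def normal_equation_reg(X, y, lam):
    """Closed form: theta = (X^T X + lambda * L)^{-1} X^T y."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0                               # do not penalize the bias term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

With \(\lambda > 0\) the matrix \(X^TX + \lambda L\) is invertible, so the `np.linalg.solve` call above does not hit a singular system even when \(X^TX\) itself is singular.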

Regularized Logistic Regression

  1. Cost Function

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^m\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
\]

  2. Gradient Descent
    \[
    \begin{align*}
    & \text{Repeat}\ \lbrace \newline
    & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \newline
    & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & j \in \lbrace 1,2,\dots,n \rbrace \newline
    & \rbrace
    \end{align*}
    \]
    • Identical in form to the linear-regression update, but here \(h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}\) (the sigmoid).
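A sketch of the regularized logistic-regression cost and gradient under the same assumptions (illustrative names; the returned `(cost, grad)` pair could be fed to an off-the-shelf optimizer or to a gradient descent loop like the one sketched above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_grad_reg(theta, X, y, lam):
    """Regularized logistic regression: cost J(theta) and its gradient."""
    m = len(y)
    h = sigmoid(X @ theta)                                  # h_theta(x) for every example
    cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    cost += (lam / (2 * m)) * np.sum(theta[1:] ** 2)        # penalty, skipping theta_0
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]
    return cost, grad
```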
