1. Gradient Descent
2. Normal Equation
First, let us introduce what the normal equation is. Suppose a dataset $X$ has $m$ samples and $n$ features. The hypothesis function is then

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x \quad (x_0 = 1),$$

and the feature (design) matrix of the dataset $X$ is

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix} \in \mathbb{R}^{m \times (n+1)}.$$

Writing all $m$ training equations at once gives $X\theta = y$. However, only a square matrix can possibly have an inverse (if this theorem is unfamiliar, it is worth reviewing some linear algebra), so we cannot invert $X$ directly. Instead, left-multiplying both sides by $X^T$ turns the equation into $X^T X \theta = X^T y$, and therefore

$$\theta = (X^T X)^{-1} X^T y.$$

Some readers may object that $(X^T X)^{-1}$ does not necessarily exist. That is true, but it is rarely non-invertible in practice; how to handle the non-invertible case is covered later, so don't worry about it for now. At this point you only need to understand why the formula holds, and remember it.
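As a concrete sketch of this formula (not from the original notes), the following NumPy snippet builds a small synthetic dataset and solves for $\theta$ with the normal equation; the variable names, data, and true parameters are assumptions made up for the example.

```python
import numpy as np

# Minimal sketch (assumed example): linear regression via the normal equation.
# X has m samples and n features; a column of ones is added for the intercept theta_0.
rng = np.random.default_rng(0)
m, n = 100, 2
X_raw = rng.normal(size=(m, n))
true_theta = np.array([4.0, 3.0, -2.0])          # intercept, theta_1, theta_2 (made up)
X = np.hstack([np.ones((m, 1)), X_raw])          # design matrix, shape (m, n+1)
y = X @ true_theta + 0.1 * rng.normal(size=m)    # noisy targets

# theta = (X^T X)^{-1} X^T y; np.linalg.solve avoids forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # should be close to true_theta
```

Solving the system with `np.linalg.solve` (or falling back to `np.linalg.pinv` when $X^T X$ is not invertible) is numerically preferable to computing the matrix inverse explicitly.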
The details are summarized in the comparison below:

| Gradient Descent | Normal Equation |
| --- | --- |
| Need to choose the learning rate $\alpha$ | No need to choose $\alpha$ |
| Needs many iterations | No iterations; solved in one step |
| Works well even when $n$ is large | Must compute $(X^T X)^{-1}$, roughly $O(n^3)$; slow when $n$ is very large |
Gradient Descent For Linear Regression
When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived. We can substitute our actual cost function and our actual hypothesis function and modify the equation to:

repeat until convergence:
$$\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x_i) - y_i\right) \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x_i) - y_i\right) x_i
\end{aligned}$$
where $m$ is the size of the training set, $\theta_0$ is a constant that will be changing simultaneously with $\theta_1$, and $x_i, y_i$ are values of the given training set (data).
Note that we have separated out the two cases for $\theta_j$ into separate equations for $\theta_0$ and $\theta_1$; and that for $\theta_1$ we are multiplying $x_i$ at the end due to the derivative. The following is a derivation of $\frac{\partial}{\partial \theta_j} J(\theta)$ for a single example:
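As a sketch of that derivation (assuming the usual single-example squared-error cost $J(\theta) = \frac{1}{2}\left(h_\theta(x) - y\right)^2$ with $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1$ and $x_0 = 1$):

$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= \frac{\partial}{\partial \theta_j} \frac{1}{2}\left(h_\theta(x) - y\right)^2 \\
&= \left(h_\theta(x) - y\right) \cdot \frac{\partial}{\partial \theta_j}\left(h_\theta(x) - y\right) \\
&= \left(h_\theta(x) - y\right) \cdot \frac{\partial}{\partial \theta_j}\left(\theta_0 x_0 + \theta_1 x_1 - y\right) \\
&= \left(h_\theta(x) - y\right) x_j
\end{aligned}$$

Since $x_0 = 1$, the $\theta_0$ update has no extra factor, while for $j = 1$ the factor $x_1$ explains the $x_i$ appearing in the $\theta_1$ update.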
The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
So, this is simply gradient descent on the original cost function J. This method looks at every example in the entire training set on every step, and is called batch gradient descent. Note that, while gradient descent can be susceptible to local minima in general, the optimization problem we have posed here for linear regression has only one global optimum and no other local optima; thus gradient descent always converges (assuming the learning rate α is not too large) to the global minimum. Indeed, J is a convex quadratic function. Here is an example of gradient descent as it is run to minimize a quadratic function.
The ellipses shown above are the contours of a quadratic function. Also shown is the trajectory taken by gradient descent, which was initialized at (48,30). The x’s in the figure (joined by straight lines) mark the successive values of θ that gradient descent went through as it converged to its minimum.
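To make the batch updates concrete, here is a small NumPy sketch (not part of the original notes) that fits $h_\theta(x) = \theta_0 + \theta_1 x$ on a synthetic dataset; the data, learning rate $\alpha$, and iteration count are assumptions chosen for illustration.

```python
import numpy as np

# Minimal sketch of batch gradient descent for h(x) = theta0 + theta1 * x.
# The synthetic data, alpha, and iteration count are illustrative assumptions.
rng = np.random.default_rng(1)
m = 50
x = rng.uniform(0, 10, size=m)
y = 4.0 + 3.0 * x + rng.normal(scale=0.5, size=m)  # targets around the line y = 4 + 3x

theta0, theta1 = 0.0, 0.0
alpha = 0.01            # learning rate
for _ in range(5000):   # fixed iteration budget instead of an explicit convergence test
    h = theta0 + theta1 * x
    grad0 = (1.0 / m) * np.sum(h - y)        # partial derivative of J w.r.t. theta0
    grad1 = (1.0 / m) * np.sum((h - y) * x)  # partial derivative of J w.r.t. theta1
    # update both parameters simultaneously, as in the equations above
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)   # should end up close to 4 and 3
```

Because J is a convex quadratic, the result should agree (up to the tolerance left by the fixed iteration budget) with what the normal equation from the earlier section returns in a single step, which is exactly the trade-off summarized in the comparison table.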