Gradient descent and normal equation not giving the same results, why?

Posted 2023-03-12


Question (posted 2019-12-19 11:24:31):

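I am writing a simple script that tries to find the parameter values for my hypothesis. I use gradient descent in one part and the normal equation in the second. The normal equation gives me the correct result, but my gradient descent does not. I can't figure out why it doesn't work in such a simple case.

Hi, I want to understand why my gradient descent does not match the normal equation for linear regression. I am using MATLAB to implement both. Here is what I tried:

So I created a dummy training set: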

x = [1 2 3], y = [2 3 4]

So my hypothesis should converge to theta = [1 1], which gives the simple

h(x) = 1 + x

Here is the test code comparing the normal equation with gradient descent:

clear;
disp("gradient descend");
X = [1; 2; 3];
y = [2; 3; 4];
theta = [0 0];
num_iters = 10;
alpha = 0.3;
thetaOut = gradientDescent(X, y, theta, 0.3, 10); % GD -> does not work, why?
disp(thetaOut);

clear;
disp("normal equation");
X = [1 1; 1 2; 1 3];
y = [2;3;4];
Xt = transpose(X);
theta = pinv(Xt*X)*Xt*y; % normal equation -> works!
disp(theta);
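For reference, the normal-equation result can be checked by hand with the X (including its column of ones) and y from the code above. Here X^T X is invertible (its determinant is 6), so pinv gives the same answer as an ordinary inverse:

```latex
\begin{aligned}
X^\top X &= \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \qquad
X^\top y = \begin{bmatrix} 9 \\ 20 \end{bmatrix} \\
\theta &= (X^\top X)^{-1} X^\top y
       = \frac{1}{6}\begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix}
         \begin{bmatrix} 9 \\ 20 \end{bmatrix}
       = \frac{1}{6}\begin{bmatrix} 6 \\ 6 \end{bmatrix}
       = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\end{aligned}
```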

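And here is the gradientDescent function; its body is the inner loop:

function theta = gradientDescent(X, y, theta, alpha, iterations)
samples = length(y);
for epoch = 1:iterations

     hipoth = X * theta;
     factor = alpha * (1/samples);
     theta = theta - factor * ((hipoth - y)' * X )';
     %disp(epoch);

end
end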

And the output after 10 iterations:

gradient descend = 1.4284 1.4284 -> wrong
normal equation = 1.0000 1.0000 -> correct

This makes no sense; it should converge to [1 1].

Any ideas? Do I have a MATLAB syntax problem?

Thanks!

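Comments:

What you posted doesn't work; please post your real code. hipoth = X*theta is 3x2 and y is 3x1, so there is the first error. Also, MATLAB strings (in disp) use single quotes '.

@avermaet MATLAB "strings" are different from MATLAB 'char arrays'; double quotes are valid syntax since R2016b. Your X*theta comment is more valid, but note that implicit expansion is also valid syntax since R2016b, so the OP may simply have intended to use the element-wise multiplier .*.

@Wolfie Thanks for the clarification, I didn't know that. Entirely my mistake.

Answer 1: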

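Gradient descent can solve many different problems. You want to do a linear regression, i.e. find a linear function h(x) = theta_1 * x + theta_2 that best fits your data:

h(x) = y + error

What counts as the "best" fit is debatable. The most common way to define the best fit is to minimize the squared errors between the fitted values and the actual data. Assuming that is what you want...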
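Written out for h(x) = θ₁x + θ₂ with n data points, the mean squared error and its partial derivatives are as follows; D_t1 and D_t2 in the function that follows compute exactly these two derivatives, and both parameters are then moved against their gradient with step size α:

```latex
\begin{aligned}
\hat{y}_i &= \theta_1 x_i + \theta_2 \\
J(\theta_1, \theta_2) &= \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2 \\
\frac{\partial J}{\partial \theta_1} &= -\frac{2}{n}\sum_{i=1}^{n} x_i\,\bigl(y_i - \hat{y}_i\bigr),
\qquad
\frac{\partial J}{\partial \theta_2} = -\frac{2}{n}\sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr) \\
\theta_k &\leftarrow \theta_k - \alpha\,\frac{\partial J}{\partial \theta_k}, \qquad k = 1, 2
\end{aligned}
```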

Replace your function with

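function [theta] = gradientDescent(X, Y, theta, alpha, num_iters)
    % Fits h(x) = theta(1)*x + theta(2) by minimizing the mean squared error.
    % X is the plain column of x values here (no column of ones).
    n = length(Y);
    for epoch = 1:num_iters
        Y_pred = theta(1)*X + theta(2);        % current predictions
        D_t1 = (-2/n) * X' * (Y - Y_pred);     % dJ/dtheta(1)
        D_t2 = (-2/n) * sum(Y - Y_pred);       % dJ/dtheta(2)
        theta(1) = theta(1) - alpha * D_t1;    % both derivatives were computed
        theta(2) = theta(2) - alpha * D_t2;    % before either update
    end
end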

and change your parameters slightly, for example

num_iters = 10000;
alpha = 0.05;

and you get the correct answer. I took the code snippet from here, which may also be a good starting point for understanding what is actually going on.
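A minimal sketch for checking this, written as an inline script (no function file) so it runs as-is in MATLAB or GNU Octave. It repeats the loop above with the suggested alpha = 0.05 and num_iters = 10000, then compares the result with the normal equation. The names A and theta_ne are introduced here for illustration; A has columns [x, 1] because in this answer theta(1) is the slope and theta(2) the intercept:

```matlab
% Inline version of Answer 1's loop with the suggested parameters.
% Model: h(x) = theta(1)*x + theta(2); data from the question.
X = [1; 2; 3];
Y = [2; 3; 4];
theta = [0 0];
alpha = 0.05;
num_iters = 10000;
n = length(Y);

for epoch = 1:num_iters
    Y_pred = theta(1)*X + theta(2);        % predictions, 3x1
    D_t1 = (-2/n) * X' * (Y - Y_pred);     % dJ/dtheta(1), a scalar
    D_t2 = (-2/n) * sum(Y - Y_pred);       % dJ/dtheta(2), a scalar
    theta(1) = theta(1) - alpha * D_t1;
    theta(2) = theta(2) - alpha * D_t2;
end

% Normal equation for the same model (hypothetical helper matrix A, columns [x, 1])
A = [X, ones(n, 1)];
theta_ne = pinv(A' * A) * A' * Y;

disp(theta);       % [slope intercept] from gradient descent
disp(theta_ne');   % [slope intercept] from the normal equation
```

Both lines should display values essentially equal to 1 1. With a much larger alpha the iteration can diverge instead, which is why the step size matters here.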


Answer 2:

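Your gradient descent is solving a different problem from the normal equation, because you are feeding it different data: X = [1; 2; 3] has no column of ones, so there is no intercept term to fit. On top of that, you seem to be overcomplicating the theta update a bit, but that is not the problem. Small changes to the code produce the correct output: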

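function theta=gradientDescent(X,y,theta,alpha,iterations)
% X: m-by-2 design matrix with a leading column of ones; theta: 2-by-1 column vector
samples = length(y);
for epoch = 1:iterations

     hipoth = X * theta;                        % predictions, m-by-1
     factor = alpha * (1/samples);
     theta = theta - factor * X'*(hipoth - y);  % batch gradient step
     %disp(epoch);

end
end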
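In matrix form (with m standing for samples), the update in the function above is a gradient step on the cost J(θ) = (1/2m)‖Xθ − y‖². The question's ((hipoth − y)' * X)' is the same vector by the transpose rule (AB)ᵀ = BᵀAᵀ, and setting the gradient to zero gives exactly the normal equation, which is why both methods agree once they receive the same X:

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,\lVert X\theta - y \rVert^2 \\
\nabla J(\theta) &= \frac{1}{m}\,X^\top (X\theta - y) = \frac{1}{m}\,\bigl((X\theta - y)^\top X\bigr)^\top \\
\theta &\leftarrow \theta - \frac{\alpha}{m}\,X^\top (X\theta - y) \\
\nabla J(\theta) = 0 \quad &\Longleftrightarrow \quad X^\top X\,\theta = X^\top y
\end{aligned}
```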

And the main code:

clear;
X = [1 1; 1 2; 1 3];   % same design matrix (with a column of ones) for both methods
y = [2;3;4];
theta = [0; 0];        % column vector, so X*theta is 3x1
num_iters = 600;       % Iterate a bit more, you impatient person!
alpha = 0.3;
thetaOut = gradientDescent(X, y, theta, alpha, num_iters);

theta = pinv(X.'*X)*X.'*y; % normal equation -> works!

disp("gradient descent");
disp(thetaOut);
disp("normal equation");
disp(theta);
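To see where the question's 1.4284 1.4284 comes from, here is a sketch that replays the original setup (X without a column of ones, theta as a row vector) for 10 iterations with alpha = 0.3. It assumes implicit expansion (MATLAB R2016b or later, or GNU Octave) so that hipoth - y runs; b is a name introduced here for the least-squares slope of a line through the origin:

```matlab
% Replay the question's original call: no column of ones, theta is a row vector.
X = [1; 2; 3];
y = [2; 3; 4];
theta = [0 0];
alpha = 0.3;
samples = length(y);

disp(size(X * theta));   % 3x1 times 1x2 is 3x2: column j is X*theta(j)

for epoch = 1:10
    hipoth = X * theta;                               % 3x2
    factor = alpha * (1/samples);
    theta = theta - factor * ((hipoth - y)' * X)';    % y is expanded across both columns
end
disp(theta);   % both entries get identical updates

% Least-squares slope for the model y = b*x (no intercept)
b = sum(X .* y) / sum(X .^ 2);   % 20/14
disp(b);
```

Each entry follows theta = -0.4*theta + 2, so after 10 iterations both equal (10/7)(1 - 0.4^10), about 1.4284, on their way to 20/14, about 1.4286. That is the best fit of y = b*x without an intercept, not the (1, 1) the normal equation finds with the column of ones.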

