Gradient Descent algorithm not converging in Haskell

Posted: 2017-03-24 12:06:38

I am trying to implement the gradient descent algorithm from Andrew Ng's ML course. After reading in the data, I attempt the following: update my list of theta values 1000 times, with the expectation of some convergence.

The algorithm in question is gradientDescent. I know the usual cause of this problem is that alpha may be too large, but when I scale alpha by a factor of n, my results scale by the same factor of n. The same happens when I scale iterations by n. I want to say it may have something to do with Haskell's laziness, but I'm not at all sure. Any help would be appreciated.

module LR1V where

import qualified Data.Matrix as M
import System.IO
import Data.List.Split
import qualified Data.Vector as V

main :: IO ()
main = do
    contents <- getContents
    let lns = lines contents :: [String]
        entries = map (splitOn ",") lns :: [[String]]
        mbPoints = mapM readPoints entries :: Maybe [[Double]]
    case mbPoints of 
        Just points -> runData points
        _           -> putStrLn "Error: it is possible the file is incorrectly formatted"

readPoints :: [String] -> Maybe [Double]
readPoints dat@(_:_:_) = Just $ map read dat  -- at least two fields: features and a y value
readPoints _ = Nothing

runData :: [[Double]] -> IO ()
runData pts = do
    let (mxs, ys) = runPoints pts
        c = M.ncols mxs                  -- number of columns (features, including x0)
        m = M.nrows mxs                  -- number of training examples
        thetas = M.zero 1 (M.ncols mxs)  -- initial thetas, all zero
        alpha = 0.01
        iterations = 1000
        results = gradientDescent mxs ys thetas alpha m c iterations
    print results

runPoints :: [[Double]] -> (M.Matrix Double, [Double])
runPoints pts = (xs, ys) where
    xs = M.fromLists $ addX0 $ map init pts
    ys = map last pts

-- X0 will always be 1
addX0 :: [[Double]] -> [[Double]]
addX0 = map (1.0 :)

-- theta is 1xn and x is nx1, where n is the number of features,
-- so the product is a 1x1 matrix and can be read off as a scalar
hypothesis :: M.Matrix Double -> M.Matrix Double -> Double
hypothesis thetas x = 
    M.getElem 1 1 (M.multStd thetas x)

gradientDescent :: M.Matrix Double  -- X, the m x n design matrix
                   -> [Double]      -- y values
                   -> M.Matrix Double  -- initial thetas (1 x n)
                   -> Double        -- alpha, the learning rate
                   -> Int           -- m, the number of training examples
                   -> Int           -- n, the number of features
                   -> Int           -- number of iterations
                   -> [Double]
gradientDescent mxs ys thetas alpha m n it = 
    let x i = M.colVector $ M.getRow i mxs   -- i-th training example as a column vector
        y i = ys !! (i-1)                    -- i-th target value
        h i = hypothesis thetas (x i)        -- prediction for example i
        thL = zip [1..] $ M.toList thetas :: [(Int, Double)]  -- (j, theta_j) pairs
        z i j = (h i - y i) * M.getElem i j mxs  -- error on example i, scaled by feature j
        sumSquares j = sum [z i j | i <- [1..m]]  -- gradient sum for theta_j
        thetaJ t j = t - (alpha * (1 / fromIntegral m)) * sumSquares j  -- one update of theta_j
        result = map snd $ foldl (\ts _ -> [(j, thetaJ t j) | (j, t) <- ts]) thL [1..it]
    in result
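(For reference: the program reads its input on standard input via getContents, so assuming the data below is saved to a file, say data.txt, a hypothetical name, it can be run with something like runghc LR1V.hs < data.txt.)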

And the data…

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705

With alpha at 0.01, my thetas evaluate to [58.39135051546406,653.2884974555699]. When alpha is 0.001, the values become [5.839135051546473,65.32884974555617]. When iterations is then changed to 10,000, my values return to what they were before. In other words, the result appears to scale linearly with alpha times iterations.

Comments:

How about trying a simpler example data set?

I'll give it a try with a set of points that clearly fit a line. @leftaroundabout

Answer 1:

It seems that each time the theta values were updated, the approximating function h(x) was using the initial theta vector every time, rather than the updated one. Now I get a reasonable approximation of my theta values. However, increasing the number of iterations by a large factor changes my results in a strange way.
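For reference, below is a minimal sketch of one way to fix this while keeping the original types and imports: the fold carries the current theta matrix, and the hypothesis is recomputed from it on every iteration. The names gradientDescent' and step are illustrative, not from the original code; the unused feature-count argument n is also dropped.

gradientDescent' :: M.Matrix Double  -- X, the m x n design matrix
                 -> [Double]         -- y values
                 -> M.Matrix Double  -- initial thetas (1 x n)
                 -> Double           -- alpha, the learning rate
                 -> Int              -- m, the number of training examples
                 -> Int              -- number of iterations
                 -> [Double]
gradientDescent' mxs ys thetas0 alpha m it =
    M.toList $ foldl step thetas0 [1..it]
  where
    -- one simultaneous update of every theta_j, computed from the thetas
    -- of the previous iteration (ths) rather than the initial ones
    step ths _ =
        let x i    = M.colVector $ M.getRow i mxs         -- i-th example as a column vector
            h i    = M.getElem 1 1 (M.multStd ths (x i))  -- prediction with the current ths
            err i  = h i - ys !! (i - 1)
            grad j = sum [err i * M.getElem i j mxs | i <- [1..m]]
        in M.fromList 1 (M.ncols ths)
               [t - alpha / fromIntegral m * grad j | (j, t) <- zip [1..] (M.toList ths)]

Since foldl accumulates lazily, for large iteration counts it may also be worth switching to foldl' from Data.List so the carried theta matrix is forced as you go, which would also rule out the laziness concern raised in the question.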

Comments:

You could edit your question to state exactly what you changed and how, and to explain what is strange about the current behavior.
