spark.mllib Source Code Reading: Optimization Algorithms 2 - Updater
Posted 大愚若智_
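Updater is the component Spark uses to update model parameters during machine learning. A parameter update takes
1. the parameters w_i obtained in round i,
2. the gradient computed in round i+1,
3. the regularization option,
and computes the parameters w_{i+1} for round i+1.
Spark implements three Updaters: SimpleUpdater, L1Updater, and SquaredL2Updater. All three extend the abstract base class Updater.
SimpleUpdater:
An Updater without regularization. It updates the parameters directly from the gradient: w_{i+1} = w_i - rate * gradient, where rate = stepSize / sqrt(iter).
The implementation is as follows: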
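import scala.math._
import breeze.linalg.{axpy => brzAxpy, norm => brzNorm, Vector => BV}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Note: asBreeze / fromBreeze are private[spark], so this compiles only inside Spark's source tree.
class SimpleUpdater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter) // rate for this iteration
    val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector
    // brzWeights = brzWeights - thisIterStepSize * gradient
    brzAxpy(-thisIterStepSize, gradient.asBreeze, brzWeights)
    (Vectors.fromBreeze(brzWeights), 0) // no regularization, so the regularization value is 0
  }
}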
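To make the decaying step size concrete, here is a minimal sketch of the same update on plain arrays. It does not use Spark or Breeze; the object name `SimpleUpdateSketch` and its helpers are made up for illustration and are not part of Spark.

```scala
object SimpleUpdateSketch {
  // Step size at iteration `iter` (1-based), mirroring stepSize / sqrt(iter).
  def stepAt(stepSize: Double, iter: Int): Double =
    stepSize / math.sqrt(iter.toDouble)

  // w_new = w_old - step * gradient, element by element.
  def update(
      weights: Array[Double],
      gradient: Array[Double],
      stepSize: Double,
      iter: Int): Array[Double] = {
    require(weights.length == gradient.length, "dimension mismatch")
    val step = stepAt(stepSize, iter)
    weights.zip(gradient).map { case (w, g) => w - step * g }
  }

  def main(args: Array[String]): Unit = {
    // At iter = 4 the effective step is 1.0 / sqrt(4) = 0.5.
    val w = update(Array(1.0, -2.0), Array(0.5, -1.0), stepSize = 1.0, iter = 4)
    println(w.mkString(", ")) // prints "0.75, -1.5"
  }
}
```

Because the step shrinks with 1 / sqrt(iter), the same gradient moves the weights less and less in later iterations.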
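L1Updater:
An Updater with L1 regularization. After the gradient step it shrinks each weight toward zero, which makes w sparser. The original source documentation describes the procedure in detail:
* If w (a weight component, likewise below) is greater than shrinkageVal (a value computed from the current iteration count and the regularization parameter), set weight component to w-shrinkageVal.
* If w is less than -shrinkageVal, set weight component to w+shrinkageVal.
* If w is in (-shrinkageVal, shrinkageVal), set weight component to 0.
The implementation is also simple: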
class L1Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    // Take gradient step
    val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector
    brzAxpy(-thisIterStepSize, gradient.asBreeze, brzWeights)
    // Apply proximal operator (soft thresholding)
    val shrinkageVal = regParam * thisIterStepSize
    var i = 0
    val len = brzWeights.length
    while (i < len) {
      val wi = brzWeights(i)
      brzWeights(i) = signum(wi) * max(0.0, abs(wi) - shrinkageVal)
      i += 1
    }
    (Vectors.fromBreeze(brzWeights), brzNorm(brzWeights, 1.0) * regParam)
  }
}
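The three shrinkage rules collapse into the single expression signum(w) * max(0, |w| - shrinkageVal). The sketch below checks this on plain doubles; it is independent of Spark, and the names `SoftThresholdSketch`, `softThreshold`, and `l1Value` are hypothetical helpers introduced only for this example.

```scala
object SoftThresholdSketch {
  // Soft thresholding: move w toward zero by `shrinkage`, clamping at zero.
  def softThreshold(w: Double, shrinkage: Double): Double =
    math.signum(w) * math.max(0.0, math.abs(w) - shrinkage)

  // L1 regularization value regParam * ||w||_1, as returned alongside the weights.
  def l1Value(weights: Array[Double], regParam: Double): Double =
    regParam * weights.map(x => math.abs(x)).sum

  def main(args: Array[String]): Unit = {
    val shrinkage = 0.3
    val before = Array(1.0, -1.0, 0.2, -0.2)
    val after = before.map(w => softThreshold(w, shrinkage))
    // Components with |w| > 0.3 move toward zero by 0.3; the others become zero.
    before.zip(after).foreach { case (b, a) => println(s"$b -> $a") }
    println(s"regularization value: ${l1Value(after, regParam = 0.1)}")
  }
}
```

Weights whose magnitude stays below the threshold are set exactly to zero, which is where the sparsity comes from.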
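SquaredL2Updater (L2 regularization):
An Updater with L2 regularization. It adds the term regParam * 1/2 ||w||^2 to the original loss function; taking the gradient of the regularized loss gives the parameter update
w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient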
class SquaredL2Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector
    // Shrink the old weights: w * (1 - thisIterStepSize * regParam)
    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    brzAxpy(-thisIterStepSize, gradient.asBreeze, brzWeights)
    val norm = brzNorm(brzWeights, 2.0)
    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
  }
}
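The update rule follows from one line of algebra: the gradient of regParam * 1/2 ||w||^2 is regParam * w, so w - step * (gradient + regParam * w) = (1 - step * regParam) * w - step * gradient. The sketch below verifies this numerically on plain arrays; `L2UpdateSketch` and its method names are assumptions made for this example and do not exist in Spark.

```scala
object L2UpdateSketch {
  // Form used in the code above: shrink the weights, then take the gradient step.
  def shrinkThenStep(
      w: Array[Double], g: Array[Double], step: Double, reg: Double): Array[Double] =
    w.zip(g).map { case (wi, gi) => (1.0 - step * reg) * wi - step * gi }

  // Equivalent form: one gradient step on the regularized loss,
  // whose gradient is g + reg * w.
  def regularizedGradientStep(
      w: Array[Double], g: Array[Double], step: Double, reg: Double): Array[Double] =
    w.zip(g).map { case (wi, gi) => wi - step * (gi + reg * wi) }

  // Regularization value 0.5 * reg * ||w||^2.
  def regValue(w: Array[Double], reg: Double): Double =
    0.5 * reg * w.map(x => x * x).sum

  def main(args: Array[String]): Unit = {
    val w = Array(1.0, -2.0, 0.5)
    val g = Array(0.3, 0.1, -0.4)
    val a = shrinkThenStep(w, g, step = 0.1, reg = 0.5)
    val b = regularizedGradientStep(w, g, step = 0.1, reg = 0.5)
    val maxDiff = a.zip(b).map { case (x, y) => math.abs(x - y) }.max
    println(s"max difference between the two forms: $maxDiff")
  }
}
```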
To define your own regularization scheme, extend the abstract base class Updater and implement its compute method.
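As a rough illustration of what such a customization could look like, the sketch below combines the L1 and L2 steps into an elastic-net style update. It is self-contained and deliberately does not depend on Spark: `ArrayUpdater` is a hypothetical stand-in for Spark's Updater that works on plain arrays, and the mixing parameter `alpha` is an assumption of this example, not a Spark parameter. A real implementation would extend org.apache.spark.mllib.optimization.Updater and work with Vector instead.

```scala
object CustomUpdaterSketch {
  // Hypothetical stand-in for Spark's abstract Updater, using plain arrays.
  trait ArrayUpdater {
    def compute(
        weightsOld: Array[Double],
        gradient: Array[Double],
        stepSize: Double,
        iter: Int,
        regParam: Double): (Array[Double], Double)
  }

  // Elastic-net style updater (illustrative only):
  // alpha = 1 behaves like the L1 update, alpha = 0 like the squared-L2 update.
  class ElasticNetUpdater(alpha: Double) extends ArrayUpdater {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must be in [0, 1]")

    override def compute(
        weightsOld: Array[Double],
        gradient: Array[Double],
        stepSize: Double,
        iter: Int,
        regParam: Double): (Array[Double], Double) = {
      val step = stepSize / math.sqrt(iter.toDouble)
      val l2Shrink = 1.0 - step * regParam * (1.0 - alpha) // multiplicative L2 part
      val l1Shrink = step * regParam * alpha               // soft-threshold L1 part
      val weights = weightsOld.zip(gradient).map { case (w, g) =>
        val stepped = l2Shrink * w - step * g
        math.signum(stepped) * math.max(0.0, math.abs(stepped) - l1Shrink)
      }
      val l1 = weights.map(x => math.abs(x)).sum
      val l2Squared = weights.map(x => x * x).sum
      (weights, regParam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2Squared))
    }
  }

  def main(args: Array[String]): Unit = {
    val updater = new ElasticNetUpdater(alpha = 0.5)
    val (w, regVal) = updater.compute(
      Array(1.0, 0.1), Array(0.0, 0.0), stepSize = 1.0, iter = 1, regParam = 0.5)
    println(s"weights = ${w.mkString(", ")}, regularization value = $regVal")
  }
}
```

The contract matches the three built-in classes: compute returns the new weights together with the regularization value evaluated at those new weights.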