Caffe Learning 3: Solver

Posted by LiemZuvon


Solver

In Caffe, the Solver is a framework: we do not control the training loop directly; what we can do is fill in the pieces the solver needs (the net definition and the solver configuration).
If you have never worked with Caffe before, I strongly recommend reading my previous post on the basics first; the material below will be much easier to follow.

The Solver Process

The solver proceeds roughly as follows:

  • Initialize the solver and its nets ("nets", plural, because the solver also handles validation: besides the single training net there may be several test nets. Remember what happens when a net is initialized, from the previous post? If not, go back and review it.)
  • Iteratively call forward and backward and update the parameters (mainly the weights: on each iteration the solver stores the gradients in each parameter blob's diff, and at the end of that iteration the update policy folds diff into data; see the pycaffe sketch after this list).
  • Periodically evaluate the model on the test nets.
  • Save the model and the solver state via snapshots.
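
Below is a minimal pycaffe sketch of this loop. It assumes pycaffe is installed and borrows the MNIST example's solver file as a stand-in path; it is illustrative only, not the solver's internal C++ implementation.

import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.get_solver('examples/mnist/lenet_solver.prototxt')

for it in range(100):
    # One full iteration: forward, backward, then the weight update
    # that folds each parameter blob's diff into its data.
    solver.step(1)

    # The most recently computed gradient is visible in the blob's diff.
    w = solver.net.params['ip1'][0]
    print('iter %d, ip1 weight grad mean: %f' % (it, w.diff.mean()))

# Evaluation can also be run by hand on a test net; the solver does this
# automatically every test_interval iterations.
solver.test_nets[0].forward()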


Choosing a Solver Method

Machine learning is about training a model, and that model is, in essence, its weights; the solver is therefore an automated procedure for finding good weights.
Caffe offers the following weight-update methods:

  • Stochastic Gradient Descent (type: "SGD"),
  • AdaDelta (type: "AdaDelta"),
  • Adaptive Gradient (type: "AdaGrad"),
  • Adam (type: "Adam"),
  • Nesterov's Accelerated Gradient (type: "Nesterov"), and
  • RMSprop (type: "RMSProp")

All of the methods above, together with explanations and Python implementations, are covered in another post of mine; the link is given below for reference:
http://blog.csdn.net/u012767526/article/details/51407443
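
As a concrete illustration of the first and most common choice, here is a standalone NumPy sketch of the SGD update with momentum and weight decay as Caffe applies them (velocity V, learning rate lr). The variable names are my own; Caffe performs the equivalent update in C++/CUDA.

import numpy as np

def sgd_update(W, V, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    # V <- momentum * V - lr * (grad + weight_decay * W)
    V[:] = momentum * V - lr * (grad + weight_decay * W)
    # W <- W + V
    W[:] += V

W = np.random.randn(10)      # parameters (weights)
V = np.zeros_like(W)         # velocity/history, persisted across iterations
grad = np.random.randn(10)   # stand-in for the backprop gradient
sgd_update(W, V, grad)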

Solver Initialization

Solver initialization prepares all the parameters the solver needs and initializes the Caffe model. Below is the initialization log from the official documentation.
Preparing the solver parameters:

> caffe train -solver examples/mnist/lenet_solver.prototxt
I0902 13:35:56.474978 16020 caffe.cpp:90] Starting Optimization
I0902 13:35:56.475190 16020 solver.cpp:32] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.01
display: 100
max_iter: 10000
lr_policy: "inv"
gamma: 0.0001
power: 0.75
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: GPU
net: "examples/mnist/lenet_train_test.prototxt"

Initializing the Caffe model (by now this part of the process should look familiar):

I0902 13:35:56.655681 16020 solver.cpp:72] Creating training net from net file: examples/mnist/lenet_train_test.prototxt
[...]
I0902 13:35:56.656740 16020 net.cpp:56] Memory required for data: 0
I0902 13:35:56.656791 16020 net.cpp:67] Creating Layer mnist
I0902 13:35:56.656811 16020 net.cpp:356] mnist -> data
I0902 13:35:56.656846 16020 net.cpp:356] mnist -> label
I0902 13:35:56.656874 16020 net.cpp:96] Setting up mnist
I0902 13:35:56.694052 16020 data_layer.cpp:135] Opening lmdb examples/mnist/mnist_train_lmdb
I0902 13:35:56.701062 16020 data_layer.cpp:195] output data size: 64,1,28,28
I0902 13:35:56.701146 16020 data_layer.cpp:236] Initializing prefetch
I0902 13:35:56.701196 16020 data_layer.cpp:238] Prefetch initialized.
I0902 13:35:56.701212 16020 net.cpp:103] Top shape: 64 1 28 28 (50176)
I0902 13:35:56.701230 16020 net.cpp:103] Top shape: 64 1 1 1 (64)
[...]
I0902 13:35:56.703737 16020 net.cpp:67] Creating Layer ip1
I0902 13:35:56.703753 16020 net.cpp:394] ip1 <- pool2
I0902 13:35:56.703778 16020 net.cpp:356] ip1 -> ip1
I0902 13:35:56.703797 16020 net.cpp:96] Setting up ip1
I0902 13:35:56.728127 16020 net.cpp:103] Top shape: 64 500 1 1 (32000)
I0902 13:35:56.728142 16020 net.cpp:113] Memory required for data: 5039360
I0902 13:35:56.728175 16020 net.cpp:67] Creating Layer relu1
I0902 13:35:56.728194 16020 net.cpp:394] relu1 <- ip1
I0902 13:35:56.728219 16020 net.cpp:345] relu1 -> ip1 (in-place)
I0902 13:35:56.728240 16020 net.cpp:96] Setting up relu1
I0902 13:35:56.728256 16020 net.cpp:103] Top shape: 64 500 1 1 (32000)
I0902 13:35:56.728270 16020 net.cpp:113] Memory required for data: 5167360
I0902 13:35:56.728287 16020 net.cpp:67] Creating Layer ip2
I0902 13:35:56.728304 16020 net.cpp:394] ip2 <- ip1
I0902 13:35:56.728333 16020 net.cpp:356] ip2 -> ip2
I0902 13:35:56.728356 16020 net.cpp:96] Setting up ip2
I0902 13:35:56.728690 16020 net.cpp:103] Top shape: 64 10 1 1 (640)
I0902 13:35:56.728705 16020 net.cpp:113] Memory required for data: 5169920
I0902 13:35:56.728734 16020 net.cpp:67] Creating Layer loss
I0902 13:35:56.728747 16020 net.cpp:394] loss <- ip2
I0902 13:35:56.728767 16020 net.cpp:394] loss <- label
I0902 13:35:56.728786 16020 net.cpp:356] loss -> loss
I0902 13:35:56.728811 16020 net.cpp:96] Setting up loss
I0902 13:35:56.728837 16020 net.cpp:103] Top shape: 1 1 1 1 (1)
I0902 13:35:56.728849 16020 net.cpp:109]     with loss weight 1
I0902 13:35:56.728878 16020 net.cpp:113] Memory required for data: 5169924

Loss Initialization

I0902 13:35:56.728893 16020 net.cpp:170] loss needs backward computation.
I0902 13:35:56.728909 16020 net.cpp:170] ip2 needs backward computation.
I0902 13:35:56.728924 16020 net.cpp:170] relu1 needs backward computation.
I0902 13:35:56.728938 16020 net.cpp:170] ip1 needs backward computation.
I0902 13:35:56.728953 16020 net.cpp:170] pool2 needs backward computation.
I0902 13:35:56.728970 16020 net.cpp:170] conv2 needs backward computation.
I0902 13:35:56.728984 16020 net.cpp:170] pool1 needs backward computation.
I0902 13:35:56.728998 16020 net.cpp:170] conv1 needs backward computation.
I0902 13:35:56.729014 16020 net.cpp:172] mnist does not need backward computation.
I0902 13:35:56.729027 16020 net.cpp:208] This network produces output loss
I0902 13:35:56.729053 16020 net.cpp:467] Collecting Learning Rate and Weight Decay.
I0902 13:35:56.729071 16020 net.cpp:219] Network initialization done.
I0902 13:35:56.729085 16020 net.cpp:220] Memory required for data: 5169924
I0902 13:35:56.729277 16020 solver.cpp:156] Creating test net (#0) specified by net file: examples/mnist/lenet_train_test.prototxt

Initialization Complete

I0902 13:35:56.806970 16020 solver.cpp:46] Solver scaffolding done.
I0902 13:35:56.806984 16020 solver.cpp:165] Solving LeNet

Snapshotting and Resuming

The solver can save the current solver and model state by calling Solver::Snapshot() and Solver::SnapshotSolverState(), and can restore them with Solver::Restore() and Solver::RestoreSolverState().

To use this feature, a few parameters need to be defined in the solver's protobuf file:

# The snapshot interval in iterations.
snapshot: 5000
# File path prefix for snapshotting model weights and solver state.
# Note: this is relative to the invocation of the `caffe` utility, not the
# solver definition file.
snapshot_prefix: "/path/to/model"
# Snapshot the diff along with the weights. This can help debugging training
# but takes more storage.
snapshot_diff: false
# A final snapshot is saved at the end of training unless
# this flag is set to false. The default is true.
snapshot_after_train: true
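
With these parameters set, resuming training is straightforward. Here is a sketch using pycaffe, assuming a snapshot was taken at iteration 5000 with the MNIST prefix (so the hypothetical state file lenet_iter_5000.solverstate):

import caffe

solver = caffe.get_solver('examples/mnist/lenet_solver.prototxt')
# Restores both the solver state (iteration count, update history)
# and the net weights from the snapshot.
solver.restore('examples/mnist/lenet_iter_5000.solverstate')
solver.solve()  # continue training up to max_iter

The command-line equivalent is caffe train -solver examples/mnist/lenet_solver.prototxt -snapshot examples/mnist/lenet_iter_5000.solverstate.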
