Caffe Loss Doesn't Decrease
Posted: 2018-02-09 22:12:57

I'm new to Caffe, and I've made some small modifications to the FCN model to train it on my own data. I noticed that after 680 iterations the loss hasn't changed. I thought it might be because I was applying a 1/255 scale to the pixels, but I removed that and nothing changed.

My data is stored in LMDBs (one for training images, one for training labels, one for validation images, and one for validation labels), and the labels are 0s and 1s stored as uint8.

Does anyone have any suggestions?
I0830 23:05:45.645638 2989601728 solver.cpp:218] Iteration 0 (0 iter/s, 74.062s/20 iters), loss = 190732
I0830 23:05:45.647449 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:05:45.647469 2989601728 sgd_solver.cpp:105] Iteration 0, lr = 1e-14
I0830 23:28:42.183948 2989601728 solver.cpp:218] Iteration 20 (0.0145293 iter/s, 1376.53s/20 iters), loss = 190732
I0830 23:28:42.185940 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:28:42.185962 2989601728 sgd_solver.cpp:105] Iteration 20, lr = 1e-14
I0830 23:51:43.803419 2989601728 solver.cpp:218] Iteration 40 (0.0144758 iter/s, 1381.62s/20 iters), loss = 190732
I0830 23:51:43.817291 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0830 23:51:43.817371 2989601728 sgd_solver.cpp:105] Iteration 40, lr = 1e-14
I0831 00:17:23.955076 2989601728 solver.cpp:218] Iteration 60 (0.0129858 iter/s, 1540.14s/20 iters), loss = 190732
I0831 00:17:23.957161 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 00:17:23.957203 2989601728 sgd_solver.cpp:105] Iteration 60, lr = 1e-14
I0831 00:40:41.079898 2989601728 solver.cpp:218] Iteration 80 (0.0143152 iter/s, 1397.12s/20 iters), loss = 190732
I0831 00:40:41.082603 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 00:40:41.082649 2989601728 sgd_solver.cpp:105] Iteration 80, lr = 1e-14
I0831 01:03:53.159317 2989601728 solver.cpp:218] Iteration 100 (0.014367 iter/s, 1392.08s/20 iters), loss = 190732
I0831 01:03:53.161844 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:03:53.161903 2989601728 sgd_solver.cpp:105] Iteration 100, lr = 1e-14
I0831 01:27:03.867575 2989601728 solver.cpp:218] Iteration 120 (0.0143812 iter/s, 1390.71s/20 iters), loss = 190732
I0831 01:27:03.869439 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:27:03.869469 2989601728 sgd_solver.cpp:105] Iteration 120, lr = 1e-14
I0831 01:50:10.512094 2989601728 solver.cpp:218] Iteration 140 (0.0144233 iter/s, 1386.64s/20 iters), loss = 190732
I0831 01:50:10.514268 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 01:50:10.514302 2989601728 sgd_solver.cpp:105] Iteration 140, lr = 1e-14
I0831 02:09:50.607455 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 02:09:50.672649 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 02:13:16.209158 2989601728 solver.cpp:218] Iteration 160 (0.0144332 iter/s, 1385.69s/20 iters), loss = 190732
I0831 02:13:16.211565 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:13:16.211609 2989601728 sgd_solver.cpp:105] Iteration 160, lr = 1e-14
I0831 02:36:30.536650 2989601728 solver.cpp:218] Iteration 180 (0.0143439 iter/s, 1394.32s/20 iters), loss = 190732
I0831 02:36:30.538833 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:36:30.539871 2989601728 sgd_solver.cpp:105] Iteration 180, lr = 1e-14
I0831 02:59:38.813151 2989601728 solver.cpp:218] Iteration 200 (0.0144064 iter/s, 1388.27s/20 iters), loss = 190732
I0831 02:59:38.814018 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 02:59:38.814097 2989601728 sgd_solver.cpp:105] Iteration 200, lr = 1e-14
I0831 03:22:46.534659 2989601728 solver.cpp:218] Iteration 220 (0.0144121 iter/s, 1387.72s/20 iters), loss = 190732
I0831 03:22:46.536751 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 03:22:46.536808 2989601728 sgd_solver.cpp:105] Iteration 220, lr = 1e-14
I0831 03:46:38.997651 2989601728 solver.cpp:218] Iteration 240 (0.013962 iter/s, 1432.46s/20 iters), loss = 190732
I0831 03:46:39.001502 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 03:46:39.001591 2989601728 sgd_solver.cpp:105] Iteration 240, lr = 1e-14
I0831 04:09:49.981889 2989601728 solver.cpp:218] Iteration 260 (0.0143784 iter/s, 1390.98s/20 iters), loss = 190732
I0831 04:09:49.983256 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:09:49.983301 2989601728 sgd_solver.cpp:105] Iteration 260, lr = 1e-14
I0831 04:32:59.845221 2989601728 solver.cpp:218] Iteration 280 (0.0143899 iter/s, 1389.86s/20 iters), loss = 190732
I0831 04:32:59.847712 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:32:59.847936 2989601728 sgd_solver.cpp:105] Iteration 280, lr = 1e-14
I0831 04:56:07.752025 2989601728 solver.cpp:218] Iteration 300 (0.0144102 iter/s, 1387.9s/20 iters), loss = 190732
I0831 04:56:07.754050 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 04:56:07.754091 2989601728 sgd_solver.cpp:105] Iteration 300, lr = 1e-14
I0831 05:16:57.383947 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 05:16:57.468634 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 05:19:16.101671 2989601728 solver.cpp:218] Iteration 320 (0.0144056 iter/s, 1388.35s/20 iters), loss = 190732
I0831 05:19:16.102998 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 05:19:16.103953 2989601728 sgd_solver.cpp:105] Iteration 320, lr = 1e-14
I0831 05:42:22.554265 2989601728 solver.cpp:218] Iteration 340 (0.0144253 iter/s, 1386.45s/20 iters), loss = 190732
I0831 05:42:22.557201 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 05:42:22.558081 2989601728 sgd_solver.cpp:105] Iteration 340, lr = 1e-14
I0831 06:05:33.816596 2989601728 solver.cpp:218] Iteration 360 (0.0143755 iter/s, 1391.26s/20 iters), loss = 190732
I0831 06:05:33.819310 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:05:33.819358 2989601728 sgd_solver.cpp:105] Iteration 360, lr = 1e-14
I0831 06:28:38.358750 2989601728 solver.cpp:218] Iteration 380 (0.0144452 iter/s, 1384.54s/20 iters), loss = 190732
I0831 06:28:38.362834 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:28:38.363451 2989601728 sgd_solver.cpp:105] Iteration 380, lr = 1e-14
I0831 06:51:48.489392 2989601728 solver.cpp:218] Iteration 400 (0.0143872 iter/s, 1390.13s/20 iters), loss = 190732
I0831 06:51:48.490061 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 06:51:48.491013 2989601728 sgd_solver.cpp:105] Iteration 400, lr = 1e-14
I0831 07:15:00.156152 2989601728 solver.cpp:218] Iteration 420 (0.0143713 iter/s, 1391.67s/20 iters), loss = 190732
I0831 07:15:00.159214 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 07:15:00.159261 2989601728 sgd_solver.cpp:105] Iteration 420, lr = 1e-14
I0831 07:38:09.862089 2989601728 solver.cpp:218] Iteration 440 (0.0143916 iter/s, 1389.7s/20 iters), loss = 190732
I0831 07:38:09.865105 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 07:38:09.865152 2989601728 sgd_solver.cpp:105] Iteration 440, lr = 1e-14
I0831 08:01:15.438222 2989601728 solver.cpp:218] Iteration 460 (0.0144345 iter/s, 1385.57s/20 iters), loss = 190732
I0831 08:01:15.439589 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:01:15.440675 2989601728 sgd_solver.cpp:105] Iteration 460, lr = 1e-14
I0831 08:24:24.188830 2989601728 solver.cpp:218] Iteration 480 (0.0144015 iter/s, 1388.75s/20 iters), loss = 190732
I0831 08:24:24.191907 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:24:24.191951 2989601728 sgd_solver.cpp:105] Iteration 480, lr = 1e-14
I0831 08:24:24.514991 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 08:24:24.524113 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 08:47:29.558264 2989601728 solver.cpp:218] Iteration 500 (0.0144366 iter/s, 1385.37s/20 iters), loss = 190732
I0831 08:47:29.562070 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 08:47:29.562104 2989601728 sgd_solver.cpp:105] Iteration 500, lr = 1e-14
I0831 09:10:43.430681 2989601728 solver.cpp:218] Iteration 520 (0.0143486 iter/s, 1393.87s/20 iters), loss = 190732
I0831 09:10:43.432601 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:10:43.433498 2989601728 sgd_solver.cpp:105] Iteration 520, lr = 1e-14
I0831 09:33:53.022397 2989601728 solver.cpp:218] Iteration 540 (0.0143927 iter/s, 1389.59s/20 iters), loss = 190732
I0831 09:33:53.024354 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:33:53.024405 2989601728 sgd_solver.cpp:105] Iteration 540, lr = 1e-14
I0831 09:56:59.140298 2989601728 solver.cpp:218] Iteration 560 (0.0144288 iter/s, 1386.11s/20 iters), loss = 190732
I0831 09:56:59.142597 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 09:56:59.142642 2989601728 sgd_solver.cpp:105] Iteration 560, lr = 1e-14
I0831 10:20:10.334044 2989601728 solver.cpp:218] Iteration 580 (0.0143762 iter/s, 1391.19s/20 iters), loss = 190732
I0831 10:20:10.336256 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 10:20:10.336287 2989601728 sgd_solver.cpp:105] Iteration 580, lr = 1e-14
I0831 10:43:15.363580 2989601728 solver.cpp:218] Iteration 600 (0.0144402 iter/s, 1385.03s/20 iters), loss = 190732
I0831 10:43:15.365350 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 10:43:15.365380 2989601728 sgd_solver.cpp:105] Iteration 600, lr = 1e-14
I0831 11:06:26.864280 2989601728 solver.cpp:218] Iteration 620 (0.014373 iter/s, 1391.5s/20 iters), loss = 190732
I0831 11:06:26.867431 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:06:26.867480 2989601728 sgd_solver.cpp:105] Iteration 620, lr = 1e-14
I0831 11:29:37.275745 2989601728 solver.cpp:218] Iteration 640 (0.0143843 iter/s, 1390.41s/20 iters), loss = 190732
I0831 11:29:37.277166 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:29:37.277206 2989601728 sgd_solver.cpp:105] Iteration 640, lr = 1e-14
I0831 11:30:47.900959 26275840 data_layer.cpp:73] Restarting data prefetching from start.
I0831 11:30:47.934394 25739264 data_layer.cpp:73] Restarting data prefetching from start.
I0831 11:53:00.394335 2989601728 solver.cpp:218] Iteration 660 (0.014254 iter/s, 1403.11s/20 iters), loss = 190732
I0831 11:53:00.399102 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 11:53:00.399185 2989601728 sgd_solver.cpp:105] Iteration 660, lr = 1e-14
I0831 12:16:24.352802 2989601728 solver.cpp:218] Iteration 680 (0.0142455 iter/s, 1403.95s/20 iters), loss = 190732
I0831 12:16:24.355890 2989601728 solver.cpp:237] Train net output #0: loss = 190732 (* 1 = 190732 loss)
I0831 12:16:24.356781 2989601728 sgd_solver.cpp:105] Iteration 680, lr = 1e-14
Here is my network definition for the training phase:
name: "face-detect"
state
phase: TRAIN
level: 0
stage: ""
layer
name: "data"
type: "Data"
top: "data"
include
phase: TRAIN
transform_param
mean_value: 104.006989
mean_value: 116.66877
mean_value: 122.678917
data_param
source: "data/fddb-face-database/train_img_lmdb"
scale: 0.00390625
batch_size: 16
backend: LMDB
layer
name: "label"
type: "Data"
top: "label"
include
phase: TRAIN
data_param
source: "data/fddb-face-database/train_lab_lmdb"
batch_size: 16
backend: LMDB
layer
name: "mod1_conv1"
type: "Convolution"
bottom: "data"
top: "mod1_conv1"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 64
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod1_relu1"
type: "ReLU"
bottom: "mod1_conv1"
top: "mod1_conv1"
layer
name: "mod1_conv2"
type: "Convolution"
bottom: "mod1_conv1"
top: "mod1_conv2"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 64
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod1_relu2"
type: "ReLU"
bottom: "mod1_conv2"
top: "mod1_conv2"
layer
name: "mod1_pool1"
type: "Pooling"
bottom: "mod1_conv2"
top: "mod1_pool1"
pooling_param
pool: MAX
kernel_size: 2
stride: 2
layer
name: "mod2_conv1"
type: "Convolution"
bottom: "mod1_pool1"
top: "mod2_conv1"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod2_relu1"
type: "ReLU"
bottom: "mod2_conv1"
top: "mod2_conv1"
layer
name: "mod2_conv2"
type: "Convolution"
bottom: "mod2_conv1"
top: "mod2_conv2"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod2_relu2"
type: "ReLU"
bottom: "mod2_conv2"
top: "mod2_conv2"
layer
name: "mod2_pool1"
type: "Pooling"
bottom: "mod2_conv2"
top: "mod2_pool1"
pooling_param
pool: MAX
kernel_size: 2
stride: 2
layer
name: "mod3_conv1"
type: "Convolution"
bottom: "mod2_pool1"
top: "mod3_conv1"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod3_relu1"
type: "ReLU"
bottom: "mod3_conv1"
top: "mod3_conv1"
layer
name: "mod3_conv2"
type: "Convolution"
bottom: "mod3_conv1"
top: "mod3_conv2"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod3_relu2"
type: "ReLU"
bottom: "mod3_conv2"
top: "mod3_conv2"
layer
name: "mod3_pool1"
type: "Pooling"
bottom: "mod3_conv2"
top: "mod3_pool1"
pooling_param
pool: MAX
kernel_size: 2
stride: 2
layer
name: "mod4_conv1"
type: "Convolution"
bottom: "mod3_pool1"
top: "mod4_conv1"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod4_relu1"
type: "ReLU"
bottom: "mod4_conv1"
top: "mod4_conv1"
layer
name: "mod4_conv2"
type: "Convolution"
bottom: "mod4_conv1"
top: "mod4_conv2"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod4_relu2"
type: "ReLU"
bottom: "mod4_conv2"
top: "mod4_conv2"
layer
name: "mod4_pool1"
type: "Pooling"
bottom: "mod4_conv2"
top: "mod4_pool1"
pooling_param
pool: MAX
kernel_size: 2
stride: 2
layer
name: "mod5_conv1"
type: "Convolution"
bottom: "mod4_pool1"
top: "mod5_conv1"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod5_relu1"
type: "ReLU"
bottom: "mod5_conv1"
top: "mod5_conv1"
layer
name: "mod5_conv2"
type: "Convolution"
bottom: "mod5_conv1"
top: "mod5_conv2"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler
type: "xavier"
bias_filler
type: "constant"
layer
name: "mod5_relu2"
type: "ReLU"
bottom: "mod5_conv2"
top: "mod5_conv2"
layer
name: "mod5_pool1"
type: "Pooling"
bottom: "mod5_conv2"
top: "mod5_pool1"
pooling_param
pool: MAX
kernel_size: 2
stride: 2
layer
name: "mod6_fc1"
type: "Convolution"
bottom: "mod5_pool1"
top: "mod6_fc1"
convolution_param
num_output: 4096
pad: 0
kernel_size: 1
stride: 1
layer
name: "mod6_relu1"
type: "ReLU"
bottom: "mod6_fc1"
top: "mod6_fc1"
layer
name: "mod6_drop1"
type: "Dropout"
bottom: "mod6_fc1"
top: "mod6_fc1"
dropout_param
dropout_ratio: 0.5
layer
name: "mod6_score1"
type: "Convolution"
bottom: "mod6_fc1"
top: "mod6_score1"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 2
pad: 0
kernel_size: 1
layer
name: "mod6_upscore1"
type: "Deconvolution"
bottom: "mod6_score1"
top: "mod6_upscore1"
param
lr_mult: 0
convolution_param
num_output: 2
bias_term: false
kernel_size: 2
stride: 2
layer
name: "mod6_score2"
type: "Convolution"
bottom: "mod4_pool1"
top: "mod6_score2"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 2
pad: 0
kernel_size: 1
layer
name: "crop"
type: "Crop"
bottom: "mod6_score2"
bottom: "mod6_upscore1"
top: "mod6_score2c"
layer
name: "mod6_fuse1"
type: "Eltwise"
bottom: "mod6_upscore1"
bottom: "mod6_score2c"
top: "mod6_fuse1"
eltwise_param
operation: SUM
layer
name: "mod6_upfuse1"
type: "Deconvolution"
bottom: "mod6_fuse1"
top: "mod6_upfuse1"
param
lr_mult: 0
convolution_param
num_output: 2
bias_term: false
kernel_size: 2
stride: 2
layer
name: "mod6_score3"
type: "Convolution"
bottom: "mod3_pool1"
top: "mod6_score3"
param
lr_mult: 1
decay_mult: 1
param
lr_mult: 2
decay_mult: 0
convolution_param
num_output: 2
pad: 0
kernel_size: 1
layer
name: "crop"
type: "Crop"
bottom: "mod6_score3"
bottom: "mod6_upfuse1"
top: "mod6_score3c"
layer
name: "mod6_fuse2"
type: "Eltwise"
bottom: "mod6_upfuse1"
bottom: "mod6_score3c"
top: "mod6_fuse2"
eltwise_param
operation: SUM
layer
name: "mod6_upfuse2"
type: "Deconvolution"
bottom: "mod6_fuse2"
top: "mod6_upfuse2"
param
lr_mult: 0
convolution_param
num_output: 2
bias_term: false
kernel_size: 8
stride: 8
layer
name: "crop"
type: "Crop"
bottom: "mod6_upfuse2"
bottom: "label"
top: "score"
layer
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
loss_param
normalize: false
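One quick sanity check for a definition like this is to instantiate it in pycaffe and print every blob's shape; here is a minimal sketch, assuming the definition above is saved as models/face-detect/train_val.prototxt (an assumed path):

import caffe

caffe.set_mode_cpu()
net = caffe.Net('models/face-detect/train_val.prototxt', caffe.TRAIN)

# Verify that the crop/fuse stages line up; in particular, "score" must
# match "label" in height and width for SoftmaxWithLoss to be computed.
for name, blob in net.blobs.items():
    print(name, blob.data.shape)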
Here is my solver.prototxt:
net: "models/face-detect/train_val.prototxt"
test_iter: 736
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for unnormalized softmax
base_lr: 1e-14
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "models/face-detect/snapshot/train"
test_initialization: false
# default to CPU mode solving
solver_mode: CPU
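For completeness, a minimal sketch of launching training with this solver from pycaffe; the solver file path is an assumption, since the post doesn't say where the file lives:

import caffe

caffe.set_mode_cpu()  # consistent with solver_mode: CPU above

# Load the solver and run the optimization loop; this emits Iteration/loss
# lines like the log above once every `display` iterations.
solver = caffe.SGDSolver('models/face-detect/solver.prototxt')  # assumed path
solver.solve()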
Here is how I prepare the LMDBs:
import cv2
import lmdb
import numpy as np
import caffe
from scipy.misc import imresize


def load_image(img_path, size=None):
    # Load image as np.uint8 in 0, ..., 255
    # image shape: [height, width, channel]
    img = cv2.imread(img_path)
    # Resize to stack size
    if size is not None:
        img = imresize(img, size)
    # Reverse the channel order (cv2.imread returns BGR)
    img = img[:, :, ::-1]
    # Switch to [channel, height, width]
    img = np.transpose(img, (2, 0, 1))
    return img


def load_label(img_path, size=None):
    # Load the label image as a single-channel grayscale array
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    if size is not None:
        img = imresize(img, size)
    # Verbose storage as a single channel: [1, height, width]
    img = np.reshape(img, [1, img.shape[0], img.shape[1]])
    return img


def imgs_to_lmdb(img_paths, lmdb_path, dtype='rgb', size=None):
    in_db = lmdb.open(lmdb_path, map_size=int(1e12))
    with in_db.begin(write=True) as in_txn:
        for img_idx, img_path in enumerate(img_paths):
            if dtype == 'rgb':
                img = load_image(img_path, size)
            elif dtype == 'label':
                img = load_label(img_path, size)
            # Store as byte data under a zero-padded numeric key
            img_dat = caffe.io.array_to_datum(img)
            in_txn.put('{:0>10d}'.format(img_idx), img_dat.SerializeToString())
    in_db.close()
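For context, a hypothetical invocation of these functions might look like the following; the glob patterns and file layout are illustrative, not from the original post:

import glob

# Hypothetical dataset layout; adjust the patterns to the real structure.
train_imgs = sorted(glob.glob('data/fddb-face-database/images/*.jpg'))
train_labs = sorted(glob.glob('data/fddb-face-database/labels/*.png'))

# Write the two LMDBs that data_param.source in train_val.prototxt points at.
imgs_to_lmdb(train_imgs, 'data/fddb-face-database/train_img_lmdb', dtype='rgb')
imgs_to_lmdb(train_labs, 'data/fddb-face-database/train_lab_lmdb', dtype='label')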
Answer 1:

Your base_lr seems too small, so your weights won't update fast enough. You should start with a base_lr of 1e-10. The learning rate is multiplied by the gradient of the loss and used to update the weights: if it is too small, the updates are tiny and convergence is far too slow, while too large a learning rate gives unstable results. There is no magic number to start from, so you have to find the right hyperparameters for your data and network empirically.
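To make the scale of the problem concrete, here is a minimal sketch of the momentum-SGD update that Caffe's sgd_solver applies, with the hyperparameters from the solver.prototxt in the question plugged in (the gradient value is invented purely for illustration):

lr, momentum = 1e-14, 0.99      # values from the solver.prototxt above
grad = 1000.0                   # hypothetical loss gradient for one weight
v, w = 0.0, 0.01                # momentum buffer and a hypothetical weight

for _ in range(680):
    v = momentum * v - lr * grad   # Caffe SGD: V <- mu * V - lr * dW
    w = w + v                      # W <- W + V

print(w - 0.01)   # about -5.8e-07: after 680 iterations the weight has barely moved

Even with a large gradient, 680 iterations at lr = 1e-14 move a weight by well under a millionth, which is indistinguishable from no training at all.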
Comments:
"There is no magic number to start from" - so rather than saying "start with a base_lr of 1e-10", wouldn't it be better to say "increase your base_lr until you see progress"?

@PrzemekD Experience.

@HarshWardhan Thanks for the quick reply. I tried raising the base learning rate to 1e-10, 1e-8, and currently 1e-5, but after 100 iterations I see no change in the loss with any of them. I've read that the way the examples are prepared can also have an effect, so I've updated the post with the way I create the LMDBs; maybe you'll find a clue there? The images are RGB and the labels are 0s and 1s.

Answer 2:
You should also try learning-rate decay. My favorite is the constant learning-rate decay used for GoogLeNet, where the learning rate is reduced by 4% every 8 epochs. A decaying learning rate helps convergence because, by shrinking the update size, it preserves more of what has been learned, so your network doesn't forget what it has already learned.
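A minimal sketch of that schedule (cut the rate by 4% every 8 epochs) as a function of the iteration count; iters_per_epoch is a placeholder you would derive from your dataset size and batch size:

def decayed_lr(base_lr, iteration, iters_per_epoch, drop=0.96, every_epochs=8):
    # Step decay: multiply the learning rate by `drop` once every `every_epochs`.
    steps = iteration // (every_epochs * iters_per_epoch)
    return base_lr * (drop ** steps)

# e.g. with a base lr of 1e-5 and a made-up 100 iterations per epoch:
for it in (0, 800, 1600, 8000):
    print(it, decayed_lr(1e-5, it, iters_per_epoch=100))

Caffe can express the same idea natively with lr_policy: "step" plus gamma and stepsize in the solver, instead of the "fixed" policy used above.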
After that, always use a momentum-based optimizer such as Adam or RMSprop. They greatly reduce the jitter during learning and ensure a smooth descent into the minimum.
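For reference, a bare-bones sketch of the Adam update for a single weight, illustrative only; in Caffe you would select this optimizer by setting type: "Adam" in the solver rather than hand-rolling it:

import math

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update for a single weight; returns the new (w, m, v).
    m = b1 * m + (1 - b1) * grad       # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment (scale) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 0.01, 0.0, 0.0
for t in range(1, 101):
    w, m, v = adam_step(w, 1000.0, m, v, t)
# With a constant gradient the step size settles near lr per iteration,
# independent of the raw gradient scale, unlike the plain SGD update above.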
Comments:

Dude, the numbers you quoted are for classification tasks. @lainak is doing FCN.

The paper he cited reports the learning rates they used: "We set fixed learning rates of 10e-3, 10e-4, and 5e-5 for FCN-AlexNet, FCN-VGG16, and FCN-GoogLeNet, respectively, chosen by line search."

Read the voc-fcn8s, voc-fcn16s, and voc-fcn32s entries in the GitHub link he shared; see the solver.prototxt files. I've done this kind of training many times.