How to Configure Each Layer's Structure in Caffe


Editor's note: this article was compiled by the editors of 小常识网 (cha138.com). It introduces how to configure each layer's structure in Caffe; we hope it is of some reference value to you.

Reference answer A: How to configure each layer's structure in Caffe. I recently installed Caffe on my computer. Since the layers of a neural network differ in structure, type, and parameters, here is a brief summary based on Caffe's official documentation.

1. Vision Layers

1.1 Convolution
Type: CONVOLUTION
Example:

  layers {
    name: "conv1"
    type: CONVOLUTION
    bottom: "data"
    top: "conv1"
    blobs_lr: 1          # learning rate multiplier for the filters
    blobs_lr: 2          # learning rate multiplier for the biases
    weight_decay: 1      # weight decay multiplier for the filters
    weight_decay: 0      # weight decay multiplier for the biases
    convolution_param {
      num_output: 96     # learn 96 filters
      kernel_size: 11    # each filter is 11x11
      stride: 4          # step 4 pixels between each filter application
      weight_filler {
        type: "gaussian" # initialize the filters from a Gaussian
        std: 0.01        # distribution with stdev 0.01 (default mean: 0)
      }
      bias_filler {
        type: "constant" # initialize the biases to zero (0)
        value: 0
      }
    }
  }

blobs_lr: learning-rate multipliers. In the example above, the filters learn at the learning rate given by the running solver, while the biases learn at twice that rate.
weight_decay: weight-decay multipliers, an important parameter of the convolution layer.

Important parameters of the convolution layer:
Required:
  num_output (c_o): the number of filters
  kernel_size (or kernel_h and kernel_w): the filter size
Optional:
  weight_filler [default type: 'constant' value: 0]: parameter initialization
  bias_filler: bias initialization
  bias_term [default true]: whether to include a bias term
  pad (or pad_h and pad_w) [default 0]: how many pixels to add to each side of the input
  stride (or stride_h and stride_w) [default 1]: the filter stride
  group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.
Shape change through convolution:
  Input:  n * c_i * h_i * w_i
  Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed likewise.

1.2 Pooling
Type: POOLING
Example:

  layers {
    name: "pool1"
    type: POOLING
    bottom: "conv1"
    top: "pool1"
    pooling_param {
      pool: MAX
      kernel_size: 3   # pool over a 3x3 region
      stride: 2        # step two pixels (in the bottom blob) between pooling regions
    }
  }

Important parameters of the pooling layer:
Required:
  kernel_size (or kernel_h and kernel_w): the filter size
Optional:
  pool [default MAX]: the pooling method; currently MAX, AVE, and STOCHASTIC are available
  pad (or pad_h and pad_w) [default 0]: how many pixels to add to each side of the input
  stride (or stride_h and stride_w) [default 1]: the filter stride

Shape change through pooling:
  Input:  n * c_i * h_i * w_i
  Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o is computed likewise.

1.3 Local Response Normalization (LRN)
Type: LRN
Local Response Normalization normalizes over local input regions: each activation a is divided by a normalization term (the denominator) to produce a new activation b. It comes in two forms: one normalizes over adjacent channels at the same spatial position (cross-channel LRN), the other over a spatial region within the same channel (within-channel LRN).
Formula: each input is divided by (1 + (alpha/n) * sum_i x_i^2)^beta, where n is the size of the local region.
Optional parameters:
  local_size [default 5]: for cross-channel LRN, the number of neighboring channels to sum over; for within-channel LRN, the side length of the square spatial region
  alpha [default 1]: the scaling parameter
  beta [default 5]: the exponent
  norm_region [default ACROSS_CHANNELS]: which form of LRN to use, ACROSS_CHANNELS or WITHIN_CHANNEL

2. Loss Layers
Deep learning learns by optimizing an output objective; the loss drives the learning.

2.1 Softmax
Type: SOFTMAX_LOSS

2.2 Sum-of-Squares / Euclidean
Type: EUCLIDEAN_LOSS

2.3 Hinge / Margin
Type: HINGE_LOSS
Examples:

  # L1 Norm
  layers {
    name: "loss"
    type: HINGE_LOSS
    bottom: "pred"
    bottom: "label"
  }

  # L2 Norm
  layers {
    name: "loss"
    type: HINGE_LOSS
    bottom: "pred"
    bottom: "label"
    top: "loss"
    hinge_loss_param {
      norm: L2
    }
  }

Optional parameters:
  norm [default L1]: the norm to use, L1 or L2
Input:
  n * c * h * w  Predictions
  n * 1 * 1 * 1  Labels
Output:
  1 * 1 * 1 * 1  Computed Loss

2.4 Sigmoid Cross-Entropy
Type: SIGMOID_CROSS_ENTROPY_LOSS

2.5 Infogain
Type: INFOGAIN_LOSS

2.6 Accuracy and Top-k
Type: ACCURACY
Used to compute the accuracy of the output with respect to the target. It is not actually a loss and has no backward step.
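The output-shape formula above is easy to check with a few lines of Python. This is only an illustration of the arithmetic, not Caffe code, and the 227x227 input size below is an assumed AlexNet-style value:

```python
def out_dim(in_dim, kernel, pad=0, stride=1):
    # h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1
    return (in_dim + 2 * pad - kernel) // stride + 1

# conv1 from the example above: 11x11 filters, stride 4, no padding,
# applied to an assumed 227x227 input
conv_h = out_dim(227, kernel=11, stride=4)
# pool1 from the example above: 3x3 regions, stride 2
pool_h = out_dim(conv_h, kernel=3, stride=2)
print(conv_h, pool_h)  # 55 27
```

The same helper covers both convolution and pooling, since the two layers share the formula.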
3. Activation / Neuron Layers
In general, activation layers perform element-wise operations, so the output has the same size as the input; they are usually nonlinear functions.

3.1 ReLU / Rectified-Linear and Leaky-ReLU
Type: RELU
Example:

  layers {
    name: "relu1"
    type: RELU
    bottom: "conv1"
    top: "conv1"
  }

Optional parameters:
  negative_slope [default 0]: the slope of the output for inputs below zero
ReLU is currently the most commonly used activation function, mainly because it converges faster while achieving comparable quality. The standard ReLU function is max(x, 0): for x > 0 the output is x, and for x <= 0 the output is negative_slope * x. The RELU layer supports in-place computation, meaning the top can be the same blob as the bottom, which avoids extra memory consumption.

3.2 Sigmoid
Type: SIGMOID
Example:

  layers {
    name: "encode1neuron"
    bottom: "encode1"
    top: "encode1neuron"
    type: SIGMOID
  }

The SIGMOID layer computes sigmoid(x) for each input element x.

3.3 TanH / Hyperbolic Tangent
Type: TANH
Example:

  layers {
    name: "encode1neuron"
    bottom: "encode1"
    top: "encode1neuron"
    type: TANH
  }

The TANH layer computes tanh(x) for each input element x.

3.4 Absolute Value
Type: ABSVAL
Example:

  layers {
    name: "layer"
    bottom: "in"
    top: "out"
    type: ABSVAL
  }

The ABSVAL layer computes abs(x) for each input element x.

3.5 Power
Type: POWER
Example:

  layers {
    name: "layer"
    bottom: "in"
    top: "out"
    type: POWER
    power_param {
      power: 1
      scale: 1
      shift: 0
    }
  }

Optional parameters:
  power [default 1]
  scale [default 1]
  shift [default 0]
The POWER layer computes (shift + scale * x) ^ power for each input element x.

3.6 BNLL
Type: BNLL
Example:

  layers {
    name: "layer"
    bottom: "in"
    top: "out"
    type: BNLL
  }

The BNLL (binomial normal log likelihood) layer computes log(1 + exp(x)) for each input element x.

4. Data Layers
Data enters Caffe through data layers, which sit at the bottom of the network. Data can come from an efficient database (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from HDF5 files or common image formats on disk.

4.1 Database
Type: DATA
Required parameters:
  source: the name of the directory containing the database
  batch_size: the number of inputs to process at a time
Optional parameters:
  rand_skip: skip this many inputs at the start; useful for asynchronous stochastic gradient descent (SGD)
  backend [default LEVELDB]: choose whether to use LEVELDB or LMDB

4.2 In-Memory
Type: MEMORY_DATA
Required parameters:
  batch_size, channels, height, width: specify the size of the data to read from memory
The memory data layer reads data directly from memory, without copying it. In order to use it, one must call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) in order to specify a source of contiguous data (as 4D row major array), which is read one batch-sized chunk at a time.
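As a sketch of the element-wise formulas listed above (plain Python illustrations of the math, not Caffe's actual implementation; tanh and abs are Python built-ins and are omitted):

```python
import math

def relu(x, negative_slope=0.0):
    # RELU: x for x > 0, negative_slope * x otherwise
    return x if x > 0 else negative_slope * x

def sigmoid(x):
    # SIGMOID: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def power(x, power=1.0, scale=1.0, shift=0.0):
    # POWER: (shift + scale * x) ^ power
    return (shift + scale * x) ** power

def bnll(x):
    # BNLL: log(1 + exp(x)), written with log1p for numerical stability
    return math.log1p(math.exp(x))

print(relu(-2.0, negative_slope=0.1))   # leaky-ReLU branch for a negative input
print(power(3.0, power=2.0))            # 9.0
```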
4.3 HDF5 Input
Type: HDF5_DATA
Required parameters:
  source: the name of the file to read
  batch_size: the number of inputs to process at a time

4.4 HDF5 Output
Type: HDF5_OUTPUT
Required parameters:
  file_name: the name of the output file
The HDF5 output layer performs the opposite function of the other data layers: it writes its input blobs to disk.

4.5 Images
Type: IMAGE_DATA
Required parameters:
  source: the name of a text file, each line of which gives an image file name and a label
  batch_size: the number of images per batch
Optional parameters:
  rand_skip: skip this many inputs at the start; useful for asynchronous stochastic gradient descent (SGD)
  shuffle [default false]
  new_height, new_width: resize all images to this size

4.6 Windows
Type: WINDOW_DATA

4.7 Dummy
Type: DUMMY_DATA
The DUMMY layer is used for development and debugging; see DummyDataParameter for the specific parameters.

5. Common Layers

5.1 Inner Product (fully connected layer)
Type: INNER_PRODUCT
Example:

  layers {
    name: "fc8"
    type: INNER_PRODUCT
    blobs_lr: 1        # learning rate multiplier for the filters
    blobs_lr: 2        # learning rate multiplier for the biases
    weight_decay: 1    # weight decay multiplier for the filters
    weight_decay: 0    # weight decay multiplier for the biases
    inner_product_param {
      num_output: 1000
      weight_filler {
        type: "gaussian"
        std: 0.01
      }
      bias_filler {
        type: "constant"
        value: 0
      }
    }
    bottom: "fc7"
    top: "fc8"
  }

Required parameters:
  num_output (c_o): the number of filters
Optional parameters:
  weight_filler [default type: 'constant' value: 0]: parameter initialization
  bias_filler: bias initialization
  bias_term [default true]: whether to include a bias term
Shape change through the fully connected layer:
  Input:  n * c_i * h_i * w_i
  Output: n * c_o * 1 * 1

5.2 Splitting
Type: SPLIT
The Splitting layer splits an input blob into multiple output blobs, for when one blob needs to be fed as input to multiple other layers.

5.3 Flattening
Type: FLATTEN
The Flatten layer turns an input of shape n * c * h * w into a simple vector of shape n * (c*h*w) * 1 * 1.

5.4 Concatenation
Type: CONCAT
Example:

  layers {
    name: "concat"
    bottom: "in1"
    bottom: "in2"
    top: "out"
    type: CONCAT
    concat_param {
      concat_dim: 1
    }
  }

Optional parameters:
  concat_dim [default 1]: 0 concatenates along num, 1 along channels
Shape change through concatenation:
  Input: K blobs, each of shape n_i * c_i * h * w
  Output:
    concat_dim = 0: (n_1 + n_2 + ... + n_K) * c_1 * h * w; all inputs must have the same c_i
    concat_dim = 1: n_1 * (c_1 + c_2 + ... + c_K) * h * w; all inputs must have the same n_i
The Concatenation layer joins multiple blobs into one blob.

5.5 Slicing
The SLICE layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices.

5.6 Elementwise Operations
Type: ELTWISE

5.7 Argmax
Type: ARGMAX

5.8 Softmax
Type: SOFTMAX

5.9 Mean-Variance Normalization
Type: MVN

6. References
The Caffe official documentation.
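The shape rules for FLATTEN and CONCAT above can be sketched as simple tuple arithmetic in Python (an illustration only; the concrete blob shapes below are assumptions):

```python
def flatten_shape(n, c, h, w):
    # FLATTEN: n * c * h * w  ->  n * (c*h*w) * 1 * 1
    return (n, c * h * w, 1, 1)

def concat_shape(shapes, concat_dim=1):
    # CONCAT: concat_dim = 0 concatenates along num, 1 along channels
    n, c, h, w = shapes[0]
    if concat_dim == 0:
        return (sum(s[0] for s in shapes), c, h, w)
    return (n, sum(s[1] for s in shapes), h, w)

print(flatten_shape(32, 3, 28, 28))                        # (32, 2352, 1, 1)
print(concat_shape([(32, 64, 14, 14), (32, 32, 14, 14)]))  # (32, 96, 14, 14)
```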

The Convolution Layer in Caffe

In Caffe, the network structure is given in a prototxt file and is composed of a series of Layers. Common layers include: data loading layers, convolution layers, pooling layers, nonlinear transform layers, inner-product layers, normalization layers, loss layers, and so on. This post mainly covers the convolution layer.

1. Convolution Layer Overview

First, here is a small example of a convolution layer's configuration (defined in a .prototxt file):

layer {

  name: "conv1"        # the name of this layer
  type: "Convolution"  # the layer type; available types include "Convolution", ...
  bottom: "data"       # the name of this layer's input data Blob
  top: "conv1"         # the name of this layer's output data Blob

  # learning parameters for this layer's weights and bias
  param {
    lr_mult: 1         # learning-rate multiplier for the weights
  }
  param {
    lr_mult: 2         # learning-rate multiplier for the bias
  }

  # parameters for this (convolution) layer's convolution operation
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"   # weight initialization method
    }
    bias_filler {
      type: "constant" # bias initialization method
    }
  }

}

 

Note: in Caffe's original proto file, the convolution layer's parameter message ConvolutionParameter is defined as follows:

message ConvolutionParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // whether to have bias terms

  // Pad, kernel size, and stride are all given as a single value for equal dimensions in all spatial dimensions, or once per spatial dimension.
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1
  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting holes. (Kernel dilation is sometimes referred to by its use in the algorithme à trous from Holschneider et al. 1987.)
  repeated uint32 dilation = 18; // The dilation; defaults to 1

  // For 2D convolution only, the *_h and *_w versions may also be used to specify both spatial dimensions.
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)

  optional uint32 group = 5 [default = 1]; // The group size for group conv

  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];

  // The axis to interpret as "channels" when performing convolution.
  // Preceding dimensions are treated as independent inputs;
  // succeeding dimensions are treated as "spatial".
  // With (N, C, H, W) inputs, and axis == 1 (the default), we perform
  // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
  // groups g>1) filters across the spatial axes (H, W) of the input.
  // With (N, C, D, H, W) inputs, and axis == 1, we perform
  // N independent 3D convolutions, sliding (C/g)-channels
  // filters across the spatial axes (D, H, W) of the input.
  optional int32 axis = 16 [default = 1];

  // Whether to force use of the general ND convolution, even if a specific
  // implementation for blobs of the appropriate number of spatial dimensions
  // is available. (Currently, there is only a 2D-specific convolution
  // implementation; for input blobs with num_axes != 2, this option is
  // ignored and the ND implementation will be used.)
  optional bool force_nd_im2col = 17 [default = false];
}

2. Convolution Layer Parameters

Next, each of the convolution layer's parameters is explained in turn.

(By the definition of the convolution layer, its learned parameters are the filter values and the bias values; the remaining parameters are hyper-parameters that must be given when the model is defined.)

lr_mult: the learning-rate multiplier

Placed inside param { }.

This multiplier controls the learning rate: during training, this layer's parameters are updated with a learning rate equal to this multiplier times the base_lr value in the solver.prototxt configuration file,

i.e. learning rate = lr_mult * base_lr.

If the layer has two lr_mult entries in the network configuration file, the first is the learning-rate multiplier for the filter weights and the second is the multiplier for the bias term (typically, the bias multiplier is twice the weight multiplier).
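As a tiny worked example of the rule above (the base_lr value is assumed for illustration):

```python
base_lr = 0.01            # assumed base_lr from solver.prototxt
lr_mult_weight = 1        # first lr_mult in the layer definition (weights)
lr_mult_bias = 2          # second lr_mult in the layer definition (bias)

weight_lr = lr_mult_weight * base_lr   # effective weight learning rate: 0.01
bias_lr = lr_mult_bias * base_lr       # effective bias learning rate: 0.02
print(weight_lr, bias_lr)
```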

convolution_param: the other convolution-layer parameters

Placed inside convolution_param { }.

This block sets the convolution layer's remaining parameters; some must be set, while others are optional (their default values can be used directly).

  • Required parameters

  1. num_output: the number of filters in this convolution layer

  2. kernel_size: the size of the layer's filters (using this single parameter makes the filters square; in the 2D case the height and width may also differ, in which case they are set with the kernel_h and kernel_w parameters)
  • Other optional parameters

  1. stride: the filter stride; the default value is 1

  2. pad: how much padding to apply to the input image; the default value is 0, i.e. no padding (note that padding can introduce some uninformative border values, which may be less appropriate when the input image is small)
  3. weight_filler: the weight initialization method, used as follows
    weight_filler {
          type: "xavier"  # xavier is an initialization algorithm; "gaussian" is also available; the default is "constant", i.e. all zeros
    }
  4. bias_filler: the bias initialization method
    bias_filler {
          type: "xavier"  # xavier is an initialization algorithm; "gaussian" is also available; the default is "constant", i.e. all zeros
    }
  5. bias_term: whether to use a bias term; the default value is true
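Putting the example parameters from the top of this post together (num_output: 20, kernel_size: 5, stride: 1, no padding), the output size follows from the standard convolution formula. The 28x28 input below is an assumption (MNIST-style), and the helper is only an illustration, not Caffe code:

```python
def conv_out(in_dim, kernel, pad=0, stride=1):
    # output dimension = (input + 2*pad - kernel) // stride + 1
    return (in_dim + 2 * pad - kernel) // stride + 1

# conv1 above on an assumed 1x28x28 input: 20 feature maps of 24x24
print(conv_out(28, kernel=5, pad=0, stride=1))  # 24
```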

 
