VGG16 gets stuck in a dead zone during transfer learning: looking for an answer

Section 23: Using slim in TensorFlow, and transfer learning with a pretrained VGG network

In this section we explain in detail how to use a number of functions in the slim library.

I. Introduction

slim lives in the tensorflow.contrib package and is imported like this:

import tensorflow.contrib.slim as slim

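With that, slim is ready to use. Since it came up, let's first look at tensorflow.contrib itself. TensorFlow's official description of it reads: any code in this directory is not officially supported and may change or be removed at any time. Each directory has designated owners. It is meant to contain additional functionality and contributions that may eventually be merged into core TensorFlow, but whose interfaces may still change, or which need some testing to see whether they can gain broader acceptance. So slim is still not part of core TensorFlow.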

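So what is slim, and what is it good for?

As covered in the previous section, slim is a library that makes building, training and evaluating neural networks simpler. It removes much of the repetitive boilerplate of plain TensorFlow, making code more compact and more readable. slim also ships many well-known computer-vision models (VGG, AlexNet, etc.), which we can use directly or extend in various ways.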

slim's submodules and what they do:

  • arg_scope: provides a new scope named arg_scope that allows a user to define default arguments for specific operations within that scope.

Besides the basic name_scope and variable_scope, slim adds arg_scope, which controls the default hyperparameters of each layer (covered in detail below).

  • data: contains TF-slim's dataset definition, data providers, parallel_reader, and decoding utilities.

slim has its own set of dataset definitions; we skip it here, since we rarely use it.

  • evaluation: contains routines for evaluating models.

Routines for evaluating models; also rarely used.

  • layers: contains high level layers for building models using tensorflow.

This one is important: it is the core of slim, with definitions of many complex layers.

  • learning: contains routines for training models.

Training routines.

  • losses: contains commonly used loss functions.

Common loss functions.

  • metrics: contains popular evaluation metrics.

Metrics for evaluating models.

  • nets: contains popular network definitions such as VGG and AlexNet models.

Classic networks such as VGG; used fairly often.

  • queues: provides a context manager for easily and safely starting and closing QueueRunners.

Queue management (starting and stopping QueueRunners); quite useful.

  • regularizers: contains weight regularizers.

Weight regularizers.

  • variables: provides convenience wrappers for variable creation and manipulation.

Quite useful; I really like slim's mechanism for managing variables.

 

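II. Defining models with slim

In slim, models can be defined concisely by combining variables, layers and scopes.

1. Variables

Defining model variables. To create a weights variable, initialize it from a truncated normal distribution, regularize it with L2 and place it on the CPU, this is all the code you need: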

import tensorflow as tf
import tensorflow.contrib.slim as slim

# Define a model variable
weights = slim.model_variable('weights', shape=[10, 10, 3, 3],
                              initializer=tf.truncated_normal_initializer(stddev=0.1),
                              regularizer=slim.l2_regularizer(0.05),
                              device='/CPU:0')
model_variables = slim.get_model_variables()

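Plain TensorFlow has two kinds of variables: regular (global) variables and local variables. Most variables are regular ones; once created, they can be written to disk with a Saver. Local variables exist only within a session and are not saved.

  • slim further distinguishes model variables, which represent the parameters of a model. Model variables are learned through training or fine-tuning, or loaded from a checkpoint file during evaluation or the forward pass.
  • Non-model variables are values that the actual forward pass does not need, such as global_step. Likewise, moving averages reflect model parameters but are not model parameters themselves. For example: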
# A regular (non-model) variable
my_var = slim.variable('my_var', shape=[20, 1],
                       initializer=tf.zeros_initializer())
# get_variables() returns both model variables and regular variables
regular_variables_and_model_variables = slim.get_variables()

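When a variable is created through a slim layer or directly with slim.model_variable, slim adds it to the tf.GraphKeys.MODEL_VARIABLES collection. If you create a variable some other way and want it treated as a model variable, register it like this: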

#Letting TF-Slim know about the additional variable.
slim.add_model_variable(my_var)
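To make the bookkeeping concrete, here is a toy, plain-Python model of it. It does not use TensorFlow, and the class ToyVariableRegistry is hypothetical: regular variables land only in the "global" list, while model variables, whether created as such or registered later with add_model_variable, also land in the "model" list.

```python
# Toy model of slim's variable bookkeeping, NOT slim's real code: every
# variable goes into a "global" list, and model variables are additionally
# recorded in a separate "model" list.
class ToyVariableRegistry:
    def __init__(self):
        self.global_vars = []  # roughly what slim.get_variables() returns
        self.model_vars = []   # roughly what slim.get_model_variables() returns

    def variable(self, name):
        """Like slim.variable: a regular variable, not a model variable."""
        self.global_vars.append(name)
        return name

    def model_variable(self, name):
        """Like slim.model_variable: a regular variable AND a model variable."""
        self.variable(name)
        self.add_model_variable(name)
        return name

    def add_model_variable(self, name):
        """Like slim.add_model_variable: mark an existing variable as a model variable."""
        if name not in self.model_vars:
            self.model_vars.append(name)


reg = ToyVariableRegistry()
reg.model_variable("weights")
reg.variable("my_var")
print(reg.model_vars)   # ['weights']
print(reg.global_vars)  # ['weights', 'my_var']
reg.add_model_variable("my_var")
print(reg.model_vars)   # ['weights', 'my_var']
```

Something like global_step would be created with variable() only, so it never shows up among the model variables.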

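2. Layers

slim wraps the commonly used layers and adds repeat and stack operations, which make defining networks more convenient.
First, let's look at how plain TensorFlow implements a layer, for example a convolutional layer: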

import tensorflow as tf

# Implementing one layer in plain TensorFlow
input_x = tf.placeholder(dtype=tf.float32, shape=[None, 224, 224, 3])
with tf.name_scope('conv1_1') as scope:
    weight = tf.Variable(tf.truncated_normal([3, 3, 3, 64],
                                             dtype=tf.float32,
                                             stddev=1e-1),
                         name='weights')
    conv = tf.nn.conv2d(input_x, weight, [1, 1, 1, 1], padding='SAME')
    bias = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                       trainable=True, name='biases')
    conv1 = tf.nn.relu(tf.nn.bias_add(conv, bias), name=scope)

And the slim version:

# Implementing the same layer with slim
net = slim.conv2d(input_x, 64, [3, 3], scope='conv1_1')

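That alone is not the main attraction, since core TensorFlow now also offers concise implementations of most layers. What is more interesting are slim's repeat and stack operations.

Suppose we define three identical convolutional layers: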

net = ...  
net = slim.conv2d(net, 256, [3, 3], scope='conv2_1')
net = slim.conv2d(net, 256, [3, 3], scope='conv2_2')
net = slim.conv2d(net, 256, [3, 3], scope='conv2_3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

slim's repeat operation shortens this:

net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv2')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

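repeat does more than apply the same operation with the same arguments several times: it also unrolls the scope, so in this example the layers get the scopes 'conv2/conv2_1', 'conv2/conv2_2' and 'conv2/conv2_3'.

stack, in turn, handles layers of the same type whose arguments differ (for example the kernel size or the number of outputs). Suppose we define three FC layers: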

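# Using stack: chain layers of one type whose kernel sizes or outputs differ
inputs = tf.placeholder(dtype=tf.float32, shape=[None, 784])
# Verbose version:
x = slim.fully_connected(inputs, 32, scope='fc/fc_1')
x = slim.fully_connected(x, 64, scope='fc/fc_2')
x = slim.fully_connected(x, 128, scope='fc/fc_3')
# Equivalent version with stack. Use one version or the other: both create
# variables under fc/fc_1 ... fc/fc_3, so building both in one graph clashes.
x = slim.stack(inputs, slim.fully_connected, [32, 64, 128], scope='fc')

The same works for convolutional layers: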

# Verbose version:
net = slim.conv2d(input_x, 32, [3, 3], scope='core/core_1')
net = slim.conv2d(net, 32, [1, 1], scope='core/core_2')
net = slim.conv2d(net, 64, [3, 3], scope='core/core_3')
net = slim.conv2d(net, 64, [1, 1], scope='core/core_4')

# Shorter version with stack (an alternative to the block above, not an addition):
net = slim.stack(input_x, slim.conv2d, [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])], scope='core')
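To see what repeat and stack actually build, here is a toy, plain-Python model. It is not slim's code, and the helpers stack_plan and repeat_plan are hypothetical. It lists the scope each layer gets and the shape of its kernel: each layer's input depth is the previous layer's output depth, and repeat is the special case of stack in which every layer gets identical arguments. The 128 input channels used for the repeat example are an assumption, since the original snippet elides the input (`net = ...`).

```python
def stack_plan(in_channels, layer_args, scope):
    """Toy model (not slim's code) of how slim.stack chains conv layers.

    layer_args holds one (num_outputs, [kh, kw]) entry per layer. Layer i
    (counting from 1) gets the scope '<scope>/<scope>_<i>'. Returns a list of
    (scope_name, weight_shape) pairs, with weight shapes in TensorFlow's
    [kh, kw, in, out] layout.
    """
    plan = []
    depth = in_channels
    for i, (num_outputs, (kh, kw)) in enumerate(layer_args, start=1):
        plan.append(("%s/%s_%d" % (scope, scope, i), [kh, kw, depth, num_outputs]))
        depth = num_outputs  # the next layer consumes this layer's output
    return plan


def repeat_plan(in_channels, repetitions, num_outputs, kernel_size, scope):
    """slim.repeat as a special case: every layer gets identical arguments."""
    return stack_plan(in_channels, [(num_outputs, kernel_size)] * repetitions, scope)


# The stack example above, with the 3-channel input_x.
for name, shape in stack_plan(3, [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])], "core"):
    print(name, shape)

# The repeat example, assuming (hypothetically) 128 input channels.
for name, shape in repeat_plan(128, 3, 256, [3, 3], "conv2"):
    print(name, shape)
```

Only the first layer of each group sees the external input depth; every later kernel is sized by the previous layer's number of outputs.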

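3. Scopes

Besides TensorFlow's name_scope and variable_scope, tf.slim adds arg_scope. Operations defined inside an arg_scope share its arguments: any argument you do not pass explicitly takes the scope's default, and an explicitly passed argument overrides the default locally.

Suppose your network repeats many of the same arguments, as below: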

net = slim.conv2d(input_x, 64, [11, 11], 4, padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv1')
net = slim.conv2d(net, 128, [11, 11], padding='VALID',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv2')
net = slim.conv2d(net, 256, [11, 11], padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv3')

Now let's tidy this up with arg_scope:

# Using arg_scope
with slim.arg_scope([slim.conv2d], padding='SAME',
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                    weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.conv2d(input_x, 64, [11, 11], scope='conv1')
    net = slim.conv2d(net, 128, [11, 11], padding='VALID', scope='conv2')
    net = slim.conv2d(net, 256, [11, 11], scope='conv3')

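In the arg_scope example, the conv2 call passes padding='VALID', which overrides the scope's default for that layer only. What if the network has other layer types besides convolutions? Then define it like this: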

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    activation_fn=tf.nn.relu,
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                    weights_regularizer=slim.l2_regularizer(0.0005)):
    with slim.arg_scope([slim.conv2d], stride=1, padding='SAME'):
        net = slim.conv2d(input_x, 64, [11, 11], 4, padding='VALID', scope='conv1')
        net = slim.conv2d(net, 256, [5, 5],
                          weights_initializer=tf.truncated_normal_initializer(stddev=0.03),
                          scope='conv2')
        net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc')

Two nested arg_scopes are all it takes. With this approach, defining VGG takes only a dozen or so lines of code:

# Define a VGG16 network
def vgg16(inputs):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                        weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        net = slim.max_pool2d(net, [2, 2], scope='pool5')
        # Flatten the pool5 feature map (7x7x512 for 224x224 inputs) for the FC layers
        net = slim.flatten(net, scope='flatten5')
        net = slim.fully_connected(net, 4096, scope='fc6')
        net = slim.dropout(net, 0.5, scope='dropout6')
        net = slim.fully_connected(net, 4096, scope='fc7')
        net = slim.dropout(net, 0.5, scope='dropout7')
        net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
        return net
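As a sanity check on the architecture, the arithmetic below traces a 224×224×3 input through VGG16 and counts its parameters. It is a plain-Python sketch that assumes 3×3 convolutions with SAME padding and stride 1 (size preserved), 2×2 max pools with stride 2 (size halved), and fc6 applied to the flattened 7×7×512 pool5 output; the helper vgg16_summary is hypothetical. slim's bundled vgg.vgg_16 expresses fc6 to fc8 as convolutions (7×7, then 1×1), which gives the same parameter count.

```python
def vgg16_summary(image_size=224, in_channels=3, num_classes=1000):
    """Spatial size after pool5, conv parameter count and total parameter count
    (weights + biases) for the VGG16 layout."""
    blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]  # (repeat, width)
    size, depth, params = image_size, in_channels, 0
    for repeats, width in blocks:
        for _ in range(repeats):
            params += 3 * 3 * depth * width + width  # 3x3 kernel plus bias
            depth = width                            # SAME padding keeps the size
        size //= 2                                   # 2x2 max pool, stride 2
    conv_params = params
    fan_in = size * size * depth                     # flattened pool5 output
    for units in (4096, 4096, num_classes):          # fc6, fc7, fc8
        params += fan_in * units + units
        fan_in = units
    return size, conv_params, params


print(vgg16_summary())                 # (7, 14714688, 138357544)
print(vgg16_summary(num_classes=20))   # smaller fc8 for a 20-class head
```

Roughly 124M of the 138M parameters sit in the three FC layers, with fc6 alone holding about 103M. That is one reason a pretrained checkpoint is so valuable, and why fine-tuning usually replaces only the last layers.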

 

III. Training models

Here we simply use the classic network that ships with slim.

import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.nets import vgg

# Load the images and labels (labels one-hot encoded).
images, labels = ...

# Create the model. vgg_16 returns (logits, end_points).
predictions, _ = vgg.vgg_16(images)

# Define the loss functions and get the total loss.
loss = slim.losses.softmax_cross_entropy(predictions, labels)

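On losses: if you define your own loss function, don't forget to register it with slim so that slim can see it.

Regularization terms also need care: if you add up the losses yourself, you must add the regularization losses to the total by hand, otherwise the regularization objective is not optimized. slim.losses.get_total_loss() includes them by default.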

# Load the images and labels.  
images, scene_labels, depth_labels, pose_labels = ...  
  
# Create the model.  
scene_predictions, depth_predictions, pose_predictions = CreateMultiTaskModel(images)  
  
# Define the loss functions and get the total loss.  
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)  
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)  
pose_loss = MyCustomLossFunction(pose_predictions, pose_labels)  
slim.losses.add_loss(pose_loss) # Letting TF-Slim know about the additional loss.  
  
# The following two ways to compute the total loss are equivalent:  
regularization_loss = tf.add_n(slim.losses.get_regularization_losses())  
total_loss1 = classification_loss + sum_of_squares_loss + pose_loss + regularization_loss  
  
# (Regularization Loss is included in the total loss by default).  
total_loss2 = slim.losses.get_total_loss()  
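Why registration matters can be shown with a toy, plain-Python model of how get_total_loss() assembles the objective. It is not slim's code: the lists and helpers below are hypothetical stand-ins for slim's loss collections, and every number is made up. The one formula taken from TensorFlow is the L2 penalty: slim.l2_regularizer(scale) yields scale * l2_loss(w), and l2_loss is half the sum of squares.

```python
# Toy model (not slim's code) of how get_total_loss() assembles the objective.
registered_losses = []       # stands in for slim's loss collection
regularization_losses = []   # stands in for the regularization-loss collection


def add_loss(value):
    """Like slim.losses.add_loss: make a loss visible to get_total_loss()."""
    registered_losses.append(value)
    return value


def get_total_loss(add_regularization_losses=True):
    total = sum(registered_losses)
    if add_regularization_losses:
        total += sum(regularization_losses)
    return total


def l2_regularizer(scale):
    """Like slim.l2_regularizer: scale * sum(w**2) / 2 (l2_loss halves the sum)."""
    return lambda weights: scale * sum(w * w for w in weights) / 2.0


# Made-up values. Built-in slim losses register themselves; a custom one does not.
classification_loss = add_loss(0.9)
sum_of_squares_loss = add_loss(0.4)
pose_loss = 0.2                      # a custom loss, not registered yet
regularization_losses.append(l2_regularizer(0.0005)([1.0, 2.0]))  # 0.00125

print(get_total_loss())  # pose_loss is silently missing here
add_loss(pose_loss)      # letting the toy know about the additional loss
total_loss1 = (classification_loss + sum_of_squares_loss + pose_loss
               + sum(regularization_losses))
total_loss2 = get_total_loss()
print(total_loss1, total_loss2)
```

Before add_loss, the custom loss simply does not contribute to training; no error is raised, which is what makes this mistake easy to miss.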

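slim's learning.py provides a simple but useful toolkit for training models. Calling slim.learning.create_train_op and slim.learning.train is all it takes to run the optimization.

slim.learning.train trains a neural network; its definition is as follows: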

 

# Signature of slim.learning.train
def train(train_op,
          logdir,
          train_step_fn=train_step,
          train_step_kwargs=_USE_DEFAULT,
          log_every_n_steps=1,
          graph=None,
          master='',
          is_chief=True,
          global_step=None,
          number_of_steps=None,
          init_op=_USE_DEFAULT,
          init_feed_dict=None,
          local_init_op=_USE_DEFAULT,
          init_fn=None,
          ready_op=_USE_DEFAULT,
          summary_op=_USE_DEFAULT,
          save_summaries_secs=600,
          summary_writer=_USE_DEFAULT,
          startup_delay_steps=0,
          saver=None,
          save_interval_secs=600,
          sync_optimizer=None,
          session_config=None,
          trace_every_n_steps=None):

 

Some of its parameters:

  • train_op: A `Tensor` that, when executed, will apply the gradients and return the loss value.
  • logdir: The directory where training logs are written to. If None, model checkpoints and summaries will not be written.
  • number_of_steps: The max number of gradient steps to take during training, as measured by 'global_step': training will stop if global_step is greater than 'number_of_steps'. If the value is left as None, training proceeds indefinitely.
  • save_summaries_secs: How often, in seconds, to save summaries.
  • summary_writer: `SummaryWriter` to use. Can be `None` to indicate that no summaries should be written. If unset, we create a SummaryWriter.
  • startup_delay_steps: The number of steps to wait for before beginning. Note that this must be 0 if a sync_optimizer is supplied.
  • saver: Saver to save checkpoints. If None, a default one will be created and used.
  • save_interval_secs: How often, in seconds, to save the model to `logdir`.

 

import tensorflow as tf
import tensorflow.contrib.slim as slim

g = tf.Graph()
with g.as_default():
    # Create the model and specify the losses...
    ...

    total_loss = slim.losses.get_total_loss()
    learning_rate = 0.01  # example value; choose your own
    # create_train_op expects an Optimizer object, not the result of minimize().
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)

    # create_train_op ensures that each time we ask for the loss, the update_ops
    # are run and the gradients being computed are applied too.
    train_op = slim.learning.create_train_op(total_loss, optimizer)
    logdir = ...  # Where checkpoints are stored.

    slim.learning.train(
        train_op,
        logdir,
        number_of_steps=1000,     # number of training steps
        save_summaries_secs=300,  # save summaries every 300 seconds
        save_interval_secs=600)   # save a checkpoint every 600 seconds
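The contract of train_op ("when executed, will apply the gradients and return the loss value") and the role of number_of_steps can be illustrated with a toy, plain-Python loop. It is not slim's implementation, and make_train_op and train are hypothetical helpers: each call applies one gradient-descent step to minimize (w - 3)^2 and returns the loss, and the loop stops once global_step reaches number_of_steps, so exactly that many updates run. With number_of_steps=None it would never stop.

```python
def make_train_op(loss_fn, grad_fn, learning_rate, state):
    """Toy model of create_train_op's contract (not slim's code): each call
    computes the loss, applies one gradient step, bumps global_step and
    returns the loss."""
    def train_op():
        loss = loss_fn(state["w"])
        state["w"] -= learning_rate * grad_fn(state["w"])
        state["global_step"] += 1
        return loss
    return train_op


def train(train_op, state, number_of_steps=None):
    """Toy training loop: stop once global_step reaches number_of_steps.
    With number_of_steps=None it never ends."""
    losses = []
    while number_of_steps is None or state["global_step"] < number_of_steps:
        losses.append(train_op())
    return losses


# Minimize (w - 3)^2 with plain gradient descent.
state = {"w": 0.0, "global_step": 0}
op = make_train_op(lambda w: (w - 3.0) ** 2, lambda w: 2.0 * (w - 3.0), 0.1, state)
losses = train(op, state, number_of_steps=1000)
print(state["global_step"], round(state["w"], 6), losses[0], losses[-1])
```

In the real library the loop also writes summaries and checkpoints on the save_summaries_secs and save_interval_secs timers; the toy leaves that out.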

 

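IV. Loading and saving model variables

In transfer learning we often use networks and parameters that someone else has already trained, so we may need to load only some of the variables from a checkpoint file. Below I explain how to load specific variables, and what to do when the variable names in the current graph differ from those in the checkpoint.

1. Restoring some of the variables from a checkpoint

Passing a list of variables to the Saver restores only those variables: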

 

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to restore all the variables.
restorer = tf.train.Saver()

# Add ops to restore some variables.
restorer = tf.train.Saver([v1, v2])

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...

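This way only the listed variables are restored; each one is looked up in the checkpoint under its own name. (Restoring variables whose names differ is covered in item 3 below.)

2. Other ways to select the variables to restore from a checkpoint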
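Conceptually, a checkpoint is a mapping from variable names to saved values, and a Saver built with a list restores only those names. The toy below models that in plain Python; it is not TensorFlow's code, restore is a hypothetical helper, and the values are made up. It shows why restoring everything fails as soon as the graph contains a variable the checkpoint lacks, such as a freshly added layer, while restoring a list leaves unlisted variables at their initial values.

```python
# Toy checkpoint: variable name -> saved value (made-up numbers).
checkpoint = {"v1": 1.0, "v2": 2.0}


def restore(graph_vars, checkpoint, var_list=None):
    """Toy model of Saver(var_list).restore (not TF's code): each listed
    variable is looked up under its own name; unlisted variables keep their
    current values. var_list=None means all variables, like tf.train.Saver()."""
    names = list(graph_vars) if var_list is None else list(var_list)
    missing = [n for n in names if n not in checkpoint]
    if missing:
        raise KeyError("not found in checkpoint: %s" % missing)
    for name in names:
        graph_vars[name] = checkpoint[name]
    return graph_vars


graph_vars = {"v1": 0.0, "v2": 0.0, "new_layer/weights": 0.5}
try:
    restore(dict(graph_vars), checkpoint)  # restoring everything fails
except KeyError as err:
    print("full restore failed:", err)
print(restore(graph_vars, checkpoint, ["v1", "v2"]))  # new layer keeps 0.5
```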

 

# Create some variables.
v1 = slim.variable(name="v1", ...)
v2 = slim.variable(name="nested/v2", ...)
...

# Get list of variables to restore (which contains only 'v2'). These are all
# equivalent methods:
# restore the variable named 'v2'
variables_to_restore = slim.get_variables_by_name("v2")
# or: restore every variable whose name ends with "2"
variables_to_restore = slim.get_variables_by_suffix("2")
# or: restore every variable in the scope 'nested'
variables_to_restore = slim.get_variables(scope="nested")
# or: restore every variable included by the scope 'nested'
variables_to_restore = slim.get_variables_to_restore(include=["nested"])
# or: restore every variable except those in the scope 'v1'
variables_to_restore = slim.get_variables_to_restore(exclude=["v1"])

# Create the saver which will be used to restore the variables.
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...
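The selection rules can be mimicked on plain name strings. The sketch below is a simplified toy, not slim's code, and all four functions here are hypothetical re-implementations. One detail it keeps from TensorFlow: filtering a collection by scope uses re.match, so a scope matches as a prefix of the full name. (The real functions match against var.name, which carries a ':0' suffix; the toy uses bare names.)

```python
import re


def get_variables(names, scope=None, suffix=None):
    """Toy version of slim.get_variables over plain names: the scope is
    matched with re.match, i.e. as a prefix of the full name."""
    pattern = (scope or "") + (".*" + suffix + "$" if suffix is not None else "")
    return [n for n in names if re.match(pattern, n)]


def get_variables_by_suffix(names, suffix):
    return get_variables(names, suffix=suffix)


def get_variables_by_name(names, given_name):
    # the name must be the whole last path component: 'v2' or '.../v2'
    return [n for n in names if n == given_name or n.endswith("/" + given_name)]


def get_variables_to_restore(names, include=None, exclude=None):
    included = names if include is None else [
        n for n in names if any(re.match(s, n) for s in include)]
    return [n for n in included
            if not any(re.match(s, n) for s in (exclude or []))]


names = ["v1", "nested/v2"]
print(get_variables_by_name(names, "v2"))                    # ['nested/v2']
print(get_variables_by_suffix(names, "2"))                   # ['nested/v2']
print(get_variables(names, scope="nested"))                  # ['nested/v2']
print(get_variables_to_restore(names, include=["nested"]))   # ['nested/v2']
print(get_variables_to_restore(names, exclude=["v1"]))       # ['nested/v2']
```

A side effect of prefix matching: exclude=["v1"] would also drop a variable named "v10", so exclusion patterns should be as specific as possible.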

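3. Restoring model parameters when the variable names in the graph differ from those in the checkpoint

When restoring variables from a checkpoint file, the Saver locates each variable name in the checkpoint and maps it to a variable in the current graph. In the earlier examples we created a Saver and passed it a list of variables. In that case, the name looked up in the checkpoint is taken implicitly from each variable's var.op.name. This works well when the names in the graph and in the checkpoint agree. When they differ, you must give the Saver a dictionary that maps each checkpoint variable name to a variable in the graph.

Suppose our network's variable is conv1/weights, while the VGG checkpoint stores it as vgg16/conv1/weights. A normal restore fails because the name is not found, but it works like this: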

# Assuming that 'conv1/weights' should be restored from 'vgg16/conv1/weights'
def name_in_checkpoint(var):
  return 'vgg16/' + var.op.name

# Alternatively (this definition replaces the one above), assuming that
# 'conv1/weights' and 'conv1/bias' should be restored from 'conv1/params1' and 'conv1/params2'
def name_in_checkpoint(var):
  if "weights" in var.op.name:
    return var.op.name.replace("weights", "params1")
  if "bias" in var.op.name:
    return var.op.name.replace("bias", "params2")
  return var.op.name  # any other variable keeps its own name

variables_to_restore = slim.get_model_variables()
variables_to_restore = {name_in_checkpoint(var): var for var in variables_to_restore}
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
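The direction of the dictionary is easy to get backwards: keys are names as stored in the checkpoint, values are the graph's variables. The toy below uses plain strings in place of variables and made-up checkpoint contents; name_in_checkpoint is the hypothetical 'vgg16/' prefix rule from the first example, rewritten to take a name instead of a variable.

```python
def name_in_checkpoint(graph_name):
    # hypothetical rule from the first example: the checkpoint adds 'vgg16/'
    return "vgg16/" + graph_name


graph_names = ["conv1/weights", "conv1/biases"]
# toy checkpoint contents (made-up values)
checkpoint = {"vgg16/conv1/weights": [0.1, 0.2], "vgg16/conv1/biases": [0.0]}

# The dict handed to the Saver: checkpoint name -> graph variable.
variables_to_restore = {name_in_checkpoint(n): n for n in graph_names}

# What restoring then does, conceptually: read each checkpoint key and
# assign its value to the graph variable it maps to.
restored = {graph_name: checkpoint[ckpt_name]
            for ckpt_name, graph_name in variables_to_restore.items()}
print(variables_to_restore)
print(restored)
```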

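4. Fine-tuning a network on a different task

For example, to reuse a network trained for the 1000-class ImageNet classification task on the 20-class Pascal VOC classification task, we load only some of the layers: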

image, label = MyPascalVocDataLoader(...)
images, labels = tf.train.batch([image, label], batch_size=32)

# Create the model with 20 output classes; vgg_16 returns (logits, end_points).
predictions, _ = vgg.vgg_16(images, num_classes=20)

train_op = slim.learning.create_train_op(...)

# Specify where the Model, trained on ImageNet, was saved.
model_path = '/path/to/pre_trained_on_imagenet.checkpoint'

# Specify where the new model will live:
log_dir = '/path/to/my_pascal_model_dir/'

# Restore only the convolutional layers: load everything from the checkpoint
# except fc6, fc7 and fc8. Exclusion scopes are matched as name prefixes, so
# they include the 'vgg_16/' scope that vgg.vgg_16 creates.
variables_to_restore = slim.get_variables_to_restore(
    exclude=['vgg_16/fc6', 'vgg_16/fc7', 'vgg_16/fc8'])
init_fn = slim.assign_from_checkpoint_fn(model_path, variables_to_restore)

# Start training.
slim.learning.train(train_op, log_dir, init_fn=init_fn)
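The exclusion list deserves a check, because slim matches exclusion scopes with re.match against full variable names. The sketch below uses a handful of names as slim's vgg.vgg_16 creates them under its default 'vgg_16' scope (a short, hand-picked subset), and excluding is a hypothetical helper mimicking the filter. Bare 'fc6' to 'fc8' exclude nothing. In that case the 1000-class fc8 from the checkpoint would be restored into a 20-class layer and the assignment would fail on the shape mismatch.

```python
import re

# A few variable names as vgg.vgg_16 (default scope 'vgg_16') creates them.
names = [
    "vgg_16/conv1/conv1_1/weights",
    "vgg_16/conv5/conv5_3/biases",
    "vgg_16/fc6/weights",
    "vgg_16/fc7/weights",
    "vgg_16/fc8/weights",
    "vgg_16/fc8/biases",
]


def excluding(names, exclude):
    """Mimic get_variables_to_restore(exclude=...): scopes match as name prefixes."""
    return [n for n in names if not any(re.match(s, n) for s in exclude)]


print(excluding(names, ["fc6", "fc7", "fc8"]))  # excludes nothing
print(excluding(names, ["vgg_16/fc6", "vgg_16/fc7", "vgg_16/fc8"]))
```

If you build the network with your own scope name, inspect the actual variable names (for example by printing var.op.name for slim.get_model_variables()) before writing the exclusion list.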

  

 

 

Reference articles: [Tensorflow] Auxiliary tools: an introduction to tensorflow slim (TF-Slim)

An introduction to TF-Slim











