tensorflow VGG16 network accuracy and loss don't change

Posted: 2020-02-19 17:31:28

Question:

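I tried to build a VGG16 model by hand in TensorFlow and train it on my own dataset, which has only 2 classes. As a reference, I found the VGG16 architecture and a way to train the network on GitHub.

However, the code I found didn't work, so I tried to fix it.

I have been working on this code for a long time, but the accuracy and loss don't change in any epoch, and I can't find the bug.

Can anyone help me figure out where I messed up?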

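My code for building the VGG16 network is as follows:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session, tf.train)
from tensorflow import keras

def build_network(height, width, channel):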
    x = tf.placeholder(tf.float32,shape = [None, height, width, channel], name = 'input')
    y = tf.placeholder(tf.int64, shape=[None, 2], name = 'labels_placeholders')

    def weight_variable(shape, name = "weights"):
        initial = tf.truncated_normal(shape, dtype = tf.float32, stddev = 0.1)
        return tf.Variable(initial, name = name)

    def bias_variable(shape, name = "biases"):
        initial = tf.constant(0.1, dtype = tf.float32, shape = shape)
        return tf.Variable(initial, name = name)

    def conv2d(input, w):
        return tf.nn.conv2d(input, w, [1,1,1,1], padding = "SAME")

    def pool_max(input):
        return tf.nn.max_pool(input, ksize = [1,2,2,1], strides = [1,2,2,1], padding = "SAME", name = "pool1")

    def fc(input, w, b):
        return tf.matmul(input,w)+b

    with tf.name_scope('conv1_1') as scope:
        kernel = weight_variable([3, 3, 3, 64])
        biases = bias_variable([64])
        output_conv1_1 = tf.nn.relu(conv2d(x,kernel) + biases, name = scope)

    with tf.name_scope("conv1_2") as scope:
        kernel = weight_variable([3,3,64,64])
        biases = bias_variable([64])
        output_conv1_2 = tf.nn.relu(conv2d(output_conv1_1, kernel) + biases, name = scope)

    pool1 = pool_max(output_conv1_2)

    with tf.name_scope("conv2_1") as scope:
        kernel = weight_variable([3,3,64,128])
        biases = bias_variable([128])
        output_conv2_1 = tf.nn.relu(conv2d(pool1, kernel) + biases, name = scope)

    with tf.name_scope("conv2_2") as scope:
        kernel = weight_variable([3,3,128,128])
        biases = bias_variable([128])
        output_conv2_2 = tf.nn.relu(conv2d(output_conv2_1, kernel) + biases, name = scope)


    pool2 = pool_max(output_conv2_2)

    with tf.name_scope("conv3_1") as scope:
        kernel = weight_variable([3,3,128,256])
        biases = bias_variable([256])
        output_conv3_1 = tf.nn.relu(conv2d(pool2,kernel)+biases, name = scope)

    with tf.name_scope("conv3_2") as scope:
        kernel = weight_variable([3,3,256,256])
        biases = bias_variable([256])
        output_conv3_2 = tf.nn.relu(conv2d(output_conv3_1, kernel) + biases, name = scope)

    with tf.name_scope("conv3_3") as scope:
        kernel = weight_variable ([3,3,256,256])
        biases = bias_variable([256])
        output_conv3_3 = tf.nn.relu(conv2d(output_conv3_2, kernel) + biases, name = scope)

    pool3 = pool_max(output_conv3_3)

    with tf.name_scope("conv4_1") as scope:
        kernel = weight_variable([3,3,256,512])
        biases = bias_variable([512])
        output_conv4_1 = tf.nn.relu(conv2d(pool3,kernel)+ biases, name = scope)

    with tf.name_scope('conv4_2') as scope:
        kernel = weight_variable([3, 3, 512, 512])
        biases = bias_variable([512])
        output_conv4_2 = tf.nn.relu(conv2d(output_conv4_1, kernel) + biases, name=scope)

    with tf.name_scope('conv4_3') as scope:
        kernel = weight_variable([3, 3, 512, 512])
        biases = bias_variable([512])
        output_conv4_3 = tf.nn.relu(conv2d(output_conv4_2, kernel) + biases, name=scope)

    pool4 = pool_max(output_conv4_3)


    with tf.name_scope('conv5_1') as scope:
        kernel = weight_variable([3, 3, 512, 512])
        biases = bias_variable([512])
        output_conv5_1 = tf.nn.relu(conv2d(pool4, kernel) + biases, name=scope)

    with tf.name_scope('conv5_2') as scope:
        kernel = weight_variable([3, 3, 512, 512])
        biases = bias_variable([512])
        output_conv5_2 = tf.nn.relu(conv2d(output_conv5_1, kernel) + biases, name=scope)

    with tf.name_scope('conv5_3') as scope:
        kernel = weight_variable([3, 3, 512, 512])
        biases = bias_variable([512])
        output_conv5_3 = tf.nn.relu(conv2d(output_conv5_2, kernel) + biases, name=scope)

    pool5 = pool_max(output_conv5_3)


    with tf.name_scope('fc6') as scope:
        shape = int(np.prod(pool5.get_shape()[1:]))
        kernel = weight_variable([shape, 4096])
        biases = bias_variable([4096])
        pool5_flat = tf.reshape(pool5, [-1, shape])
        output_fc6 = tf.nn.relu(fc(pool5_flat, kernel, biases), name=scope)

    with tf.name_scope('fc7') as scope:
        kernel = weight_variable([4096, 4096])
        biases = bias_variable([4096])
        output_fc7 = tf.nn.relu(fc(output_fc6, kernel, biases), name=scope)

    with tf.name_scope('fc8') as scope:
        kernel = weight_variable([4096, 2])
        biases = bias_variable([2])
        output_fc8 = tf.nn.relu(fc(output_fc7, kernel, biases), name=scope)

    finaloutput = tf.nn.softmax(output_fc8, name="softmax")
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output_fc8, labels=y))

    #define optimizer
    optimize = tf.train.AdamOptimizer(learning_rate= 1e-3).minimize(cost)
    #optimize = tf.train.GradientDescentOptimizer(learning_rate = 0.5).minimize(cost)
    prediction_labels = tf.argmax(finaloutput, axis = 1, name = "output")
    read_labels = tf.argmax(y, axis = 1)
    correct_prediction = tf.equal(prediction_labels, read_labels)

    # correct rate
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    correct_times_in_batch = tf.reduce_sum(tf.cast(correct_prediction, tf.int32))
    return dict(
        x=x,
        y=y,
        optimize = optimize,
        correct_prediction = correct_prediction,
        correct_times_in_batch = correct_times_in_batch,
        cost=cost,
        accuracy = accuracy
    )

def train_network(graph, batch_size, num_epochs, pb_file_path):
    # x_train, y_train and x_val are NumPy arrays defined at module level (not shown here).
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        epoch_delta = 2
        train_epoch_size = int(x_train.shape[0]/batch_size)
        test_epoch_size = int(x_val.shape[0]/batch_size)
        for epoch in range(num_epochs):

            for i in range(train_epoch_size):
                train_batch_x = x_train[i*batch_size:(i+1)*batch_size]
                train_batch_y = y_train[i*batch_size:(i+1)*batch_size]
                train_batch_y = np.asarray(train_batch_y).reshape(-1,1)
                train_batch_y_ohe  = keras.utils.to_categorical(train_batch_y, num_classes=2).astype(int)
                feed_dict = {
                    graph["x"]: np.reshape(train_batch_x, (batch_size, 224, 224, 3)),
                    graph["y"]: train_batch_y_ohe,
                }
                sess.run(graph['optimize'], feed_dict = feed_dict)
                loss = sess.run(graph['cost'], feed_dict = feed_dict)

                accuracy_ = sess.run(graph['accuracy'],feed_dict = feed_dict)

            print("epoch {}, loss: {}, accuracy: {}".format(epoch, loss, accuracy_))

This bug has been bothering me for a long time.
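A quick numerical sanity check, assuming the symptom is a loss frozen at one value from the very first epoch: with two classes, if every logit reaching the cross-entropy is 0, the softmax is uniform, the loss is exactly ln 2 (about 0.693) for every example, and argmax always picks class 0. In the code above, fc8 passes its logits through tf.nn.relu, and a ReLU outputs 0 (with zero gradient) for every negative input, so this is one possible way to end up there. The NumPy sketch below recomputes the question's loss and accuracy formulas outside TensorFlow; the fc8 values are made up for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean softmax cross-entropy, the same quantity as
    tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=..., logits=...))."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())

def accuracy(logits, one_hot_labels):
    """Fraction of rows where argmax(logits) equals argmax(labels), as in build_network."""
    return float(np.mean(logits.argmax(axis=1) == one_hot_labels.argmax(axis=1)))

# Hypothetical pre-ReLU fc8 outputs for a batch of 4: both entries negative in every row.
raw_fc8 = np.array([[-3.0, -1.5], [-0.2, -7.0], [-4.1, -0.3], [-2.0, -2.5]])
logits = np.maximum(raw_fc8, 0.0)        # what tf.nn.relu does here: every logit becomes 0
labels = np.eye(2)[[0, 1, 1, 0]]         # one-hot rows, like keras.utils.to_categorical
relu_grad = (raw_fc8 > 0).astype(float)  # derivative of ReLU: 0 wherever the input is negative

print(softmax_cross_entropy(logits, labels))  # ln 2, whatever the labels are
print(accuracy(logits, labels))               # argmax([0, 0]) is 0, so this is the share of class-0 labels
print(relu_grad.sum())                        # 0.0: no gradient flows back into fc8 or earlier layers
```

If the real fc8 outputs look like this, one thing worth trying is feeding the pre-activation fc(output_fc7, kernel, biases) to the softmax and the loss instead of its ReLU; treat this as a hypothesis to check against the actual values, not a confirmed diagnosis.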

Comments:

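Do the weights change?

The weights and the loss don't change; they always stay the same.

A learning rate of 0.5 is very large, and it is not a value you should pick at random. Start from a reasonable value such as 0.01, then keep dividing it by 10 until the loss starts to decrease.

I agree the learning rate is too large, but I don't think that is why the loss doesn't change, because I originally used the AdamOptimizer in the code above, and the loss still stayed at the same value then.

Adam can't really train VGG16 either, but for a different reason. Use SGD with a reasonable learning rate.

Answer 1: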

Most likely your problem is the combination of some of the parameters you chose; things to consider first:

Change the stddev in your weight initialization to 0.01 (0.1 is a huge value), i.e.:

initial = tf.truncated_normal(shape, dtype = tf.float32, stddev = 0.01)
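A rough back-of-the-envelope estimate of why 0.1 is large, under simplifying assumptions (independent zero-mean weights, ReLU activations, biases and max pooling ignored, and ignoring that tf.truncated_normal has a somewhat smaller effective spread than its stddev argument): each 3x3 conv + ReLU layer scales the mean squared activation by roughly 0.5 * fan_in * stddev^2, where fan_in = 3 * 3 * in_channels. The He et al. (2015) initialization, stddev = sqrt(2 / fan_in), is the choice that makes this factor exactly 1; it appears here only as a reference point, not as the answer's recommendation. The layer list mirrors the weight_variable shapes in build_network.

```python
import math

# (name, in_channels) for the 13 VGG16 conv layers in build_network; every kernel is 3x3.
CONV_LAYERS = [
    ("conv1_1", 3), ("conv1_2", 64),
    ("conv2_1", 64), ("conv2_2", 128),
    ("conv3_1", 128), ("conv3_2", 256), ("conv3_3", 256),
    ("conv4_1", 256), ("conv4_2", 512), ("conv4_3", 512),
    ("conv5_1", 512), ("conv5_2", 512), ("conv5_3", 512),
]

def relu_gain(fan_in, stddev):
    """Approximate factor by which one conv + ReLU layer scales the mean squared activation."""
    return 0.5 * fan_in * stddev ** 2

def total_gain(stddev):
    """Product of the per-layer factors over all 13 conv layers (biases and pooling ignored)."""
    total = 1.0
    for _, in_channels in CONV_LAYERS:
        total *= relu_gain(3 * 3 * in_channels, stddev)
    return total

for stddev in (0.1, 0.01):
    print(f"stddev={stddev}: conv5_3 factor {relu_gain(9 * 512, stddev):.4g}, "
          f"13-layer factor {total_gain(stddev):.3g}")

# He initialization for fan_in = 9 * 512 keeps that layer's factor at 1.
he_std = math.sqrt(2 / (9 * 512))
print(f"He stddev for fan_in=4608: {he_std:.4f}, factor {relu_gain(9 * 512, he_std):.3f}")
```

By this estimate, stddev 0.1 inflates the activations by many orders of magnitude across the conv stack, while a single fixed 0.01 shrinks them by many orders of magnitude. That is one plausible reason the loss can still stall after switching to 0.01, and why a per-layer scale such as He or Glorot initialization is often used when training VGG16 from scratch.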

Change your bias initialization to zero, i.e.:

initial = tf.constant(0.0, dtype = tf.float32, shape = shape)

As already mentioned, use

optimize = tf.train.GradientDescentOptimizer(learning_rate=LR).minimize(cost)

with a reasonable learning rate LR (start with 0.01 or 0.001).
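The comment under the question suggests starting from a reasonable learning rate and dividing by 10 until the loss starts to fall. A minimal sketch of that procedure, using a hypothetical one-parameter quadratic loss 0.5 * a * w^2 as a stand-in for VGG16 (plain gradient descent on it diverges whenever lr > 2 / a), could look like this. try_learning_rate and sweep are illustrative names, not TensorFlow APIs, and the starting value 0.5 is the rate from the commented-out GradientDescentOptimizer line in the question.

```python
def try_learning_rate(lr, curvature=50.0, w0=1.0, steps=20):
    """Run plain gradient descent on the hypothetical toy loss 0.5 * curvature * w**2
    and return (initial_loss, final_loss)."""
    def loss(w):
        return 0.5 * curvature * w ** 2
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w  # the gradient of the toy loss is curvature * w
    return loss(w0), loss(w)

def sweep(start=0.5, factor=10.0, max_tries=5):
    """Divide the learning rate by `factor` until the loss decreases; None if it never does."""
    lr = start
    for _ in range(max_tries):
        initial, final = try_learning_rate(lr)
        print(f"lr={lr:g}: loss {initial:.3g} -> {final:.3g}")
        if final < initial:
            return lr
        lr /= factor
    return None

chosen = sweep()
print("first learning rate that reduced the loss:", chosen)
```

On the real network, each try would be a few hundred training steps while watching graph['cost']; the divergence threshold depends on the model, which is why the rate is found empirically rather than fixed in advance.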

Comments:

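Note that Adam cannot train VGG-like networks; you have to use SGD.

@MatiasValdenegro thanks, I noticed your comment under the question. Honestly, I have never heard of this (but I'm not a VGG user either); can you share any links highlighting this issue?

I'm not sure there is a documented case of VGG + Adam specifically, but there are documented cases of Adam failing in general, see openreview.net/forum?id=ryQu7f-RZ (Figure 1). VGG16 models I have trained did not converge with Adam, but the loss decreased nicely with SGD. Also keep in mind that the original VGG paper used SGD.

Thanks for the help, but after applying all the changes you suggested, the loss and accuracy still don't change: https://imgur.com/ZRjHdm2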
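For reference, the original VGG paper trained with mini-batch gradient descent with momentum 0.9, a variant of the SGD recommended above. In TensorFlow 1.x that corresponds to tf.train.MomentumOptimizer(learning_rate=..., momentum=0.9). The NumPy sketch below mirrors the update rule documented for that optimizer (accumulation = momentum * accumulation + gradient; variable -= learning_rate * accumulation) on a hypothetical toy quadratic loss; the loss and function name are assumptions for illustration only.

```python
def momentum_sgd(grad_fn, w0, lr=0.01, momentum=0.9, steps=200):
    """Gradient descent with momentum in the form
    accumulation = momentum * accumulation + grad;  w -= lr * accumulation."""
    w, accumulation = w0, 0.0
    for _ in range(steps):
        accumulation = momentum * accumulation + grad_fn(w)
        w -= lr * accumulation
    return w

# Hypothetical toy loss 0.5 * 50 * w**2, whose gradient is 50 * w (a stand-in for the network's cost).
w_final = momentum_sgd(lambda w: 50.0 * w, w0=1.0)
print(w_final)  # close to the minimum at w = 0
```

With momentum the iterate oscillates around the minimum while its amplitude shrinks, which is the behavior the momentum term trades for faster progress along consistent gradient directions.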
