Batch normalization in TensorFlow: tf.contrib.layers.batch_norm works well in training but gives poor test/validation results

Posted: 2017-06-22 02:57:02

Question:

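I am trying to implement a CNN on the MNIST dataset using the function tf.contrib.layers.batch_norm.

When I train and check the model, I see that the loss is decreasing (good!), but the accuracy on the test dataset stays at chance level (~10%) (bad!!!).

If I use the same model without batch normalization, the test accuracy improves as expected.

The code below shows how I use the batch normalization function. If I set is_training=True when evaluating on the test dataset I get good results, so the problem lies in the is_training=False mode of the batch normalization function.

Please help me solve this. Thanks in advance, everyone.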

    # BLOCK2 - Layer 1
    conv1 = tf.nn.conv2d(output, block2_layer1_1_weights, [1, 1, 1, 1], padding='SAME')
    conv2 = tf.nn.conv2d(output, block2_layer1_2_weights, [1, 1, 1, 1], padding='SAME')
    conv3 = tf.nn.conv2d(output, block2_layer1_3_weights, [1, 1, 1, 1], padding='SAME')
    conv4 = tf.nn.conv2d(output, block2_layer1_4_weights, [1, 1, 1, 1], padding='SAME')

    conv_normed1 = tf.contrib.layers.batch_norm(conv1, scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )
    conv_normed2 = tf.contrib.layers.batch_norm(conv2, scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )
    conv_normed3 = tf.contrib.layers.batch_norm(conv3, scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )
    conv_normed4 = tf.contrib.layers.batch_norm(conv4, scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )

    after_stack = tf.stack([conv_normed1, conv_normed2, conv_normed3, conv_normed4])

    after_maxout = tf.reduce_max(after_stack, 0)
    # BLOCK2 - Layer 2
    conv1 = tf.nn.conv2d(after_maxout, block2_layer2_1_weights, [1, 1, 1, 1], padding='SAME')
    conv2 = tf.nn.conv2d(after_maxout, block2_layer2_2_weights, [1, 1, 1, 1], padding='SAME')
    conv_normed1 = tf.contrib.layers.batch_norm(conv1, scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )
    conv_normed2 = tf.contrib.layers.batch_norm(conv2, scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )

    after_stack = tf.stack([conv_normed1, conv_normed2])

    after_maxout = tf.reduce_max(after_stack, 0)
    # BLOCK2 - Layer 3
    conv1 = tf.nn.conv2d(after_maxout, block2_layer3_1_weights, [1, 1, 1, 1], padding='SAME')
    conv2 = tf.nn.conv2d(after_maxout, block2_layer3_2_weights, [1, 1, 1, 1], padding='SAME')
    conv_normed1 = tf.contrib.layers.batch_norm(conv1 , scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )
    conv_normed2 = tf.contrib.layers.batch_norm(conv2 , scale=True, decay=batch_norm_decay, center=True,  is_training=is_training, updates_collections=None )

    after_stack = tf.stack([conv_normed1, conv_normed2])

    after_maxout = tf.reduce_max(after_stack, 0)
    pooled = tf.nn.max_pool(after_maxout, [1, 3, 3, 1], [1, 3, 3, 1], 'SAME')
    output = tf.nn.dropout(pooled, 0.5)




# Training computation.
logits = model(tf_train_dataset)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

# L2 regularization over all trainable variables except the batch-norm ones.
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables() if 'BatchNorm' not in v.name])
loss += LAMBDA * l2_loss

# Optimizer.
optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)

# Predictions for the training, validation, and test data.
train_prediction = tf.nn.softmax(logits)
valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
#print(valid_prediction.shape)
test_prediction = tf.nn.softmax(model(tf_test_dataset))

num_steps = 6000
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):

        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        test_offset = (step * batch_size) % (test_labels.shape[0] - batch_size)

        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels, is_training: True}

        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)

        if (step % 50 == 0):

            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))

            # Evaluate the test set in chunks, with batch norm in inference mode.
            for i in range(1, 10001):
                test_batch = test_dataset[((i - 1) * test_batch_size):(i * test_batch_size), :, :, :]
                pred = test_prediction.eval(feed_dict={tf_test_dataset: test_batch, is_training: False})

                if i == 1:
                    stacked_pred = pred
                else:
                    stacked_pred = np.vstack((stacked_pred, pred))

            print(np.argmax(stacked_pred, 1))
            print('test accuracy: %.1f%%' % accuracy(stacked_pred, test_labels))

Comments:

I ran into the same problem. I am using slim, but I am not sure how the BN layer is supposed to be used = =

Answer 1:

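During training, batch norm uses statistics computed from the current batch. During evaluation/testing (whenever is_training is False), it uses the population statistics instead.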
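To make the difference between the two modes concrete, here is a minimal NumPy sketch. It is not TensorFlow's implementation: the class `BatchNorm1D`, its `update_moving` flag, and the synthetic data are all hypothetical names and values chosen for illustration. It compares a layer whose moving statistics are updated during training with one whose update never runs, and then evaluates both with is_training=False.

```python
import numpy as np


class BatchNorm1D:
    """Illustrative batch norm over the batch axis (no learned scale/offset)."""

    def __init__(self, num_features, decay=0.9, eps=1e-3):
        self.decay = decay
        self.eps = eps
        # Population ("moving") statistics start at mean 0 and variance 1.
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)

    def __call__(self, x, is_training, update_moving=True):
        if is_training:
            # Training mode: normalize with the statistics of this batch.
            mean, var = x.mean(axis=0), x.var(axis=0)
            if update_moving:
                # Exponential moving average of the batch statistics.
                self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
                self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Inference mode: normalize with the stored population statistics.
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)


rng = np.random.default_rng(0)


def make_batch():
    # Synthetic activations with mean 5 and standard deviation 3 (an assumption).
    return rng.normal(loc=5.0, scale=3.0, size=(256, 4))


updated = BatchNorm1D(4)
stale = BatchNorm1D(4)
for _ in range(200):
    batch = make_batch()
    updated(batch, is_training=True)
    stale(batch, is_training=True, update_moving=False)  # update never runs

test_batch = make_batch()
out_updated = updated(test_batch, is_training=False)
out_stale = stale(test_batch, is_training=False)
print("updated: mean %.2f, std %.2f" % (out_updated.mean(), out_updated.std()))
print("stale:   mean %.2f, std %.2f" % (out_stale.mean(), out_stale.std()))
```

In this sketch the layer with stale moving statistics passes the test batch through almost unnormalized, while both layers behave identically in training mode. That matches the symptom in the question: fine with is_training=True, poor with is_training=False.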

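Internally, the population statistics are updated by implicitly created update ops, which are added to the tf.GraphKeys.UPDATE_OPS collection, but you have to force TensorFlow to run these ops. A simple way to do that is to add control_dependencies on the optimization op: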

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss, step)
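Even when the update ops do run, the moving averages need time to warm up, and how long depends on decay. Note that the question's code passes updates_collections=None, which, as far as the tf.contrib.layers.batch_norm documentation describes it, applies the updates in place, so the decay value may be worth checking there as well; the value of batch_norm_decay is not shown in the question. The following sketch uses only plain Python, and the helper name `warmup_fraction` is hypothetical. It shows what fraction of a constant statistic a moving average starting at 0 has captured after a given number of updates.

```python
def warmup_fraction(decay, num_updates):
    """Fraction of a constant target captured by a moving average that
    starts at 0 and is updated num_updates times with the same target."""
    moving = 0.0
    target = 1.0
    for _ in range(num_updates):
        moving = decay * moving + (1.0 - decay) * target
    return moving


# 6000 is the num_steps value used in the question; the other values are arbitrary.
for decay in (0.9, 0.99, 0.999):
    for steps in (100, 1000, 6000):
        print("decay=%.3f steps=%4d captured=%.3f"
              % (decay, steps, warmup_fraction(decay, steps)))
```

The captured fraction equals 1 - decay**num_updates, so with decay=0.999 (the documented default of tf.contrib.layers.batch_norm) the moving statistics are still far from the batch statistics after a few hundred steps. Evaluating early in training with is_training=False can therefore look much worse than the training accuracy, and trying a lower decay such as 0.9 is a cheap experiment.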
