Tensorboard for custom training loop in Tensorflow 2


Posted: 2020-06-23 15:43:56

Question:

I want to create a custom training loop in TensorFlow 2 and use TensorBoard for visualization. Here is an example I created based on the TensorFlow documentation:

import datetime
import os  # needed for os.environ below

import tensorflow as tf

os.environ["CUDA_VISIBLE_DEVICES"] = "0"    # which gpu to use

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

train_dataset = train_dataset.shuffle(60000).batch(64)
test_dataset = test_dataset.batch(64)


def create_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28), name='Flatten_1'),
        tf.keras.layers.Dense(512, activation='relu', name='Dense_1'),
        tf.keras.layers.Dropout(0.2, name='Dropout_1'),
        tf.keras.layers.Dense(10, activation='softmax', name='Dense_2')
    ], name='Network')


# Loss and optimizer
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

# Define our metrics
train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32)
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('train_accuracy')
test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32)
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy('test_accuracy')

@tf.function
def train_step(model, optimizer, x_train, y_train):
    with tf.GradientTape() as tape:
        predictions = model(x_train, training=True)
        loss = loss_object(y_train, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    train_loss(loss)
    train_accuracy(y_train, predictions)

@tf.function
def test_step(model, x_test, y_test):
    predictions = model(x_test)
    loss = loss_object(y_test, predictions)

    test_loss(loss)
    test_accuracy(y_test, predictions)


current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
train_log_dir = '/NAS/Dataset/logs/gradient_tape/' + current_time + '/train'
test_log_dir = '/NAS/Dataset/logs/gradient_tape/' + current_time + '/test'
train_summary_writer = tf.summary.create_file_writer(train_log_dir)
test_summary_writer = tf.summary.create_file_writer(test_log_dir)

model = create_model()  # reset our model

EPOCHS = 5


for epoch in range(EPOCHS):
    for (x_train, y_train) in train_dataset:
        train_step(model, optimizer, x_train, y_train)
    with train_summary_writer.as_default():
        tf.summary.scalar('loss', train_loss.result(), step=epoch)
        tf.summary.scalar('accuracy', train_accuracy.result(), step=epoch)

    for (x_test, y_test) in test_dataset:
        test_step(model, x_test, y_test)
    with test_summary_writer.as_default():
        tf.summary.scalar('loss', test_loss.result(), step=epoch)
        tf.summary.scalar('accuracy', test_accuracy.result(), step=epoch)

    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch + 1,
                          train_loss.result(),
                          train_accuracy.result() * 100,
                          test_loss.result(),
                          test_accuracy.result() * 100))

    # Reset metrics every epoch
    train_loss.reset_states()
    test_loss.reset_states()
    train_accuracy.reset_states()
    test_accuracy.reset_states()

I access TensorBoard from the terminal with the following command:

tensorboard --logdir=.....

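The code above produces summaries for the losses and metrics. My question is:

How do I produce the graph of this process?

I tried the commands recommended by TensorFlow, tf.summary.trace_on() and tf.summary.trace_export(), but I did not manage to plot the graph. Maybe I am using them wrong. I would really appreciate any suggestions on how to do this.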

Comments:

I'm facing a similar issue, but I managed to display part of my graph.

Answer 1:

As answered here, I'm sure there is a better way, but a simple workaround is to just use the existing TensorBoard callback logic:

tb_callback = tf.keras.callbacks.TensorBoard(LOG_DIR)
# Writes the graph to tensorboard summaries using an internal file writer
tb_callback.set_model(model)
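
For comparison, the tf.summary.trace_on() / tf.summary.trace_export() route mentioned in the question can be sketched roughly as below. This is only a minimal sketch under assumptions, not the answer author's method: the temporary log directory, the small stand-in model, the forward function, and the trace name 'forward_trace' are all hypothetical. The idea is that tracing is switched on before the first call of a @tf.function, so that the graph built during that call is what gets exported.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical log directory for this sketch; point it at your own log path.
log_dir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(log_dir)

# Small stand-in model (an assumption; use your own create_model() instead).
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])


@tf.function
def forward(x):
    # Only code that runs inside a tf.function ends up in the exported graph.
    return model(x, training=False)


# Start recording before the first call, because the graph is built
# when the tf.function is traced for the first time.
tf.summary.trace_on(graph=True)
forward(tf.zeros((1, 28, 28)))

# Export the recorded graph with the writer that TensorBoard will read.
with writer.as_default():
    tf.summary.trace_export(name='forward_trace', step=0)
writer.flush()

# List the event files written to the log directory.
event_files = [f for f in os.listdir(log_dir) if 'tfevents' in f]
print(event_files)
```

After running this, starting tensorboard with --logdir pointing at the same directory should make the exported trace available in the Graphs tab. If the function was already called before trace_on(), no new graph is traced, which is one possible reason the export comes out empty.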

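Comments:

It works fine for me, so I'm accepting your answer :) Thanks!

Hey @ZιΒάγγο, the above code doesn't work for me with Tensorflow 2.x. Have you come across any other solution?

@Harman Unfortunately for you, I was satisfied with reubenjohn's answer, so I didn't search any further. But I believe we can help you sort it out as well. What exactly is not working? Can you provide specific details?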
