How to view train_on_batch TensorBoard log files generated by Google Colab?


【Title】: How to view train_on_batch tensorboard log files generated by Google Colab? 【Posted】: 2020-06-23 22:36:26 【Question】:

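I know how to view TensorBoard graphs on my local machine while my neural network trains from a local Jupyter Notebook, using the code below. What do I need to do differently when I train the network in Google Colab? I can't find any tutorials or examples online that use train_on_batch.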

After defining my model (convnet)...

convnet.compile(loss='categorical_crossentropy',                                      
                optimizer=tf.keras.optimizers.Adam(0.001),
                metrics=['accuracy']
               )

# create tensorboard graph data for the model
tb = tf.keras.callbacks.TensorBoard(log_dir='Logs/Exp_15', 
                                    histogram_freq=0, 
                                    batch_size=batch_size, 
                                    write_graph=True, 
                                    write_grads=False)
tb.set_model(convnet)

num_epochs = 3
batches_processed_counter = 0

for epoch in range(num_epochs):

    for batch in range(int(train_img.samples/batch_size)): 
        batches_processed_counter = batches_processed_counter  + 1

        # get next batch of images & labels
        X_imgs, X_labels = next(train_img) 

        #train model, get cross entropy & accuracy for batch
        train_CE, train_acc = convnet.train_on_batch(X_imgs, X_labels) 

        # validation images - just predict
        X_imgs_val, X_labels_val = next(val_img)
        val_CE, val_acc = convnet.test_on_batch(X_imgs_val, X_labels_val) 

        # create tensorboard graph info for the cross entropy loss and training accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'train_loss': train_CE, 'train_acc': train_acc})

        # create tensorboard graph info for the cross entropy loss and VALIDATION accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'val_loss': val_CE, 'val_acc': val_acc})

        print('epoch', epoch, 'batch', batch, 'train_CE:', train_CE, 'train_acc:', train_acc)
        print('epoch', epoch, 'batch', batch, 'val_CE:', val_CE, 'val_acc:', val_acc)

tb.on_train_end(None)
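
A small aside on the step counter in the loop above: it runs int(train_img.samples/batch_size) batches per epoch, so a final partial batch is skipped, and the counter passed to tb.on_epoch_end runs from 1 up to (epochs x batches per epoch). The sketch below only illustrates that arithmetic. The helper names steps_per_epoch and global_step and the sample counts are made up for illustration and do not come from the original code.

```python
import math


def steps_per_epoch(num_samples, batch_size, keep_partial_batch=False):
    """Number of batches in one pass over the data.

    With keep_partial_batch=False this matches int(samples / batch_size)
    in the loop above, which drops a trailing incomplete batch.
    """
    if keep_partial_batch:
        return math.ceil(num_samples / batch_size)
    return num_samples // batch_size


def global_step(epoch, batch, batches_per_epoch):
    """1-based step index, equivalent to batches_processed_counter."""
    return epoch * batches_per_epoch + batch + 1


# Hypothetical numbers: 105 images, batch size 10, 5 epochs.
batches = steps_per_epoch(105, 10)
last_step = global_step(epoch=4, batch=batches - 1, batches_per_epoch=batches)
print("batches per epoch:", batches, "last step logged:", last_step)
```

If the last step shown in TensorBoard is lower than expected, comparing it against this count is one way to check whether every batch was actually logged.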

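I can see that the log files are generated successfully in the Google Colab runtime. How do I view them in TensorBoard? I have seen solutions that describe downloading the log files to a local machine and viewing them in a local TensorBoard, but that shows nothing for me. Is something missing from my code that would let it work with TensorBoard locally? And/or is there an alternative way to view the log data in TensorBoard inside Google Colab?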

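In case it matters for the details of a solution, I'm on a Mac. Also, the tutorials I've found online show how to use TensorBoard with Google Colab when training with fit, but I can't see how to adapt that to my code, which uses train_on_batch instead of fit.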
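
When a downloaded log directory shows nothing in a local TensorBoard, one possible first check is whether the event files are really there, are non-empty, and sit under the directory given to --logdir. The sketch below is a rough helper for that check, using only the standard library. The function name find_event_files is hypothetical, and it assumes the usual naming convention in which event file names contain "tfevents". The 'Logs' directory is the one used in the question.

```python
import os


def find_event_files(log_root):
    """Return (path, size_in_bytes) for each event file found under log_root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in filenames:
            # Assumption: TensorBoard event files have "tfevents" in their name.
            if "tfevents" in name:
                path = os.path.join(dirpath, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found)


# 'Logs' is the directory from the question; nothing is printed if it is absent.
if os.path.isdir("Logs"):
    for path, size in find_event_files("Logs"):
        print(f"{size:>10} bytes  {path}")
```

The directory passed to --logdir should be the folder containing these files or one of its ancestors.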

【Comments】:

【Answer 1】:

Thanks to Dr Ryan Cunningham of Manchester Metropolitan University for solving this. The fix is:

%load_ext tensorboard
%tensorboard --logdir './Logs'

...this lets me view the TensorBoard graphs inside the Google Colab notebook itself, and watch them update while the NN trains.

So the complete code for viewing the TensorBoard graphs while the network trains is (after defining the neural network, which I call convnet):

# compile the neural net after defining the loss, optimisation and 
# performance metric
convnet.compile(loss='categorical_crossentropy',  # cross entropy is suited to 
                                                   # multi-class classification
                optimizer=tf.keras.optimizers.Adam(0.001),
                metrics=['accuracy']
               )

# create tensorboard graph data for the model
tb = tf.keras.callbacks.TensorBoard(log_dir='Logs/Exp_15', 
                                    histogram_freq=0, 
                                    batch_size=batch_size, 
                                    write_graph=True, 
                                    write_grads=False)
tb.set_model(convnet)

%load_ext tensorboard
%tensorboard --logdir './Logs'

# iterate through the training set for x epochs, 
# each time iterating through the batches,
# for each batch, train, calculate loss & optimise weights. 
# (mini-batch approach)
num_epochs = 1
batches_processed_counter = 0

for epoch in range(num_epochs):

    for batch in range(int(train_img.samples/batch_size)): 
        batches_processed_counter = batches_processed_counter  + 1

        # get next batch of images & labels
        X_imgs, X_labels = next(train_img) 

        #train model, get cross entropy & accuracy for batch
        train_CE, train_acc = convnet.train_on_batch(X_imgs, X_labels) 

        # validation images - just predict
        X_imgs_val, X_labels_val = next(val_img)
        val_CE, val_acc = convnet.test_on_batch(X_imgs_val, X_labels_val) 

        # create tensorboard graph info for the cross entropy loss and training accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'train_loss': train_CE, 'train_acc': train_acc})

        # create tensorboard graph info for the cross entropy loss and VALIDATION accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies )
        tb.on_epoch_end(batches_processed_counter, {'val_loss': val_CE, 'val_acc': val_acc})

        print('epoch', epoch, 'batch', batch, 'train_CE:', train_CE, 'train_acc:', train_acc)
        print('epoch', epoch, 'batch', batch, 'val_CE:', val_CE, 'val_acc:', val_acc)

tb.on_train_end(None)


Note: after the cell finishes running, it may take a few seconds for the cell output to refresh and show the TensorBoard graphs.

【Comments】:

【Answer 2】:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

get_ipython().system_raw('tensorboard --logdir /content/trainingdata/objectdetection/ckpt_output/trainingImatges/ --host 0.0.0.0 --port 6006 &')

get_ipython().system_raw('./ngrok http 6006 &')

! curl -s http://localhost:4040/api/tunnels | python3 -c \
 "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

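This gives you TensorBoard for the log files that were created. It creates a tunnel to the TensorBoard instance running on Colab and makes it reachable through a public URL provided by ngrok. The public URL is printed when you run the final command. It works with TF 1.13; I think the same approach should also work with TF2.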
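
For readers who would rather see what the final curl command does, the sketch below does the same lookup in plain Python: it queries the local ngrok API at http://localhost:4040/api/tunnels and reads the public_url of the first tunnel. The JSON shape is inferred only from the one-liner above (['tunnels'][0]['public_url']), and the function names first_public_url and fetch_public_url are hypothetical.

```python
import json
import urllib.request

# Local ngrok API endpoint, as used by the curl command in the answer above.
NGROK_API = "http://localhost:4040/api/tunnels"


def first_public_url(payload):
    """Return the public_url of the first tunnel in the response, or None."""
    tunnels = payload.get("tunnels", [])
    return tunnels[0]["public_url"] if tunnels else None


def fetch_public_url(api_url=NGROK_API, timeout=5):
    """Query the local ngrok API; this only works while ngrok is running."""
    with urllib.request.urlopen(api_url, timeout=timeout) as response:
        return first_public_url(json.load(response))
```

Calling fetch_public_url() in a Colab cell after the two system_raw lines would be expected to return the same URL that the curl command prints, assuming the tunnel has finished starting.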

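【Comments】:

Thanks, 55597. I'd feel uncomfortable running that code without fully understanding what it does. When I have more time I'll study it to understand it better. I've been given an alternative solution that looks less scary: %load_ext tensorboard %tensorboard --logdir './Logs'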
