How to view train_on_batch TensorBoard log files generated by Google Colab?
【Posted】2020-06-23 22:36:26
【Question】I know how to view TensorBoard graphs on my local machine while my neural network trains, using the code below in a local Jupyter Notebook. What do I need to do differently when I use Google Colab to train the neural network instead? I can't find any tutorials/examples online that cover this when using train_on_batch.
After defining my model (convnet)...
convnet.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.Adam(0.001),
                metrics=['accuracy']
                )

# create tensorboard graph data for the model
tb = tf.keras.callbacks.TensorBoard(log_dir='Logs/Exp_15',
                                    histogram_freq=0,
                                    batch_size=batch_size,
                                    write_graph=True,
                                    write_grads=False)
tb.set_model(convnet)
num_epochs = 3
batches_processed_counter = 0
for epoch in range(num_epochs):
    for batch in range(int(train_img.samples / batch_size)):
        batches_processed_counter = batches_processed_counter + 1

        # get next batch of images & labels
        X_imgs, X_labels = next(train_img)

        # train model, get cross entropy & accuracy for batch
        train_CE, train_acc = convnet.train_on_batch(X_imgs, X_labels)

        # validation images - just predict
        X_imgs_val, X_labels_val = next(val_img)
        val_CE, val_acc = convnet.test_on_batch(X_imgs_val, X_labels_val)

        # create tensorboard graph info for the cross entropy loss and training accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies)
        tb.on_epoch_end(batches_processed_counter, {'train_loss': train_CE, 'train_acc': train_acc})

        # create tensorboard graph info for the cross entropy loss and VALIDATION accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies)
        tb.on_epoch_end(batches_processed_counter, {'val_loss': val_CE, 'val_acc': val_acc})

        print('epoch', epoch, 'batch', batch, 'train_CE:', train_CE, 'train_acc:', train_acc)
        print('epoch', epoch, 'batch', batch, 'val_CE:', val_CE, 'val_acc:', val_acc)

tb.on_train_end(None)
I can see that the log files are generated successfully in the Google Colab runtime. How do I view them in TensorBoard? I have seen solutions that describe downloading the log files to the local machine and viewing them in a local TensorBoard, but that displayed nothing. Is there something missing from my code that would allow this to work in a local TensorBoard? And/or is there an alternative solution for viewing the log data in TensorBoard from within Google Colab?
In case it matters for the solution, I am on a Mac. Also, the tutorials I have seen online show how to use TensorBoard with Google Colab when using the fit code, but I can't see how to adapt my code, which uses train_on_batch rather than fit.
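(For reference, a minimal sketch of that download-and-view-locally route is below. It assumes the log directory is called Logs, matching the log_dir used above, and uses Colab's google.colab.files helper to pull the archive down; the local commands are the standard TensorBoard CLI.)

# In the Colab notebook: archive the TensorBoard log directory and download it.
import shutil
from google.colab import files

shutil.make_archive('Logs', 'zip', 'Logs')   # creates Logs.zip in the Colab runtime
files.download('Logs.zip')                   # triggers a browser download

# Then on the local machine (e.g. a Mac terminal):
#   unzip Logs.zip -d Logs
#   tensorboard --logdir Logs
# and open http://localhost:6006 in a browser.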
【Question Comments】:
【Answer 1】: Thanks to Dr Ryan Cunningham of Manchester Metropolitan University for solving this problem, which was the following:
%load_ext tensorboard
%tensorboard --logdir './Logs'
...which allows me to view the TensorBoard graphs in the Google Colab document itself, and to see the graphs update while the NN trains.
So the full set of code to view the TensorBoard graphs while the network trains is (after defining the neural network, which I have called convnet):
# compile the neural net after defining the loss, optimisation and
# performance metric
convnet.compile(loss='categorical_crossentropy',  # cross entropy is suited to
                                                  # multi-class classification
                optimizer=tf.keras.optimizers.Adam(0.001),
                metrics=['accuracy']
                )

# create tensorboard graph data for the model
tb = tf.keras.callbacks.TensorBoard(log_dir='Logs/Exp_15',
                                    histogram_freq=0,
                                    batch_size=batch_size,
                                    write_graph=True,
                                    write_grads=False)
tb.set_model(convnet)
%load_ext tensorboard
%tensorboard --logdir './Logs'
# iterate through the training set for x epochs,
# each time iterating through the batches,
# for each batch, train, calculate loss & optimise weights.
# (mini-batch approach)
num_epochs = 1
batches_processed_counter = 0
for epoch in range(num_epochs):
    for batch in range(int(train_img.samples / batch_size)):
        batches_processed_counter = batches_processed_counter + 1

        # get next batch of images & labels
        X_imgs, X_labels = next(train_img)

        # train model, get cross entropy & accuracy for batch
        train_CE, train_acc = convnet.train_on_batch(X_imgs, X_labels)

        # validation images - just predict
        X_imgs_val, X_labels_val = next(val_img)
        val_CE, val_acc = convnet.test_on_batch(X_imgs_val, X_labels_val)

        # create tensorboard graph info for the cross entropy loss and training accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies)
        tb.on_epoch_end(batches_processed_counter, {'train_loss': train_CE, 'train_acc': train_acc})

        # create tensorboard graph info for the cross entropy loss and VALIDATION accuracies
        # for every batch in every epoch (so if 5 epochs and 10 batches there should be 50 accuracies)
        tb.on_epoch_end(batches_processed_counter, {'val_loss': val_CE, 'val_acc': val_acc})

        print('epoch', epoch, 'batch', batch, 'train_CE:', train_CE, 'train_acc:', train_acc)
        print('epoch', epoch, 'batch', batch, 'val_CE:', val_CE, 'val_acc:', val_acc)

tb.on_train_end(None)
Note: it can take a few seconds after the cell has finished running for the cell output to refresh and show the TensorBoard graphs.
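(As an aside, on TensorFlow 2 an alternative to driving the TensorBoard callback by hand is to write the per-batch scalars yourself with tf.summary; the %tensorboard magic above picks up these event files in the same way. This is only a minimal sketch, under the assumption that convnet, train_img, val_img and batch_size are defined as in the question.)

import tensorflow as tf

# Minimal sketch (assumes TF2): write per-batch scalars directly with summary
# writers instead of calling the TensorBoard callback's on_epoch_end by hand.
train_writer = tf.summary.create_file_writer('Logs/Exp_15/train')
val_writer = tf.summary.create_file_writer('Logs/Exp_15/validation')

num_epochs = 1
step = 0
for epoch in range(num_epochs):
    for batch in range(int(train_img.samples / batch_size)):
        step += 1
        X_imgs, X_labels = next(train_img)
        train_CE, train_acc = convnet.train_on_batch(X_imgs, X_labels)
        X_imgs_val, X_labels_val = next(val_img)
        val_CE, val_acc = convnet.test_on_batch(X_imgs_val, X_labels_val)

        with train_writer.as_default():
            tf.summary.scalar('loss', train_CE, step=step)
            tf.summary.scalar('accuracy', train_acc, step=step)
        with val_writer.as_default():
            tf.summary.scalar('loss', val_CE, step=step)
            tf.summary.scalar('accuracy', val_acc, step=step)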
【Comments】:
【Answer 2】:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
get_ipython().system_raw('tensorboard --logdir /content/trainingdata/objectdetection/ckpt_output/trainingImatges/ --host 0.0.0.0 --port 6006 &')
get_ipython().system_raw('./ngrok http 6006 &')
! curl -s http://localhost:4040/api/tunnels | python3 -c \
"import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
This gives you TensorBoard for the log files created. It creates a tunnel for the TensorBoard running on Colab and makes it accessible through a public URL provided by ngrok; the public URL is printed when you run the final command. It works with TF1.13, and I think you can use the same approach for TF2 as well.
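(For anyone wary of running the shell one-liners without understanding them, a rough Python equivalent of the last three steps is sketched here. The port numbers are the defaults from the answer above, 6006 for TensorBoard and 4040 for ngrok's local API, and the './Logs' path is an assumption matching the question's log directory.)

import json
import subprocess
import time
import urllib.request

# Start TensorBoard on the Colab runtime as a background process.
subprocess.Popen(['tensorboard', '--logdir', './Logs', '--host', '0.0.0.0', '--port', '6006'])

# Start the ngrok tunnel pointing at TensorBoard's port
# (assumes ./ngrok was downloaded and unzipped as in the answer above).
subprocess.Popen(['./ngrok', 'http', '6006'])
time.sleep(2)  # give ngrok a moment to start its local API

# Ask ngrok's local API for the public URL of the tunnel and print it.
with urllib.request.urlopen('http://localhost:4040/api/tunnels') as resp:
    tunnels = json.load(resp)
print(tunnels['tunnels'][0]['public_url'])  # open this URL in a browser to see TensorBoard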
【Comments】:
Thanks 55597. I would feel uncomfortable running that code without fully understanding what it does; I will look into it to understand it better when I have more time. I have found an alternative solution that looks less scary: %load_ext tensorboard %tensorboard --logdir './Logs'