TensorFlow Object Detection: training loss shoots up until it becomes NaN

Posted by 冰不语


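Preface: this article was compiled by the editors of 小常识网 (cha138.com). It covers why the loss of a TensorFlow Object Detection model can climb sharply during training until it becomes NaN. We hope it is useful to you.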

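Today, while training an object detection model with TensorFlow Object Detection, the loss kept misbehaving: it dropped briefly, then climbed sharply until it became NaN, and training stopped with the error "Model diverged with loss = NaN.". An excerpt of the log: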

INFO:tensorflow:loss = 18919772.0, step = 0
INFO:tensorflow:loss = 344412.66, step = 100 (27.579 sec)
INFO:tensorflow:loss = 156323.77, step = 200 (22.843 sec)
INFO:tensorflow:loss = 286260.12, step = 300 (22.834 sec)
INFO:tensorflow:loss = 7225620.0, step = 400 (22.840 sec)
INFO:tensorflow:loss = 35882144.0, step = 500 (22.831 sec)
INFO:tensorflow:loss = 11317121000.0, step = 600 (22.844 sec)
INFO:tensorflow:loss = 264382550000.0, step = 700 (22.859 sec)
INFO:tensorflow:loss = 2169563800000.0, step = 800 (22.870 sec)
INFO:tensorflow:loss = 49792570000000.0, step = 900 (22.838 sec)
INFO:tensorflow:loss = 279824520000000.0, step = 1000 (22.857 sec)
INFO:tensorflow:loss = 610852500000000.0, step = 1100 (22.872 sec)
INFO:tensorflow:loss = 8140467300000000.0, step = 1200 (22.867 sec)
INFO:tensorflow:loss = 1.5560248e+16, step = 1300 (22.864 sec)
ERROR:tensorflow:Model diverged with loss = NaN.
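The log shows trouble long before the NaN: the loss bottoms out at step 200 and is already about 46 times higher by step 400. Below is a minimal sketch that scans a log in the format shown above and reports the first step where the loss is non-finite or exceeds some multiple of the lowest loss seen so far. It assumes the lines keep the "loss = X, step = N" layout; the function names and the factor of 10 are arbitrary choices for illustration, not part of TensorFlow.

```python
import math
import re

# Matches lines such as "INFO:tensorflow:loss = 344412.66, step = 100 (27.579 sec)".
# The ERROR line has no ", step = " part, so it is ignored.
LOSS_RE = re.compile(r"loss = ([0-9.eE+\-]+|NaN|nan|inf), step = (\d+)")


def parse_losses(log_text):
    """Return (step, loss) pairs in the order they appear in the log."""
    return [(int(m.group(2)), float(m.group(1))) for m in LOSS_RE.finditer(log_text)]


def first_divergent_step(pairs, factor=10.0):
    """Return the first step whose loss is NaN/inf or larger than `factor`
    times the lowest loss seen before it; return None if there is none."""
    best = math.inf
    for step, loss in pairs:
        if not math.isfinite(loss) or loss > factor * best:
            return step
        best = min(best, loss)
    return None
```

On the excerpt above this flags step 400 (7225620.0 is more than 10 times the minimum of 156323.77), several hundred steps before TensorFlow reports the NaN.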

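At a glance this looked like the result of a basic mistake, and a quick check confirmed it at once: I had forgotten to change the num_classes value in the config file.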
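A quick pre-flight check can catch this before any training steps are spent. The sketch below is hypothetical: the helper names and the regex-based parsing are illustrative, not part of the Object Detection API. It assumes the label map follows the usual convention of ids 1..N with no gaps (id 0 is reserved for the background class), so num_classes in the pipeline config should equal the largest id.

```python
import re

# Assumed text-format layouts (illustrative):
#   pipeline config:  model { ssd { num_classes: 90 ... } }
#   label map:        item { id: 1  name: 'cat' }
NUM_CLASSES_RE = re.compile(r"\bnum_classes\s*:\s*(\d+)")
ITEM_ID_RE = re.compile(r"\bid\s*:\s*(\d+)")


def read_num_classes(config_text):
    """Return the first num_classes value in a pipeline config, or None."""
    m = NUM_CLASSES_RE.search(config_text)
    return int(m.group(1)) if m else None


def max_label_id(pbtxt_text):
    """Return the largest item id in a label_map.pbtxt (0 if there are none)."""
    return max((int(i) for i in ITEM_ID_RE.findall(pbtxt_text)), default=0)


def num_classes_mismatch(config_text, pbtxt_text):
    """Return a message if num_classes disagrees with the label map, else None.

    Assumes label ids run 1..N without gaps, so num_classes should be N.
    """
    n_cfg = read_num_classes(config_text)
    if n_cfg is None:
        return "num_classes not found in config"
    n_map = max_label_id(pbtxt_text)
    if n_cfg != n_map:
        return f"num_classes is {n_cfg}, but the label map's largest id is {n_map}"
    return None
```

For a two-class label map and a config still set to num_classes: 90, num_classes_mismatch returns a message instead of None, which is the situation in this post.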
Other people online report that this can also be caused by:
the name values in label_map.pbtxt not matching the class names written into the generated tfrecord.
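The name mismatch can be checked in a similar way. This is a minimal sketch under an assumption: the tfrecord was generated from an annotation CSV with a "class" column (a common layout, but hypothetical here, as are the function names). It lists class names that occur in the annotations but are not defined as a name in label_map.pbtxt. The comparison is deliberately exact, so differences in case or stray spaces also show up.

```python
import csv
import re

# Matches name: 'cat' or name: "cat", but not display_name: ...
# (there is no word boundary between "_" and "n").
NAME_RE = re.compile(r"\bname\s*:\s*['\"]([^'\"]+)['\"]")


def label_map_names(pbtxt_text):
    """Return the set of name values defined in a label_map.pbtxt."""
    return set(NAME_RE.findall(pbtxt_text))


def annotation_names(csv_text, column="class"):
    """Return the set of class names in an annotation CSV (hypothetical layout)."""
    return {row[column] for row in csv.DictReader(csv_text.splitlines())}


def names_missing_from_label_map(csv_text, pbtxt_text, column="class"):
    """Sorted class names used in the annotations that the label map lacks."""
    return sorted(annotation_names(csv_text, column) - label_map_names(pbtxt_text))
```

An empty list means every annotated class has a matching name in the label map; anything else is worth fixing before the tfrecord is generated.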
I'm noting it down here in case it happens again.
