Error while training YOLOv3: ValueError: tf.function-decorated function tried to create variables on non-first call

Posted: 2021-03-18 02:53:54

Question:

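I am training a custom YOLOv3 model and get the error "ValueError: tf.function-decorated function tried to create variables on non-first call." when fitting the model. The error is raised by the fit_generator call. Can anyone help?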

    train_generator = BatchGenerator(
        instances           = train_ints, 
        anchors             = config['model']['anchors'],   
        labels              = labels,        
        downsample          = 32, # ratio between network input's size and network output's size, 32 for YOLOv3
        max_box_per_image   = max_box_per_image,
        batch_size          = config['train']['batch_size'],
        min_net_size        = config['model']['min_input_size'],
        max_net_size        = config['model']['max_input_size'],   
        shuffle             = True, 
        jitter              = 0.3, 
        norm                = normalize
    )


    train_model, infer_model = create_model(
        nb_class            = len(labels), 
        anchors             = config['model']['anchors'], 
        max_box_per_image   = max_box_per_image, 
        max_grid            = [config['model']['max_input_size'], config['model']['max_input_size']], 
        batch_size          = config['train']['batch_size'], 
        warmup_batches      = warmup_batches,
        ignore_thresh       = config['train']['ignore_thresh'],
        multi_gpu           = multi_gpu,
        saved_weights_name  = config['train']['saved_weights_name'],
        lr                  = config['train']['learning_rate'],
        grid_scales         = config['train']['grid_scales'],
        obj_scale           = config['train']['obj_scale'],
        noobj_scale         = config['train']['noobj_scale'],
        xywh_scale          = config['train']['xywh_scale'],
        class_scale         = config['train']['class_scale'],
    )

    ###############################
    #   Kick off the training
    ###############################
    callbacks = create_callbacks(config['train']['saved_weights_name'], config['train']['tensorboard_dir'], infer_model)
    print ("before kickoff", len(train_generator))
    print ("before kickoff", train_generator)
    train_model.fit_generator(
        generator        = train_generator,
        steps_per_epoch  = len(train_generator) * config['train']['train_times'], 
        epochs           = config['train']['nb_epochs'] + config['train']['warmup_epochs'],
        #epochs           = 1, 
        verbose          = 2 if config['train']['debug'] else 1,
        callbacks        = callbacks, 
        workers          = 2,
        max_queue_size   = 8
    )
    print ("after kickoff")                   

The error I get is:

WARNING:tensorflow:Model failed to serialize as JSON. Ignoring... Layer YoloLayer has arguments in `__init__` and therefore must override `get_config`.
Epoch 1/21
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    main(args)
  File "train.py", line 269, in main
    train_model.fit_generator(
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1815, in fit_generator
    return self.fit(
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2828, in __call__
    graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3065, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:806 train_function  *
    return step_function(self, iterator)
/Users/karthikeyan/Desktop/table/yolo.py:46 call  *
    batch_seen = tf.Variable(0.)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/variables.py:262 __call__  **
    return cls._variable_v2_call(*args, **kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/variables.py:244 _variable_v2_call
    return previous_getter(
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/variables.py:67 getter
    return captured_getter(captured_previous, **kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2857 creator
    return next_creator(**kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/variables.py:67 getter
    return captured_getter(captured_previous, **kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2857 creator
    return next_creator(**kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/variables.py:67 getter
    return captured_getter(captured_previous, **kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2857 creator
    return next_creator(**kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/variables.py:67 getter
    return captured_getter(captured_previous, **kwargs)
/Users/karthikeyan/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py:701 invalid_creator_scope
    raise ValueError(

ValueError: tf.function-decorated function tried to create variables on non-first call.              
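
The user-code frame that matters in this trace is yolo.py:46, where `tf.Variable(0.)` is created inside `call`. Keras runs `call` inside a `tf.function`, and a `tf.function` may not create a new variable every time it is traced. The sketch below reproduces the same class of error outside YOLOv3. The function name `step` and the variable `counter` are made up for illustration, and the exact wording of the message differs between TensorFlow versions.

```python
import tensorflow as tf


@tf.function
def step(x):
    # Hypothetical example: a fresh tf.Variable is requested inside the
    # traced function, mirroring `batch_seen = tf.Variable(0.)` in `call`.
    counter = tf.Variable(0.0)
    counter.assign_add(1.0)
    return x + counter


raised = False
try:
    step(tf.constant(1.0))
except ValueError as err:
    # TensorFlow rejects the variable creation with a ValueError.
    raised = True
    print(type(err).__name__, str(err).splitlines()[0])
```

If this small function fails in the same way as the training run, the variable creation inside `call` is the likely cause rather than the generator or the `fit_generator` arguments.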

Answer 1:

I was able to find the answer. Adding the statement "tf.config.experimental_run_functions_eagerly(True)" right after importing tensorflow solved the problem.
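
Two notes on this workaround. Since TensorFlow 2.3 the same switch is available under the non-experimental name `tf.config.run_functions_eagerly(True)`, and the experimental name is deprecated. Running eagerly also turns off graph compilation for every `tf.function`, so training is usually slower. An alternative that keeps graph mode is to create the variable once, outside `call`. The sketch below assumes the counter at yolo.py:46 can live on the layer object; `BatchCounter` and `train_step` are hypothetical stand-ins, not the actual YoloLayer code.

```python
import tensorflow as tf


class BatchCounter(tf.keras.layers.Layer):
    """Hypothetical stand-in for a layer that counts the batches it has seen."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Created once, when the layer object is constructed,
        # instead of on every trace of `call`.
        self.batch_seen = tf.Variable(0.0, trainable=False)

    def call(self, inputs):
        # Updating an existing variable is allowed inside tf.function.
        self.batch_seen.assign_add(1.0)
        return inputs


layer = BatchCounter()


@tf.function
def train_step(x):
    # Stands in for the traced training step that Keras builds in fit().
    return layer(x)


train_step(tf.ones((2, 3)))
train_step(tf.ones((2, 3)))
seen = float(layer.batch_seen.numpy())
print("batches seen:", seen)
```

With the variable held as layer state, the counter persists across batches and no global eager switch is needed.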
