Finetune SavedModel failure due to no gradient loaded

Posted: 2020-10-06 02:24:59

Update: see my own answer to this question below. This is a bug in TensorFlow's EfficientNet.
What I want to do

I want to finetune EfficientNet. First, I completed training successfully and saved a model. It consists of a frozen EfficientNet plus fully connected layers. I saved it in the SavedModel format (see train.py). Then, at the finetuning stage (see finetune.py), I tried to load the SavedModel, but loading failed.

The problem

I cannot successfully load and retrain a SavedModel that contains EfficientNet.

What I have tried

I tried both load_model and load_weights, but neither helped. Does anyone know what to do? Does GradientTape not work with a SavedModel? Should I use something other than load_model or load_weights?

Environment: macOS 10.15.6, tensorflow==2.3.1

Log output
... (a very long line of something like this below)
WARNING:tensorflow:Importing a function (__inference_my_model_layer_call_and_return_conditional_losses_3683150) with ops with custom gradients. Will likely fail if a gradient is requested.
...
File "finetune.py", line 90, in <module>
_train_loss = train_step(train_images, train_labels).numpy()
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 823, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 697, in _initialize
*args, **kwds))
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
return weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
raise e.ag_error_metadata.to_exception(e)
tensorflow.python.autograph.impl.api.StagingError: in user code:
finetune.py:54 train_step *
gradients = tape.gradient(loss, model.trainable_variables)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py:1073 gradient **
unconnected_gradients=unconnected_gradients)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/imperative_grad.py:77 imperative_grad
compat.as_str(unconnected_gradients.value))
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:797 _backward_function
return self._rewrite_forward_and_call_backward(call_op, *args)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:712 _rewrite_forward_and_call_backward
forward_function, backwards_function = self.forward_backward(len(doutputs))
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:621 forward_backward
forward, backward = self._construct_forward_backward(num_doutputs)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:669 _construct_forward_backward
func_graph=backwards_graph)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:986 func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:659 _backprop_function
src_graph=self._func_graph)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:669 _GradientsHelper
lambda: grad_fn(op, *out_grads))
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:336 _MaybeCompile
return grad_fn() # Exit early
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:669 <lambda>
lambda: grad_fn(op, *out_grads))
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:712 _rewrite_forward_and_call_backward
forward_function, backwards_function = self.forward_backward(len(doutputs))
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:621 forward_backward
forward, backward = self._construct_forward_backward(num_doutputs)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:669 _construct_forward_backward
func_graph=backwards_graph)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:986 func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:659 _backprop_function
src_graph=self._func_graph)
/Users/a/my_awesome_project/.venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:623 _GradientsHelper
(op.name, op.type))
LookupError: No gradient defined for operation 'efficientnetb0/top_activation/IdentityN' (op type: IdentityN)
Source code
train.py
import datetime
import os

import tensorflow as tf

from myutils import decode_jpg  # defined in another module


class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base_model = tf.keras.applications.EfficientNetB0(
            input_shape=(256, 256, 3),
            include_top=False,
            weights='imagenet')
        self.base_model.trainable = False  # unfreeze at the finetuning stage later
        self.global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
        self.prediction_layer = tf.keras.layers.Dense(200)

    def call(self, x):
        x = self.base_model(x)
        x = self.global_average_layer(x)
        x = self.prediction_layer(x)
        return x


model = MyModel()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()


@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))


data = tf.data.Dataset.list_files('./data/*/*.jpg').batch(128).map(decode_jpg)
for epoch in range(100):
    for images, labels in data:
        train_step(images, labels)
    model.save('saved_models/{}'.format(epoch + 1))
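One pitfall worth flagging with these save paths: `str.format` silently ignores positional arguments that have no matching `{}` placeholder, so a path written as `'saved_models/'.format(epoch + 1)` never changes between epochs and every save overwrites the previous one. A quick check:

```python
# str.format ignores positional arguments that have no matching
# placeholder, so the broken form produces the same path every epoch.
broken = 'saved_models/'.format(5)
fixed = 'saved_models/{}'.format(5)
print(broken)  # saved_models/
print(fixed)   # saved_models/5
```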
finetune.py (refactored for a minimal reproduction, so the line numbers do not match the error log above)
import datetime
import os

import tensorflow as tf

from myutils import decode_jpg  # defined in another module


class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base_model = tf.keras.applications.EfficientNetB0(
            input_shape=(256, 256, 3),
            include_top=False,
            weights='imagenet'
        )
        self.base_model.trainable = True
        self.global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
        self.prediction_layer = tf.keras.layers.Dense(200)

    def call(self, x):
        x = self.base_model(x)
        x = self.global_average_layer(x)
        x = self.prediction_layer(x)
        return x


# model = MyModel()
# model.load_weights('./saved_models/65')
# -> ValueError: Unable to load weights saved in HDF5 format into a subclassed
#    Model which has not created its variables yet. Call the Model first, then
#    load the weights.
model = tf.keras.models.load_model('./saved_models/65')  # this way ends up with the error message above
model.get_layer('efficientnetb0').trainable = True

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')


@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))


EPOCHS = 100
data = tf.data.Dataset.list_files('./data/*/*.jpg').batch(128).map(decode_jpg)
for epoch in range(EPOCHS):
    for images, labels in data:
        train_step(images, labels)
    model.save('finetuned/{}'.format(epoch + 1))
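About the commented-out load_weights attempt in finetune.py: the ValueError it raises names its own workaround. A subclassed Model only creates its variables on the first call, so running one forward pass on a dummy batch before load_weights gives the weights somewhere to attach. A minimal sketch of that pattern with a small stand-in model (whether it also sidesteps the IdentityN gradient error for EfficientNet is not something this sketch verifies):

```python
import numpy as np
import tensorflow as tf

# Minimal stand-in for MyModel: a subclassed Model has no variables
# until it is called once, which is why load_weights fails up front.
class SmallModel(tf.keras.Model):
    def __init__(self):
        super(SmallModel, self).__init__()
        self.dense = tf.keras.layers.Dense(4)

    def call(self, x):
        return self.dense(x)


source = SmallModel()
dummy = np.ones((1, 8), dtype=np.float32)
source(dummy)                      # first call creates the variables
source.save_weights('small.ckpt')  # hypothetical checkpoint path

target = SmallModel()
target(dummy)                      # build the variables before loading
target.load_weights('small.ckpt')  # now the weights have somewhere to go
```

For the model in this post, the equivalent would be `model = MyModel()`, one call on a dummy `(1, 256, 256, 3)` batch, then `model.load_weights('./saved_models/65')`.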
I tried to reproduce this on Colab but saw a different error message: https://colab.research.google.com/drive/1gzOwSWJ1Kvwzo01SEpjqGq6Lb-OsI-ob?usp=sharing
I have now filed an issue on the tensorflow/tensorflow repository: https://github.com/tensorflow/tensorflow/issues/43806
Comments:

This is related to EfficientNet specifically, because replacing it with ResNet50 does not produce the error.

Answer 1:

I am answering my own question. This is an EfficientNet bug. Follow this issue: https://github.com/tensorflow/tensorflow/issues/40166