"Dimensions must match" error in tflite conversion with toco

Posted 2023-03-27

Asked: 2019-09-30 06:56:41

Question:

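I have a custom CNN model (an image classifier) trained with a TensorFlow estimator, and I plan to use it in an iOS app after converting it to a TensorFlow Lite model.

My model has several dropout layers as well as batch normalization layers. To avoid conversion errors and to have those dropout layers removed during optimize_for_inference, I save an eval_graph.pbtxt separately next to the checkpoint files so that it can be used in freeze_graph.

freeze_graph works fine, and optimize_for_inference throws no errors either. However, after importing both the frozen model and the optimized model (both .pb) into TensorBoard for inspection, I found:

(screenshot: the frozen model before optimization)

(screenshot: the model after optimization)

It seems optimize_for_inference removed the shape information of the input tensor layer. This does not happen if I freeze the model with the graph saved in training mode (the default graph.pbtxt) and optimize that.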

Environment:

TensorFlow 1.8.0 for training; TensorFlow 1.13.1 for conversion.

The code is as follows.

An excerpt of model_fn, nothing unusual:

def cnn_model_fn(features, labels, mode, params):
    """Model function for CNN."""
    # Input Layer, images already reshaped before being fed in;
    net = tf.placeholder_with_default(
        features['Pixels'],
        (None, 48, 48, 1),
        name='input_tensor'
    )

    # bn-1
    net = tf.layers.batch_normalization(
        inputs=net,
        training=mode == tf.estimator.ModeKeys.TRAIN
    )

    # conv2d-1
    net = tf.layers.conv2d(
        inputs=net,
        filters=32,
        kernel_size=[3, 3],
        padding='same',
        activation=tf.nn.relu
    )

    # conv2ds, dropouts, poolings, bns...

    # CONV2Ds -> DENSEs
    # 48 pixels pooled three times (kernel_sizes=2, strides=2), and final conv2d has 128 neurons;
    net = tf.reshape(net, [-1, 6 * 6 * 128])

    # bn-4
    net = tf.layers.batch_normalization(
        inputs=net,
        training=mode == tf.estimator.ModeKeys.TRAIN
    )

    # dense-1
    net = tf.layers.dense(
        inputs=net,
        units=256,
        kernel_regularizer=keras.regularizers.l2(0.001),
        activation=tf.nn.relu
    )

    # denses, logits, nothing special...

    # In prediction:
    if mode == tf.estimator.ModeKeys.PREDICT:        
        return tf.estimator.EstimatorSpec(...)

    # In evaluation:
    if mode == tf.estimator.ModeKeys.EVAL:
        # hook for saving graph in eval mode, this graph will be used in freezing & optimizing process;
        eval_finish_hook = EvalFinishHook()
        eval_finish_hook.model_dir = params['model_dir']
        return tf.estimator.EstimatorSpec(
            ...,
            evaluation_hooks=[eval_finish_hook]
        )

    # In training:
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(...)

And the custom eval hook class:

class EvalFinishHook(tf.train.SessionRunHook):
    model_dir = '.'
    _saver = None

    def begin(self):
        self._saver = tf.train.Saver()
        super().begin()

    def end(self, session):
        dst_dir = self.model_dir + 'eval_ckpt'
        self._saver.save(sess=session, save_path=dst_dir + '/eval.ckpt')
        tf.train.write_graph(session.graph.as_graph_def(), dst_dir, 'eval_graph.pbtxt')
        super().end(session)

Freezing and optimizing:

# freeze graph
echo "freezing checkpoint $best_step..."
freeze_graph \
--input_graph=$input_graph \
--input_checkpoint=$input_checkpoint \
--input_binary=false \
--output_graph=$frozen_model \
--output_node_names=$output_names

# optimize for inference
echo "optimizing..."
/path/to/bazel-bin/tensorflow/python/tools/optimize_for_inference \
--input=$frozen_model \
--output=$optimized_model \
--frozen_graph=True \
--input_names=$input_names \
--output_names=$output_names

toco throws the error:

# convert to tflite
echo "converting..."
toco \
--graph_def_file=$optimized_model \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--inference_type=FLOAT \
--input_type=FLOAT \
--input_arrays=$input_names \
--output_arrays=$output_names \
--input_shapes=1,48,48,1 \
--output_file=$tflite_model


# error info
Check failed: dim_x == dim_y (128 vs. 4608)Dimensions must match

This error seems reasonable, since dimensions 1 and 2 of the shape are both unknown.

Why?
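
The two numbers in the message can be traced back to the model code. Below is a small sketch of the arithmetic, assuming (as the comment in `cnn_model_fn` states) three poolings with kernel size 2 and stride 2 on a 48-pixel input and 128 filters in the final conv2d. The helper name `pooled_size` is illustrative only.

```python
def pooled_size(size, times, stride=2):
    # Each stride-2 pooling halves the spatial size (integer division).
    for _ in range(times):
        size //= stride
    return size


side = pooled_size(48, 3)        # 48 -> 24 -> 12 -> 6
channels = 128                   # filters of the final conv2d
flat = side * side * channels    # width of the tensor after tf.reshape

print(side, channels, flat)      # 6 128 4608
```

So 128 matches the channel count just before the reshape and 4608 matches the flattened width just after it. That suggests, without proving it, that the converter is comparing a tensor from one side of the reshape with values sized for the other side.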

Comments:

Answer 1:

optimize_for_inference removes dropout layers from the graph, and dropout is usually applied to the input. So the answer could be yes.

bazel-bin/tensorflow/python/tools/optimize_for_inference \
--input=/tf_files/retrained_graph.pb \
--output=/tf_files/optimized_graph.pb \
--input_names= \
--output_names=result

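Let's try custom implementations of RandomUniform, FLOOR, TensorFlowShape, TensorFlowSwitch, and TensorFlowMerge to get rid of the error.

Reference: Dropout Regularization

Comments:

I tried freezing and optimizing the graphs saved in both eval and train mode, and imported both optimized models into TensorBoard for inspection. The only difference is that the input_tensor of the train graph has an output shape, while the one in the eval graph has no output shape at all. By the way, I can get a tflite file from the train graph without any problem; from the eval graph, however, I always get Dimensions must match. Any ideas?

Answer 2: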

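When you freeze the graph, you should use eval.pbtxt instead of graph.pbtxt.

Please check tensorflow/models.

So let's replace the None in the first dimension with zero, and fill in the remaining dimensions with the sizes that describe the vector/matrix. The other point is to obey the matrix multiplication rule: the number of columns of the first operand must match the number of rows of the second operand.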
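
The matrix multiplication rule can be sketched in a few lines. This only illustrates the rule itself, not the check toco performs internally. The helper `matmul_shape` is hypothetical, and the kernel shape (4608, 256) is derived from the question's reshape to 6 * 6 * 128 and `units=256` in dense-1.

```python
def matmul_shape(a_shape, b_shape):
    # a is (rows_a, cols_a), b is (rows_b, cols_b).
    rows_a, cols_a = a_shape
    rows_b, cols_b = b_shape
    if cols_a != rows_b:
        raise ValueError(
            'Dimensions must match: %d vs. %d' % (cols_a, rows_b))
    return (rows_a, cols_b)


# A flattened input of width 4608 against the dense-1 kernel works.
print(matmul_shape((1, 4608), (4608, 256)))   # (1, 256)

# An input whose last dimension is still 128 does not.
try:
    matmul_shape((1, 128), (4608, 256))
except ValueError as err:
    print(err)   # Dimensions must match: 128 vs. 4608
```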

If this helps, please accept the answer.

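Comments:

Yes, I did. That is where I get those Dimensions must match errors.

@keyOfVv I edited my answer, please check again.

Answer 3:

Well, it seems that swapping bn-4 and dense-1 eliminates the error. So in this case, batch normalization should come after the dense layer (e.g. after the conv2d->dense reshape).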
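
One way to see why the order can matter: with the default `axis=-1`, `tf.layers.batch_normalization` keeps one scale, offset, mean, and variance per entry of the input's last dimension. The sketch below only tabulates those lengths for each placement, using shapes taken from the question's code. It is an illustration of the sizes involved, not a confirmed account of what toco does, and `bn_param_length` is a made-up helper.

```python
def bn_param_length(input_shape, axis=-1):
    # Batch normalization stores one value per entry along `axis`.
    return input_shape[axis]


placements = [
    ('bn on the last conv output', (None, 6, 6, 128)),
    ('bn-4 right after the reshape (original order)', (None, 6 * 6 * 128)),
    ('bn-4 after dense-1 (swapped order)', (None, 256)),
]

for name, shape in placements:
    print(name, '->', bn_param_length(shape))
```

In the original order bn-4 carries 4608-long parameters, which only make sense if the reshape's output shape is known. In the swapped order the parameters are 256 long and follow a dense layer whose output width is fixed by its kernel.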

Comments:

Answer 4:

Yes, it should come after the dense layer:

model.add(Dense(.., ..))
model.add(BatchNormalization())
model.add(Activation(...))
model.add(Dropout(...))

Comments:
