SSD anchors in the TensorFlow Object Detection API

Posted 2023-02-23

[Title] SSD anchors in Tensorflow detection API [Posted] 2019-01-15 11:45:13 [Question]:

I want to train an SSD detector on a custom dataset of N × N images. So I dug into the TensorFlow Object Detection API and found an SSD300x300 model based on MobileNet v2, pretrained on COCO.

Looking at the config file used for training, the anchor_generator field looks like this (following the paper):

anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2
    max_scale: 0.9
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
    aspect_ratios: 3.0
    aspect_ratios: 0.33
  }
}

Looking at the SSD anchor generator proto, am I right to assume that base_anchor_height = base_anchor_width = 1?

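If so, from reading the multiple grid anchor generator, I assume that for a square 300x300 image the resulting anchors range in size from 0.2 * 300 = 60, i.e. 60x60 pixels, to 0.9 * 300 = 270, i.e. 270x270 pixels (with different aspect ratios)?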

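So if one wants to train on NxN images by setting the field:

fixed_shape_resizer {
  height: N
  width: N
}

would the same config file then give anchors ranging from (0.2N, 0.2N) pixels to (0.9N, 0.9N) pixels (with different aspect ratios)?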

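I am making a lot of assumptions because the code is hard to follow and there seems to be almost no documentation. Am I right? Is there a simple way to visualize the anchors that are used without training a model?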

[Question comments]:

[Answer 1]:

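Here are some functions that can be used to generate and visualize the anchor box coordinates without training a model. All we are doing here is calling the relevant operations used in the graph during training/inference.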

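First, we need to know the resolutions (shapes) of the feature maps that make up the object detection layers for an input image of a given size.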

import tensorflow as tf 
from object_detection.anchor_generators.multiple_grid_anchor_generator import create_ssd_anchors
from object_detection.models.ssd_mobilenet_v2_feature_extractor_test import SsdMobilenetV2FeatureExtractorTest

def get_feature_map_shapes(image_height, image_width):
    """
    :param image_height: height in pixels
    :param image_width: width in pixels
    :returns: list of tuples containing feature map resolutions
    """
    feature_extractor = SsdMobilenetV2FeatureExtractorTest()._create_feature_extractor(
        depth_multiplier=1,
        pad_to_multiple=1,
    )
    image_batch_tensor = tf.zeros([1, image_height, image_width, 1])
    
    return [tuple(feature_map.get_shape().as_list()[1:3])
            for feature_map in feature_extractor.extract_features(image_batch_tensor)]

This returns a list of feature map shapes, e.g. [(19,19), (10,10), (5,5), (3,3), (2,2), (1,1)], which you can pass to a second function that returns the anchor box coordinates.
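As a quick sanity check on those shapes, the sketch below reproduces them with plain arithmetic. It assumes (this is not stated above) that the first detection feature map sits at stride 16, that every later map halves the resolution with SAME padding (so sizes round up), and that pad_to_multiple is 1. expected_feature_map_shapes is a hypothetical helper, not part of the API.

```python
import math

def expected_feature_map_shapes(image_height, image_width,
                                num_layers=6, first_stride=16):
    """Estimate SSD feature map shapes under the assumptions stated above."""
    height = math.ceil(image_height / first_stride)
    width = math.ceil(image_width / first_stride)
    shapes = []
    for _ in range(num_layers):
        shapes.append((height, width))
        # Assumed: each further layer is a stride-2 step with SAME padding,
        # so the resolution halves and rounds up.
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
    return shapes

print(expected_feature_map_shapes(300, 300))
```

If the estimate disagrees with what get_feature_map_shapes returns for your input size, trust the function that runs the real feature extractor.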

def get_feature_map_anchor_boxes(feature_map_shape_list, **anchor_kwargs):
    """
    :param feature_map_shape_list: list of tuples containing feature map resolutions
    :returns: dict with feature map shape tuple as key and list of [ymin, xmin, ymax, xmax] box co-ordinates
    """
    anchor_generator = create_ssd_anchors(**anchor_kwargs)

    anchor_box_lists = anchor_generator.generate(feature_map_shape_list)
    
    feature_map_boxes = {}

    with tf.Session() as sess:
        for shape, box_list in zip(feature_map_shape_list, anchor_box_lists):
            feature_map_boxes[shape] = sess.run(box_list.data['boxes'])
            
    return feature_map_boxes

For your example, you can call it like this:

boxes = get_feature_map_anchor_boxes(
    min_scale=0.2,
    max_scale=0.9,
    feature_map_shape_list=get_feature_map_shapes(300, 300)
)

You don't need to specify the aspect ratios, because the ones in your config are the same as the defaults of create_ssd_anchors.

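Finally, we plot the anchor boxes on a grid that reflects the resolution of the given layer. Note that the coordinates of the anchor boxes and of the predicted boxes from the model are normalized between 0 and 1.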
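Because the coordinates are normalized, getting back to pixels is just a multiplication by the image size. A minimal sketch, assuming boxes in [ymin, xmin, ymax, xmax] order as returned above; to_pixel_boxes is a hypothetical helper name:

```python
import numpy as np

def to_pixel_boxes(normalized_boxes, image_height, image_width):
    """Scale [ymin, xmin, ymax, xmax] boxes from the 0-1 range to pixels."""
    scale = np.array([image_height, image_width, image_height, image_width],
                     dtype=np.float32)
    return np.asarray(normalized_boxes, dtype=np.float32) * scale

# A centred square anchor covering 0.9 of the image in each direction.
example = [[0.05, 0.05, 0.95, 0.95]]
print(to_pixel_boxes(example, 300, 300))
```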

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches

def draw_boxes(boxes, figsize, nrows, ncols, grid=(0,0)):

    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize) 

    for ax, box in zip(axes.flat, boxes):
        ymin, xmin, ymax, xmax = box
        ax.add_patch(patches.Rectangle((xmin, ymin), xmax-xmin, ymax-ymin, 
                                fill=False, edgecolor='red', lw=2))

        # add gridlines to represent feature map cells
        ax.set_xticks(np.linspace(0, 1, grid[0] + 1), minor=True)
        ax.set_yticks(np.linspace(0, 1, grid[1] + 1), minor=True)
        ax.grid(True, which='minor', axis='both')
              
    fig.tight_layout()
    
    return fig

If we take the fourth layer, which has a 3x3 feature map, as an example:

draw_boxes(boxes[(3,3)], figsize=(12,16), nrows=9, ncols=6, grid=(3,3))

In the resulting figure, each row represents a different cell in the 3x3 feature map, while each column represents a specific aspect ratio.

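Your initial assumptions are correct. For example, the anchor box with aspect ratio 1.0 in the highest layer (the one with the lowest-resolution feature map) has a height/width equal to 0.9 of the input image size, while the anchor boxes in the lowest layer have a height/width equal to 0.2 of the input image size. The anchor sizes of the layers in between are linearly interpolated between these limits.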
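The interpolation can be written out in a few lines. This is a sketch under the assumptions of a square input image and a base anchor size of 1; layer_scales and anchor_size_pixels are hypothetical helpers, and the convention that aspect ratio means width / height at constant area is my understanding of the generator rather than something stated above.

```python
import math

def layer_scales(min_scale, max_scale, num_layers):
    """Linearly interpolate one scale per layer between the two limits."""
    return [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
            for i in range(num_layers)]

def anchor_size_pixels(scale, aspect_ratio, image_size):
    """Return (height, width) in pixels; aspect_ratio is assumed to be width / height."""
    height = scale / math.sqrt(aspect_ratio) * image_size
    width = scale * math.sqrt(aspect_ratio) * image_size
    return height, width

for scale in layer_scales(0.2, 0.9, 6):
    h, w = anchor_size_pixels(scale, 1.0, 300)
    print(f"scale {scale:.2f}: {h:.0f} x {w:.0f} px")
```

For N x N images, pass N as image_size; the scales themselves do not change.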

However, there are a few subtleties in TensorFlow's anchor generation that are worth noting:

interpolated_scale_aspect_ratio

reduce_boxes_in_lowest_layer

base_anchor_height = base_anchor_width = 1
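To make these points concrete, here is a sketch of the per-layer (scale, aspect ratio) pairs as I understand create_ssd_anchors to build them. ssd_box_specs is a hypothetical re-implementation, not the library function, and the behaviour in the comments is my reading of the library source: an extra anchor per layer at the geometric mean of the current and next scale, and only three boxes in the lowest layer. With base_anchor_height = base_anchor_width = 1, the scales are direct fractions of the image size.

```python
import math

def ssd_box_specs(num_layers=6, min_scale=0.2, max_scale=0.9,
                  aspect_ratios=(1.0, 2.0, 0.5, 3.0, 0.33),
                  interpolated_scale_aspect_ratio=1.0,
                  reduce_boxes_in_lowest_layer=True):
    """Return, per layer, a list of (scale, aspect_ratio) pairs (assumed logic)."""
    scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
              for i in range(num_layers)] + [1.0]
    specs = []
    for layer, (scale, next_scale) in enumerate(zip(scales[:-1], scales[1:])):
        if layer == 0 and reduce_boxes_in_lowest_layer:
            # Assumed: the lowest layer keeps only three boxes.
            layer_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]
        else:
            layer_specs = [(scale, ratio) for ratio in aspect_ratios]
            if interpolated_scale_aspect_ratio > 0.0:
                # Assumed: one extra box between this layer's scale and the next.
                layer_specs.append((math.sqrt(scale * next_scale),
                                    interpolated_scale_aspect_ratio))
        specs.append(layer_specs)
    return specs

for layer, layer_specs in enumerate(ssd_box_specs()):
    print(layer, len(layer_specs),
          [(round(s, 3), round(r, 2)) for s, r in layer_specs])
```

Under these assumptions the lowest layer has 3 anchors per cell and every other layer has 6, which is why the 3x3 example above is drawn with 6 columns.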

The full gist can be found here.

[Comments]:

Looks good! Will check it out soon!

Thanks a lot @macloudy for the great answer.

@macloudy How can we reduce the number of anchor boxes from the figure above?

anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2
    max_scale: 0.9
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
    aspect_ratios: 3.0
    aspect_ratios: 0.33
  }
}

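I only need two anchor boxes, with aspect ratios 0.5 and 0.33. Reducing to num_layers = 2 gives an error.

With your script I get a total of 7308 boxes. Do you know why that doesn't match 8732? (38, 38): 4332, (19, 19): 2166, (10, 10): 600, (5, 5): 150, (3, 3): 54, (1, 1): 6

8732 should be the number of priors mentioned in the SSD paper. The TensorFlow object detection framework only gives 7308: github.com/tensorflow/models/blob/master/research/…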
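Both totals can be reproduced with simple arithmetic. The sketch assumes the feature map sizes quoted in the comment, 3 anchors per cell in the lowest layer and 6 elsewhere for the TensorFlow defaults, and the 4/6/6/6/4/4 boxes-per-cell layout usually quoted for SSD300 in the paper. Those per-cell counts are assumptions and are not stated in this thread.

```python
feature_map_sizes = [38, 19, 10, 5, 3, 1]

def total_anchors(sizes, boxes_per_cell):
    """Sum cells * boxes-per-cell over all square feature maps."""
    return sum(size * size * boxes
               for size, boxes in zip(sizes, boxes_per_cell))

# Assumed per-cell counts for the TensorFlow defaults and for the paper.
tf_total = total_anchors(feature_map_sizes, [3, 6, 6, 6, 6, 6])
paper_total = total_anchors(feature_map_sizes, [4, 6, 6, 6, 4, 4])
print(tf_total, paper_total)
```

If these assumptions hold, the gap comes from the number of boxes per cell in the first and last two layers, not from the feature map sizes.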
