Generating SSD anchor box aspect ratios with k-means clustering

Posted 2022-02-21 想游泳的鱼


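(This post is part of a series on the TensorFlow Object Detection API and covers tuning the anchor boxes used when training your own model.)
Many object detection models use anchor boxes as a region-sampling strategy, so that during training the model learns to match one of several predefined anchor boxes to a nearby ground-truth bounding box. To optimize the accuracy and efficiency of your object detection model, it helps to adjust these anchor boxes to fit your dataset, because the configuration files that ship with TensorFlow's trained checkpoints include aspect ratios intended to cover a very broad set of objects.
This post therefore shows how to discover a set of aspect ratios customized for your dataset, found by running k-means clustering over the width/height ratios of all ground-truth bounding boxes.
For demonstration purposes we use a subset of the PETS dataset (cats and dogs), which matches some other model-training tutorials (such as the one for the Edge TPU), but you can use this script with a different dataset. We also show how to tune it to meet your model's goals, including how to optimize for speed over accuracy or for accuracy over speed.
The result of this notebook is a new pipeline .config file that you can copy into your model training script. With the new customized anchor box configuration, you should see a faster training pipeline and slightly improved model accuracy.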
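
To make the term "aspect ratio" concrete, here is a minimal sketch of how one scale and one aspect ratio can be turned into an anchor shape. It assumes the common convention that the aspect ratio is width divided by height and that the box area is held constant as the ratio changes. The helper name `anchor_size` and the scale value 0.2 are illustrative assumptions, not part of the original notebook.

```python
import math


def anchor_size(scale, aspect_ratio, base_size=1.0):
    """Return (width, height) for an anchor with the given scale and
    aspect ratio, where aspect_ratio is assumed to mean width / height."""
    ratio_sqrt = math.sqrt(aspect_ratio)
    width = scale * ratio_sqrt * base_size    # wider as the ratio grows
    height = scale / ratio_sqrt * base_size   # shorter as the ratio grows
    return width, height


# The area stays at scale**2 while the shape changes with the ratio.
for ar in (0.5, 1.0, 2.0):
    w, h = anchor_size(scale=0.2, aspect_ratio=ar)
    print(f"aspect_ratio={ar}: width={w:.3f}, height={h:.3f}, area={w * h:.3f}")
```

Under this convention, picking aspect ratios that resemble the objects in your dataset means fewer anchors are spent on shapes that never occur.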

Install the required libraries

# Install the tensorflow Object Detection API...
# If you're running this offline, you also might need to install the protobuf-compiler:
#   apt-get install protobuf-compiler

! git clone -n https://github.com/tensorflow/models.git
%cd models
!git checkout 461b3587ef38b42cda151fa3b7d37706d77e4244
%cd research
! protoc object_detection/protos/*.proto --python_out=.

# Install TensorFlow Object Detection API
%cp object_detection/packages/tf2/setup.py .
! python -m pip install --upgrade pip
! python -m pip install --use-feature=2020-resolver .

# Test the installation
! python object_detection/builders/model_builder_tf2_test.py

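Prepare the data

Although this notebook does not perform model training, you need to use the same dataset here that you will use when training the model.

To find the best anchor box ratios, you should use all of your training dataset (or as much of it as is reasonable). As mentioned in the introduction, you want to measure the precise variety of images that you expect your model to encounter. With anything less, the anchor boxes might not cover the variety of objects your model encounters, so it might have weak accuracy. (The alternative, in which the ratios are based on data beyond the scope of your model's application, usually creates an inefficient model that can also have lower accuracy.)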

%mkdir /content/dataset
%cd /content/dataset
! wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
! wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
! tar zxf images.tar.gz
! tar zxf annotations.tar.gz

XML_PATH = '/content/dataset/annotations/xmls'

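Because the k-means script below processes all XML annotations, we want to reduce the PETS dataset to include only the cats and dogs used to train the model (in the companion training notebook). So we delete all annotation files that are not Abyssinian or American bulldog: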

! (cd /content/dataset/annotations/xmls/ && \
  find . ! \( -name 'Abyssinian*' -o -name 'american_bulldog*' \) -type f -exec rm -f {} \; )
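
The same filtering can be sketched in Python, which may be easier to adapt when the class prefixes change. This is only an illustration: the function name `prune_annotations` and its `dry_run` flag are hypothetical, and unlike `find` it looks only at the top level of the directory, which assumes the annotation folder is flat.

```python
from pathlib import Path


def prune_annotations(xml_dir, keep_prefixes, dry_run=True):
    """Remove files in xml_dir whose names do not start with one of
    keep_prefixes. With dry_run=True nothing is deleted; the function
    only returns the paths that would be removed."""
    removed = []
    for path in sorted(Path(xml_dir).iterdir()):
        if path.is_file() and not path.name.startswith(tuple(keep_prefixes)):
            removed.append(path)
            if not dry_run:
                path.unlink()
    return removed


# Example call (commented out so that nothing is deleted by accident):
# prune_annotations('/content/dataset/annotations/xmls',
#                   ('Abyssinian', 'american_bulldog'), dry_run=False)
```

Running it once with `dry_run=True` and inspecting the returned list is a cheap way to confirm the prefixes before deleting anything.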

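Upload your own data

To generate anchor box ratios for your own dataset, upload a ZIP file containing your annotation files (click the Files tab on the left, then drag and drop the ZIP file there). Then uncomment the following code to unzip it and specify the path to the directory that holds your annotation files: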

# %cd /content/
# !unzip dataset.zip

# XML_PATH = '/content/dataset/annotations/xmls'

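Find the best anchor box ratios with k-means

We are trying to find a set of aspect ratios that overlap the majority of object shapes in the dataset. We do that by finding common clusters among the dataset's bounding boxes, using the k-means clustering algorithm to locate the centroids of those clusters.

To do this, we need to calculate the following:

The k-means cluster centroids of the given bounding boxes (see the kmeans_aspect_ratios() function below).
The average intersection over union (IoU) between the bounding boxes and the given aspect ratios (see the average_iou() function below). This does not affect the resulting box ratios, but it is a useful metric for deciding whether the selected boxes are effective and whether you want to try more or fewer aspect ratios. (We discuss this score in more detail below.)

Note: The term "centroid" used here refers to the center of a k-means cluster, that is, a box (width, height) vector.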
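
Before reading the full functions, it may help to see the two ideas on a single box. The sketch below assumes that boxes are compared by shape only: each box is rescaled to unit area, and the IoU is computed as if both boxes shared a corner. The helper names `normalize` and `wh_iou` are illustrative and do not appear in the notebook.

```python
import math


def normalize(width, height):
    """Rescale a box to unit area so that only its shape remains."""
    side = math.sqrt(width * height)
    return width / side, height / side


def wh_iou(box, anchor):
    """IoU of two (width, height) boxes, assuming they share a corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


box = normalize(200, 50)  # a wide box with aspect ratio 4
for anchor in [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]:
    print(anchor, round(wh_iou(box, anchor), 3))
```

The anchor with the same shape as the box scores 1.0, while the square and the tall anchor score much lower, which is why the average of the best IoU per box is a reasonable measure of how well a set of aspect ratios fits a dataset.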

import sys
import os
import numpy as np
import xml.etree.ElementTree as ET

from sklearn.cluster import KMeans

def xml_to_boxes(path, rescale_width=None, rescale_height=None):
  """Extracts bounding-box widths and heights from ground-truth dataset.

  Args:
  path : Path to .xml annotation files for your dataset.
  rescale_width : Scaling factor to rescale width of bounding box.
  rescale_height : Scaling factor to rescale height of bounding box.

  Returns:
  bboxes : A numpy array with pairs of box dimensions as [width, height].
  """

  xml_list = []
  filenames = os.listdir(os.path.join(path))
  filenames = [os.path.join(path, f) for f in filenames if (f.endswith('.xml'))]
  for xml_file in filenames:
    tree = ET.parse(xml_file)
    root = tree.getroot()
    for member in root.findall('object'):
      bndbox = member.find('bndbox')
      bbox_width = int(bndbox.find('xmax').text) - int(bndbox.find('xmin').text)
      bbox_height = int(bndbox.find('ymax').text) - int(bndbox.find('ymin').text)
      if rescale_width and rescale_height:
        size = root.find('size')
        bbox_width = bbox_width * (rescale_width / int(size.find('width').text))
        bbox_height = bbox_height * (rescale_height / int(size.find('height').text))
      xml_list.append([bbox_width, bbox_height])
  bboxes = np.array(xml_list)
  return bboxes


def average_iou(bboxes, anchors):
    """Calculates the Intersection over Union (IoU) between bounding boxes and
    anchors.

    Args:
    bboxes : Array of bounding boxes in [width, height] format.
    anchors : Array of aspect ratios [n, 2] format.

    Returns:
    avg_iou_perc : A Float value, average of IOU scores from each aspect ratio
    """
    intersection_width = np.minimum(anchors[:, [0]], bboxes[:, 0]).T
    intersection_height = np.minimum(anchors[:, [1]], bboxes[:, 1]).T

    if np.any(intersection_width == 0) or np.any(intersection_height == 0):
        raise ValueError("Some boxes have zero size.")

    intersection_area = intersection_width * intersection_height
    boxes_area = np.prod(bboxes, axis=1, keepdims=True)
    anchors_area = np.prod(anchors, axis=1, keepdims=True).T
    union_area = boxes_area + anchors_area - intersection_area
    avg_iou_perc = np.mean(np.max(intersection_area / union_area, axis=1)) * 100

    return avg_iou_perc

def kmeans_aspect_ratios(bboxes, kmeans_max_iter, num_aspect_ratios):
  """Calculate the centroid of bounding boxes clusters using Kmeans algorithm.

  Args:
  bboxes : Array of bounding boxes in [width, height] format.
  kmeans_max_iter : Maximum number of iterations to find centroids.
  num_aspect_ratios : Number of centroids to optimize kmeans.

  Returns:
  aspect_ratios : Centroids of cluster (optimized for dataset).
  avg_iou_perc : Average score of bboxes intersecting with new aspect ratios.
  """

  assert len(bboxes), "You must provide bounding boxes"

  normalized_bboxes = bboxes / np.sqrt(bboxes.prod(axis=1, keepdims=True))
  
  # Using kmeans to find centroids of the width/height clusters
  kmeans = KMeans(
      init='random', n_clusters=num_aspect_ratios, random_state=0, max_iter=kmeans_max_iter)
  kmeans.fit(X=normalized_bboxes)
  ar = kmeans.cluster_centers_

  assert len(ar), "Unable to find k-means centroid, try increasing kmeans_max_iter."

  avg_iou_perc = average_iou(normalized_bboxes, ar)

  if not np.isfinite(avg_iou_perc):
    sys.exit("Failed to get aspect ratios due to numerical errors in k-means")

  aspect_ratios = [w/h for w,h in ar]

  return aspect_ratios, avg_iou_perc

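In the next code block, we call the functions above to discover the ideal anchor box aspect ratios.

You can tune the parameters below to suit your performance goals.

Most importantly, you should consider the number of aspect ratios you want to generate. At opposite ends of the decision spectrum, there are two goals you might pursue:

Low accuracy and fast inference: try 2-3 aspect ratios.
  This suits applications that are fine with accuracy or confidence scores of around 80% or below.
  The average IoU score (from avg_iou_perc) will be around 70-85.
  This reduces the model's overall computation during inference, which makes inference faster.
High accuracy and slow inference: try 5-6 aspect ratios.
  This suits applications that need accuracy or confidence scores of around 90%.
  The average IoU score (from avg_iou_perc) should be over 95.
  This increases the model's overall computation during inference, which makes inference slower.

The initial configuration below sits between the two: it searches for 4 aspect ratios.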

# Tune this based on your accuracy/speed goals as described above
num_aspect_ratios = 4 # can be [2,3,4,5,6]

# Tune the iterations based on the size and distribution of your dataset
# You can check avg_iou_perc every 100 iterations to see how centroids converge
kmeans_max_iter = 500

# These should match the training pipeline config ('fixed_shape_resizer' param)
width = 320
height = 320

# Get the ground-truth bounding boxes for our dataset
bboxes = xml_to_boxes(path=XML_PATH, rescale_width=width, rescale_height=height)

aspect_ratios, avg_iou_perc =  kmeans_aspect_ratios(
                                      bboxes=bboxes,
                                      kmeans_max_iter=kmeans_max_iter,
                                      num_aspect_ratios=num_aspect_ratios)

aspect_ratios = sorted(aspect_ratios)

print('Aspect ratios generated:', [round(ar,2) for ar in aspect_ratios])
print('Average IOU with anchors:', avg_iou_perc)
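
To see the speed/accuracy trade-off in numbers, one option is to sweep the number of aspect ratios and compare the average IoU for each. The sketch below is self-contained and makes several assumptions: it runs on synthetic unit-area boxes rather than the PETS annotations, and it uses a plain NumPy version of Lloyd's k-means (`simple_kmeans`, a hypothetical stand-in for sklearn's KMeans) so that it has no dependency beyond NumPy. With real data you would call kmeans_aspect_ratios() in the loop instead.

```python
import numpy as np


def simple_kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means in NumPy (a stand-in for sklearn's KMeans)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster became empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def mean_best_iou(boxes, anchors):
    """Mean over boxes of the best IoU against any anchor, in percent."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = boxes.prod(axis=1)[:, None] + anchors.prod(axis=1)[None, :] - inter
    return float(np.mean(np.max(inter / union, axis=1)) * 100)


# Synthetic unit-area (width, height) boxes with ratios between 0.4 and 2.5.
rng = np.random.default_rng(42)
ratios = rng.uniform(0.4, 2.5, size=500)
boxes = np.stack([np.sqrt(ratios), 1.0 / np.sqrt(ratios)], axis=1)

results = {}
for k in range(2, 7):
    centers = simple_kmeans(boxes, k)
    results[k] = mean_best_iou(boxes, centers)
    print(k, sorted(round(w / h, 2) for w, h in centers), round(results[k], 1))
```

If the average IoU barely improves from one value of k to the next, the extra aspect ratio is probably not worth its inference cost.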

Generate a new config file

Now we just need the .config file that your model started with, and we merge the new ssd_anchor_generator properties into it.

import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
config_path = '/content/models/research/object_detection/samples/configs/ssdlite_mobiledet_edgetpu_320x320_coco_sync_4x4.config'
pipeline_save = '/content/ssdlite_mobiledet_edgetpu_320x320_custom_aspect_ratios.config'
with tf.io.gfile.GFile(config_path, "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline)
pipeline.model.ssd.num_classes = 2
while pipeline.model.ssd.anchor_generator.ssd_anchor_generator.aspect_ratios:
  pipeline.model.ssd.anchor_generator.ssd_anchor_generator.aspect_ratios.pop()

for i in range(len(aspect_ratios)):
  pipeline.model.ssd.anchor_generator.ssd_anchor_generator.aspect_ratios.append(aspect_ratios[i])

config_text = text_format.MessageToString(pipeline)
with tf.io.gfile.GFile(pipeline_save, "wb") as f:
    f.write(config_text)
# Check for updated aspect ratios in the config
!cat /content/ssdlite_mobiledet_edgetpu_320x320_custom_aspect_ratios.config

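Summary and next steps

If you look at the new .config file printed above, you will find the anchor_generator specification, which includes the new aspect_ratios values we generated with the k-means code above.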
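
If you would rather check the values programmatically than read through the printed file, a small text scan is enough. The sketch below is an assumption-laden illustration: `read_aspect_ratios` is a hypothetical helper, and the sample text is a made-up excerpt whose numbers are placeholders, not results from this dataset.

```python
import re


def read_aspect_ratios(config_text):
    """Collect every `aspect_ratios: <number>` entry from config text."""
    pattern = r"aspect_ratios:\s*([0-9.eE+-]+)"
    return [float(value) for value in re.findall(pattern, config_text)]


# Hypothetical excerpt of a pipeline config; the values are placeholders.
sample = """
anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2
    max_scale: 0.95
    aspect_ratios: 0.5
    aspect_ratios: 1.0
    aspect_ratios: 2.0
  }
}
"""
print(read_aspect_ratios(sample))
```

For the real file, read its contents into a string and compare the returned list with the aspect_ratios list computed earlier.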

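The original config file (ssdlite_mobiledet_edgetpu_320x320_coco_sync_4x4.config) did already have some default anchor box aspect ratios, but we have replaced them with values optimized for our dataset. These new anchor boxes should improve model accuracy (compared with the default anchors) and speed up the training process.

If you want to use this configuration to train a model, see "Retrain MobileDet for the Coral Edge TPU", which uses this exact cats/dogs dataset. Just copy the .config file printed above and add it to that notebook. (Or download the file from the Files panel on the left side of the Colab UI: it is named ssdlite_mobiledet_edgetpu_320x320_custom_aspect_ratios.config.)

For more information about the pipeline configuration file, read "Configuring the Object Detection Training Pipeline".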

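About anchor scales…

This notebook focuses on anchor box aspect ratios because they are often the hardest setting to tune for each dataset. But you should also consider different configurations for the anchor box scales, which specify the number of different anchor box sizes and their minimum/maximum sizes. These affect your model's ability to detect objects of varying sizes.

Anchor scales are easier to tune by hand, by estimating the minimum/maximum object sizes you expect the model to encounter in your application environment. Just as when choosing the number of aspect ratios above, the number of different box sizes also affects your model's accuracy and speed (using more box scales is more accurate, but also slower).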
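
As a rough illustration of what the scale settings control, the sketch below spreads anchor scales linearly between a minimum and a maximum, one per feature-map layer. Linear spacing is an assumption about how an SSD-style anchor generator interpolates scales; the function name `ssd_scales` and the values 0.2, 0.95 and 6 are examples only, so check them against your own config.

```python
def ssd_scales(min_scale, max_scale, num_layers):
    """Return linearly spaced anchor scales, one per feature-map layer."""
    if num_layers == 1:
        return [min_scale]
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * i for i in range(num_layers)]


# Scales are fractions of the input image size.
scales = ssd_scales(min_scale=0.2, max_scale=0.95, num_layers=6)
print([round(s, 2) for s in scales])
```

Raising the minimum scale drops the smallest anchors, which can help when your application never sees small objects, while adding layers adds more box sizes at extra inference cost.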

You can also read more about anchor scales in "Configuring the Object Detection Training Pipeline".
