A more efficient way of loading images for detection

Posted 2023-05-07


Question (posted 2017-08-17 00:50:09):

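I'm using the TensorFlow Object Detection API for a semi-real-time object detection task. A camera captures images at 2 images/second. Each image is cropped into 4 smaller images, so in total I need to process 8 images/second.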

My detection model has been exported to a frozen graph (.pb file) and loaded into GPU memory. I then load the images into numpy arrays to feed them to the model.

Detection itself takes only about 0.1 s per image, but loading each image takes about 0.45 s.
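These numbers imply a per-image time budget. A quick back-of-the-envelope check, using only the figures stated above and assuming the images are processed one after another:

```python
# Figures from the question; sequential processing is an assumption.
frames_per_second = 2      # camera capture rate
crops_per_frame = 4        # each frame is cropped into 4 images
load_s = 0.45              # measured load time per image
detect_s = 0.1             # measured detection time per image

images_per_second = frames_per_second * crops_per_frame      # 8
budget_per_image_s = 1 / images_per_second                   # 0.125 s to keep up
work_per_second_s = images_per_second * (load_s + detect_s)  # about 4.4 s of work

print(f"{images_per_second} images/s, budget {budget_per_image_s:.3f} s/image, "
      f"{work_per_second_s:.2f} s of work per second of camera time")
```

Loading alone (0.45 s) is more than three times the 0.125 s budget, so loading, not detection, is the bottleneck.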

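My script is adapted from the code sample provided with the Object Detection API (link). It reads each image, converts it to a numpy array, and feeds it to the detection model. The most time-consuming step is load_image_into_numpy_array, which takes about 0.45 s.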
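A likely reason this conversion is slow is that `np.array(image.getdata())` builds one Python tuple per pixel before numpy sees the data. The sketch below compares it with `np.asarray`, which reads the pixel buffer through PIL's array interface. It assumes a `PIL.Image`; the function names `load_image_slow` and `load_image_fast` are hypothetical, and the timings it prints will depend on the machine.

```python
import timeit

import numpy as np
from PIL import Image


def load_image_slow(image):
    """The question's approach: goes through one Python tuple per pixel."""
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)


def load_image_fast(image):
    """Copy the pixel buffer via PIL's array interface, without per-pixel Python objects."""
    # convert('RGB') also guards against RGBA or grayscale PNGs.
    return np.asarray(image.convert('RGB'), dtype=np.uint8)


if __name__ == '__main__':
    # Synthetic 640x480 RGB image standing in for one camera crop (assumption).
    rng = np.random.default_rng(0)
    image = Image.fromarray(rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8))

    assert np.array_equal(load_image_slow(image), load_image_fast(image))
    for fn in (load_image_slow, load_image_fast):
        seconds = timeit.timeit(lambda: fn(image), number=3) / 3
        print(f'{fn.__name__}: {seconds:.4f} s per image')
```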

The script is as follows:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import timeit
import scipy.misc


from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image


from utils import label_map_util

from utils import visualization_utils as vis_util

# Path to frozen detection graph. This is the actual model that is used for the
# object detection.
PATH_TO_CKPT = 'animal_detection.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'animal_label_map.pbtxt')

NUM_CLASSES = 1


detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def,name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map,
                                                            max_num_classes=NUM_CLASSES,
                                                            use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

# For simplicity, test with a small set of images:
#   test/image1.png ... test/image9.png
# To test with your own images, add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test'
TEST_IMAGE_PATHS = [
    os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.png'.format(i)) for i in range(1, 10) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
with detection_graph.as_default():
  with tf.Session(graph=detection_graph, config=config) as sess:
    for image_path in TEST_IMAGE_PATHS:
      start = timeit.default_timer()
      image = Image.open(image_path)
      # the array based representation of the image will be used later in order to prepare the
      # result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
      end = timeit.default_timer()
      print(end-start)
      start = timeit.default_timer()
      # Each box represents a part of the image where a particular object was detected.
      boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
      # Each score represents the level of confidence for each of the objects.
      # Score is shown on the result image, together with the class label.
      scores = detection_graph.get_tensor_by_name('detection_scores:0')
      classes = detection_graph.get_tensor_by_name('detection_classes:0')
      num_detections = detection_graph.get_tensor_by_name('num_detections:0')
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      stop = timeit.default_timer()
      print(stop - start)
      # Visualization of the results of a detection (inside the loop, so every image is drawn).
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=2)

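I'm looking for a more efficient way to load the images produced by the camera. My first thought was to avoid numpy arrays and use a TensorFlow-native way of loading images, but I don't know where to start because I'm quite new to TensorFlow.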

If I could find a TensorFlow way to load images, maybe I could group the 4 images into one batch and feed them to the model together, which might improve the speed.
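Since the 4 crops come from one frame that is already in memory, they can be stacked into a single batch without writing them to disk. The following is a minimal sketch. It assumes the crops form a regular 2x2 grid, which the question does not specify, and `crop_into_batch` is a hypothetical helper:

```python
import numpy as np


def crop_into_batch(frame, rows=2, cols=2):
    """Split one H x W x 3 frame into rows*cols equal tiles stacked as [N, h, w, 3].

    Hypothetical helper: assumes a regular grid of crops. Any remainder rows or
    columns are dropped from the bottom and right edges.
    """
    height, width = frame.shape[:2]
    tile_h, tile_w = height // rows, width // cols
    tiles = [
        frame[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]
    # np.stack adds the leading batch axis.
    return np.stack(tiles, axis=0)


if __name__ == '__main__':
    frame = np.zeros((960, 1280, 3), dtype=np.uint8)  # stand-in for one camera frame
    batch = crop_into_batch(frame)
    print(batch.shape)  # (4, 480, 640, 3)
```

This assumes the exported graph accepts a batch dimension larger than 1, as the answer below relies on when it feeds `BATCH_SIZE` images through `image_tensor:0`. Under that assumption, the resulting array can go into one `sess.run` call instead of four.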

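One half-formed idea is to save the 4 small images cropped from one original image into a single tf_record file and load that file into the model as one batch, but I don't know how to implement this.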

Any help would be greatly appreciated.


Answer 1:

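I found a solution that reduces the image loading time from about 0.4 s to 0.01 s, and I'm posting it here in case anyone has the same problem. Instead of using PIL.Image and numpy, we can use imread from OpenCV. I also managed to batch the images, which gives a further speedup.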

The script is as follows:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tensorflow as tf
import timeit
import cv2


from collections import defaultdict

from utils import label_map_util

from utils import visualization_utils as vis_util

MODEL_PATH = sys.argv[1]
IMAGE_PATH = sys.argv[2]
BATCH_SIZE = int(sys.argv[3])
# Path to frozen detection graph. This is the actual model that is used for the
# object detection.
PATH_TO_CKPT = os.path.join(MODEL_PATH, 'frozen_inference_graph.pb')

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'animal_label_map.pbtxt')

NUM_CLASSES = 1

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def,name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map,
                                                            max_num_classes=NUM_CLASSES,
                                                            use_display_name=True)
category_index = label_map_util.create_category_index(categories)

PATH_TO_TEST_IMAGES_DIR = IMAGE_PATH
TEST_IMAGE_PATHS = [
    os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.png'.format(i)) for i in range(1, 129) ]

config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
with detection_graph.as_default():
  with tf.Session(graph=detection_graph, config=config) as sess:
    for i in range(0, len(TEST_IMAGE_PATHS), BATCH_SIZE):
        images = []
        start = timeit.default_timer()
        # Slicing keeps the last batch valid when the image count is not a multiple of BATCH_SIZE.
        for image_path in TEST_IMAGE_PATHS[i:i + BATCH_SIZE]:
            image = cv2.imread(image_path)
            # cv2.imread returns BGR; convert to RGB, the channel order PIL.Image produced.
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            images.append(image)
        # Build the [batch, height, width, 3] input once; all images in a batch must share one size.
        image_np_expanded = np.stack(images, axis=0)
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for each of the objects.
        # Score is shown on the result image, together with the class label.
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        # Actual detection.
        (boxes, scores, classes, num_detections) = sess.run(
            [boxes, scores, classes, num_detections],
            feed_dict={image_tensor: image_np_expanded})
        stop = timeit.default_timer()
        print(stop - start)
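The batching step can be tested on its own, without TensorFlow or OpenCV. The sketch below uses two hypothetical helpers, `iter_batches` and `to_rgb_batch`. It assumes 3-channel BGR input, which is what `cv2.imread` returns by default, and reverses the channel axis. For such images, that has the same effect as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np


def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`; the last one may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def to_rgb_batch(bgr_images):
    """Stack same-sized 3-channel BGR images into one RGB [N, H, W, 3] array."""
    # Reversing the last axis swaps the B and R channels.
    return np.stack([img[..., ::-1] for img in bgr_images], axis=0)


if __name__ == '__main__':
    # Ten fake 4x4 BGR "images" stand in for cv2.imread results (assumption).
    fake = [np.full((4, 4, 3), i, dtype=np.uint8) for i in range(10)]
    for batch in iter_batches(fake, 4):
        print(to_rgb_batch(batch).shape)
```

Slicing instead of indexing with `i + j` means a final partial batch is still processed rather than raising an `IndexError`.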

