Get the bounding box coordinates in the TensorFlow object detection API tutorial
【Posted】2018-08-01 13:42:03 【Question】I'm new to Python and TensorFlow. I'm trying to run the object detection tutorial file from the TensorFlow Object Detection API, but I can't find where to get the coordinates of the bounding boxes when an object is detected.
Relevant code:
# The following processing is only for single image
detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
I assume this is where the bounding boxes are drawn:
# Visualization of the results of detection.
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    output_dict['detection_scores'],
    category_index,
    instance_masks=output_dict.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)
plt.figure(figsize=IMAGE_SIZE)
plt.imshow(image_np)
I tried printing output_dict['detection_boxes'], but I'm not sure what the numbers mean. There are a lot of them:
array([[ 0.56213236, 0.2780568 , 0.91445708, 0.69120586],
[ 0.56261235, 0.86368728, 0.59286624, 0.8893863 ],
[ 0.57073039, 0.87096912, 0.61292225, 0.90354401],
[ 0.51422435, 0.78449738, 0.53994244, 0.79437423],
......
[ 0.32784131, 0.5461576 , 0.36972913, 0.56903434],
[ 0.03005961, 0.02714229, 0.47211722, 0.44683522],
[ 0.43143299, 0.09211366, 0.58121657, 0.3509962 ]], dtype=float32)
I found answers to similar questions, but I don't have a variable called box like they do. How do I get the coordinates?
【Answer 1】> I tried printing output_dict['detection_boxes'] but I'm not sure what the numbers mean
You can look at the code yourself. visualize_boxes_and_labels_on_image_array is defined here.
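Note that you are passing use_normalized_coordinates=True. If you trace the function calls, you will see that your numbers [ 0.56213236, 0.2780568 , 0.91445708, 0.69120586] etc. are the values [ymin, xmin, ymax, xmax], normalized to the range [0, 1] relative to the image size, from which the image (pixel) coordinates:
(left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                              ymin * im_height, ymax * im_height)
are computed by the function: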
def draw_bounding_box_on_image(image,
                               ymin,
                               xmin,
                               ymax,
                               xmax,
                               color='red',
                               thickness=4,
                               display_str_list=(),
                               use_normalized_coordinates=True):
    """Adds a bounding box to an image.

    Bounding box coordinates can be specified in either absolute (pixel) or
    normalized coordinates by setting the use_normalized_coordinates argument.

    Each string in display_str_list is displayed on a separate line above the
    bounding box in black text on a rectangle filled with the input 'color'.
    If the top of the bounding box extends to the edge of the image, the strings
    are displayed below the bounding box.

    Args:
      image: a PIL.Image object.
      ymin: ymin of bounding box.
      xmin: xmin of bounding box.
      ymax: ymax of bounding box.
      xmax: xmax of bounding box.
      color: color to draw bounding box. Default is red.
      thickness: line thickness. Default value is 4.
      display_str_list: list of strings to display in box
                        (each to be shown on its own line).
      use_normalized_coordinates: If True (default), treat coordinates
        ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
        coordinates as absolute.
    """
    draw = ImageDraw.Draw(image)
    im_width, im_height = image.size
    if use_normalized_coordinates:
        (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                      ymin * im_height, ymax * im_height)
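If you want the pixel coordinates themselves instead of a drawn image, you can apply the same conversion yourself. A minimal sketch, assuming each box is a normalized [ymin, xmin, ymax, xmax] row from output_dict['detection_boxes'] and that you know the image width and height; box_to_pixels and boxes_to_pixels are hypothetical helper names, not part of the API:

```python
def box_to_pixels(box, im_width, im_height):
    """Convert one normalized [ymin, xmin, ymax, xmax] box to pixel
    (left, right, top, bottom), using the same formula as
    draw_bounding_box_on_image."""
    ymin, xmin, ymax, xmax = (float(v) for v in box)
    return (xmin * im_width, xmax * im_width,
            ymin * im_height, ymax * im_height)


def boxes_to_pixels(boxes, im_width, im_height):
    """Apply box_to_pixels to every row of an (N, 4) array or list of boxes."""
    return [box_to_pixels(b, im_width, im_height) for b in boxes]


# Usage with a made-up box on a hypothetical 400x200 image:
print(box_to_pixels([0.5, 0.25, 1.0, 0.75], 400, 200))
# (100.0, 300.0, 100.0, 200.0)
```

With the tutorial's image_np (a NumPy array of shape (height, width, 3)), the size can be read as im_height, im_width = image_np.shape[:2].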
【Comments】
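OK. It seems output_dict['detection_boxes'] contains all the overlapping boxes, which is why there are so many arrays. Thanks!
What determines how many overlapping boxes there are? And why are there so many overlapping boxes, and why are they passed to the visualization layer to be merged?
I know this is an old question, but I think this might help someone. If you increase min_score_thresh in the input arguments of visualize_boxes_and_labels_on_image_array, you can limit the number of overlapping boxes. By default it is set to 0.5; for my project, for example, I had to increase it to 0.8.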
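The same score thresholding can be applied directly to output_dict, without going through the visualization function. A minimal sketch: filter_detections is a hypothetical helper, and its defaults (0.5 and 20) are illustrative only. It sorts by score explicitly rather than assuming the model's output order:

```python
def filter_detections(boxes, classes, scores, min_score_thresh=0.5, max_boxes=20):
    """Keep detections whose score exceeds min_score_thresh.

    Returns a list of (box, class_id, score) tuples, highest score first,
    truncated to at most max_boxes entries.
    """
    kept = [
        (box, int(cls), float(score))
        for box, cls, score in zip(boxes, classes, scores)
        if score > min_score_thresh
    ]
    # Sort explicitly instead of relying on the order the model returns.
    kept.sort(key=lambda item: item[2], reverse=True)
    return kept[:max_boxes]


# Usage with made-up values (not real model output):
boxes = [[0.56, 0.28, 0.91, 0.69], [0.56, 0.86, 0.59, 0.89], [0.03, 0.03, 0.47, 0.45]]
classes = [1, 3, 1]
scores = [0.97, 0.62, 0.41]
print(len(filter_detections(boxes, classes, scores)))                        # 2
print(len(filter_detections(boxes, classes, scores, min_score_thresh=0.8)))  # 1
```

With the tutorial's outputs you would pass output_dict['detection_boxes'], output_dict['detection_classes'] and output_dict['detection_scores'] as the first three arguments.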
The normalized bbox format is [ymin, xmin, ymax, xmax]:
github.com/tensorflow/models/blob/…
【Answer 2】
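I had the same experience: I got an array of about a hundred boxes (output_dict['detection_boxes']) when only one was shown on the image. After digging into the code that draws the rectangles, I was able to extract the relevant part and use it in my inference.py: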
# So detection has happened and you've got output_dict as the
# result of your inference.
# Then assume you've got this in your inference.py in order to draw rectangles:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    output_dict['detection_scores'],
    category_index,
    instance_masks=output_dict.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)
# This is the way I'm getting my coordinates
boxes = output_dict['detection_boxes']
# get all boxes from an array
max_boxes_to_draw = boxes.shape[0]
# get scores to get a threshold
scores = output_dict['detection_scores']
# this is set as a default but feel free to adjust it to your needs
min_score_thresh = .5
# iterate over all objects found
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    # keep only the detections whose score is above the threshold
    if scores is None or scores[i] > min_score_thresh:
        # boxes[i] is the box which will be drawn
        class_name = category_index[output_dict['detection_classes'][i]]['name']
        print("This box is gonna get used", boxes[i], class_name)
【Answer 3】The answer above didn't work for me; I had to make some changes. So if it doesn't help you either, try this:
# This is the way I'm getting my coordinates
# (detections holds tensors with a leading batch dimension, hence [0])
boxes = detections['detection_boxes'].numpy()[0]
# get all boxes from an array
max_boxes_to_draw = boxes.shape[0]
# get scores to get a threshold
scores = detections['detection_scores'].numpy()[0]
# this is set as a default but feel free to adjust it to your needs
min_score_thresh = .5
# iterate over all objects found
coordinates = []
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores[i] > min_score_thresh:
        # +1 assumes 0-based class ids while category_index keys start at 1;
        # drop it if your class ids are already 1-based
        class_id = int(detections['detection_classes'].numpy()[0][i] + 1)
        coordinates.append({
            "box": boxes[i],
            "class_name": category_index[class_id]["name"],
            "score": scores[i]
        })
print(coordinates)
Each item (a dict) in the coordinates list is one box to draw on the image, with the box coordinates (normalized), the class_name and the score.
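If you need integer pixel boxes, for example to crop the detected region, the normalized entries in coordinates can be converted afterwards. A minimal sketch, assuming each entry has the "box", "class_name" and "score" keys built above and that you know the image size; to_pixel_entries is a hypothetical helper, and rounding plus clamping to the image bounds is a choice made here, not something the API does for you. The (left, top, right, bottom) order matches what PIL's Image.crop expects:

```python
def to_pixel_entries(coordinates, im_width, im_height):
    """Turn normalized {"box": [ymin, xmin, ymax, xmax], ...} entries into
    integer pixel boxes (left, top, right, bottom), clamped to the image."""
    def clamp(value, upper):
        # Round to the nearest pixel, then keep it inside [0, upper].
        return max(0, min(int(round(value)), upper))

    result = []
    for entry in coordinates:
        ymin, xmin, ymax, xmax = (float(v) for v in entry["box"])
        result.append({
            "class_name": entry["class_name"],
            "score": float(entry["score"]),
            "box": (clamp(xmin * im_width, im_width),
                    clamp(ymin * im_height, im_height),
                    clamp(xmax * im_width, im_width),
                    clamp(ymax * im_height, im_height)),
        })
    return result


# Usage with a made-up entry on a hypothetical 400x200 image:
sample = [{"box": [0.5, 0.25, 1.0, 0.75], "class_name": "dog", "score": 0.9}]
print(to_pixel_entries(sample, 400, 200)[0]["box"])  # (100, 100, 300, 200)
```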