How to get the coordinates of the bounding box in YOLO object detection?

Posted 2023-02-16

【Original title】: How to get the coordinates of the bounding box in YOLO object detection? 【Posted】: 2017-11-16 14:06:50 【Question】:

I need to get the coordinates of the bounding boxes generated in the image above using YOLO object detection.

【Comments】:

【Answer 1】:

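If you are running object detection on static images with YOLOv4 in the darknet framework (that is, a build compiled directly from the GitHub repository https://github.com/AlexeyAB/darknet), you can get the bounding boxes as relative coordinates by running a command like the following on the command line: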

.\darknet.exe detector test .\cfg\coco.data .\cfg\yolov4.cfg .\yolov4.weights -ext_output .\data\people1.jpg -out result.json

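Note that the above is Windows syntax, so you may need to change the backslashes to forward slashes (and call ./darknet instead of .\darknet.exe) for it to work on macOS or Linux. Also make sure the paths are correct before running. In this command, the input is the file people1.jpg in the data directory under the repository root, and the output is saved to a file named result.json. You can change the output name, but keep the .json extension.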
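Once result.json exists, you still need to turn the relative values into pixels. Below is a minimal sketch, assuming the file is a JSON list of frames in which each object has "name", "confidence" and "relative_coordinates" (center_x, center_y, width, height as fractions of the image size). That layout is an assumption based on AlexeyAB's `-out` output, so check it against your own file. The helper name `boxes_from_result_json` is hypothetical.

```python
import json


def boxes_from_result_json(frames, img_w, img_h):
    """Turn the frames from darknet's `-out result.json` into pixel corner boxes.

    ASSUMPTION: `frames` is the parsed JSON list; each frame has an "objects"
    list, and each object carries "name", "confidence" and
    "relative_coordinates" with "center_x", "center_y", "width", "height"
    given as fractions (0..1) of the image size.
    """
    boxes = []
    for frame in frames:
        for obj in frame.get("objects", []):
            rc = obj["relative_coordinates"]
            # Scale the normalized centre and size back to pixels.
            cx, cy = rc["center_x"] * img_w, rc["center_y"] * img_h
            w, h = rc["width"] * img_w, rc["height"] * img_h
            # Centre/size -> top-left and bottom-right corners.
            x1, y1 = round(cx - w / 2), round(cy - h / 2)
            x2, y2 = round(cx + w / 2), round(cy + h / 2)
            boxes.append((obj["name"], obj["confidence"], (x1, y1, x2, y2)))
    return boxes


# Usage sketch: the image size is assumed not to be in the JSON, so read it
# separately (for example with PIL or OpenCV) before converting.
# with open("result.json") as f:
#     frames = json.load(f)
# print(boxes_from_result_json(frames, img_w=640, img_h=480))
```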

【Comments】:

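Is it possible to save the live-stream results at a fixed interval, for example every 10 seconds? / I think it should be possible by modifying a script like this one: github.com/IdoGalil/People-counting-system/blob/master/yolov3/…

【Answer 2】: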

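If the accepted answer doesn't work for you, it may be because you are using AlexeyAB's version of darknet rather than pjreddie's.

Just go to the image_opencv.cpp file in the src folder and uncomment the following section: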

            ...

            //int b_x_center = (left + right) / 2;
            //int b_y_center = (top + bot) / 2;
            //int b_width = right - left;
            //int b_height = bot - top;
            //sprintf(labelstr, "%d x %d - w: %d, h: %d", b_x_center, b_y_center, b_width, b_height);

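This prints the bbox center coordinates as well as the bbox width and height. After making the change, make sure to rebuild darknet with make before running YOLO again.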
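If you want the same numbers outside darknet, the arithmetic in those commented-out lines is easy to reproduce. A minimal Python sketch of it, assuming you already have integer pixel corners `left`, `top`, `right`, `bot` (the function name `center_and_size` is hypothetical):

```python
def center_and_size(left, top, right, bot):
    """Python port of the commented-out lines in image_opencv.cpp.

    Floor division mirrors C integer division for non-negative pixel values.
    """
    b_x_center = (left + right) // 2
    b_y_center = (top + bot) // 2
    b_width = right - left
    b_height = bot - top
    return b_x_center, b_y_center, b_width, b_height


# Build the same "%d x %d - w: %d, h: %d" label that the C code writes with sprintf.
label = "%d x %d - w: %d, h: %d" % center_and_size(10, 20, 110, 220)
```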

【Comments】:

Thank you very much, this works. But I want to print it like this:

` sprintf("Bounding box of %s: %d, %d", labelstr, b_x_center, b_y_center); `

【Answer 3】:

For Python users on Windows:

First, do a few setup steps:

Set a PYTHONPATH environment variable pointing to your darknet folder:

PYTHONPATH = 'YOUR DARKNET FOLDER'

Add PYTHONPATH to the Path variable by appending:

%PYTHONPATH%

In the cfg folder, edit the file coco.data and set the names variable to the location of your coco.names file; in my case:

names = D:/core/darknetAB/data/coco.names

With this setup, you can import darknet.py (from the AlexeyAB/darknet repository) as a Python module from any folder.
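If you would rather not edit the Windows environment variables, a possible alternative is to put the darknet folder on `sys.path` at the top of your own script. This is a sketch that only handles the Python import; darknet.py must still be able to load its compiled library from your build, which is an assumption about your setup. The helper name `add_darknet_to_path` and the example path are hypothetical.

```python
import os
import sys


def add_darknet_to_path(darknet_dir):
    """Make `import darknet` work without setting PYTHONPATH.

    `darknet_dir` is the (hypothetical) folder that contains darknet.py.
    It is inserted at the front of the module search path, only once.
    """
    darknet_dir = os.path.abspath(darknet_dir)
    if darknet_dir not in sys.path:
        sys.path.insert(0, darknet_dir)


# Example (placeholder path):
# add_darknet_to_path("D:/core/darknetAB")
# from darknet import performDetect as scan
```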

Now write the script:

from darknet import performDetect as scan  # the 'performDetect' function from darknet.py

def detect(img_path):
    ''' Use this if you only want the coordinates. '''
    picpath = img_path
    cfg = 'D:/core/darknetAB/cfg/yolov3.cfg'  # change this to use a different config
    coco = 'D:/core/darknetAB/cfg/coco.data'  # you can change this too
    weights = 'D:/core/darknetAB/yolov3.weights'  # and this
    # showImage/makeImageOnly are off: only return the results, skip drawing an image (faster)
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=weights, metaPath=coco, showImage=False, makeImageOnly=False, initOnly=False)

    # At this point `test` holds AlexeyAB's default output format (see help(scan)):
    # [(item_name, confidence_rate, (x_center, y_center, box_width, box_height))]
    # Convert it to the corner format commonly used by PIL/OpenCV (still inside detect()):

    newdata = []
    if len(test) >=2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size/2))
            y_start = round(y1 - (h_size/2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
            newdata.append(data)

    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size/2))
        y_start = round(y1 - (h_size/2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
        newdata.append(data)

    else:
        newdata = False

    return newdata

Usage:

table = 'D:/test/image/test1.jpg'
checking = detect(table)

Getting the coordinates:

If there is only one result:

x1, y1, x2, y2 = checking[0][2]

If there are multiple results:

for x in checking:
    item = x[0]
    x1, y1, x2, y2 = x[2]
    print(item)
    print(x1, y1, x2, y2)
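The single- and multi-detection branches above do the same work, so one loop is enough. Below is a sketch, assuming each `performDetect` entry is `(label, confidence, (x_center, y_center, width, height))` in pixels as described in the answer; note that recent versions of darknet.py may not provide `performDetect` at all, so check your copy. The name `center_to_corners` is hypothetical, and unlike the original it returns an empty list instead of False when nothing is detected.

```python
def center_to_corners(detections):
    """Convert performDetect-style results to corner boxes.

    Assumes each detection is (label, confidence, (x_center, y_center, width, height))
    in pixels. Handles 0, 1 or many detections with the same loop.
    """
    boxes = []
    for label, confidence, (x_c, y_c, w, h) in detections:
        # End coordinates are computed from the centre, not from the rounded start.
        x_start = round(x_c - w / 2)
        y_start = round(y_c - h / 2)
        x_end = round(x_c + w / 2)
        y_end = round(y_c + h / 2)
        boxes.append((label, confidence, (x_start, y_start, x_end, y_end), (w, h)))
    return boxes
```

Computing the end from the centre rather than from the already-rounded start can differ from the original by one pixel in rare cases.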

【Comments】:

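The code is untested, and there are typos in weight_size and height_size. For a single detection you should use test[0] to extract item, confidence_rate, imagedata. I've posted working code below. Anyway, thanks a lot; your code helped me get started. / Yes..., sorry about the typo... just trying to help and inspire... by the way, the typo is fixed now, so it should work... Note: recent OpenCV (4.1.1 and above) can already load Darknet models through its DNN module, so we can run darknet models directly in OpenCV. OpenCV is a real workhorse now...

【Answer 4】: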

Inspired by @Wahyu's answer above, with a few changes, modifications, and bug fixes; tested with both single-object and multi-object detection.

# calling 'performDetect' function from darknet.py
from darknet import performDetect as scan
import math


def detect(img_path):
    ''' Use this if you only want the coordinates. '''
    picpath = img_path
    # change this to use a different config
    cfg = '/home/saggi/Documents/saggi/prabin/darknet/cfg/yolo-obj.cfg'
    coco = '/home/saggi/Documents/saggi/prabin/darknet/obj.data'  # you can change this too
    # and this too
    data = '/home/saggi/Documents/saggi/prabin/darknet/backup/yolo-obj_last.weights'
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=data, metaPath=coco, showImage=False, makeImageOnly=False,
                initOnly=False)  # only return the results, skip drawing an image (faster)

    # At this point `test` holds AlexeyAB's default output format (see help(scan)):
    # [(item_name, confidence_rate, (x_center, y_center, box_width, box_height))]
    # Convert it to the corner format commonly used by PIL/OpenCV (still inside detect()):

    newdata = []

    # For multiple Detection
    if len(test) >= 2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size/2))
            y_start = round(y1 - (h_size/2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate,
                    (x_start, y_start, x_end, y_end), (w_size, h_size))
            newdata.append(data)

    # For Single Detection
    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size/2))
        y_start = round(y1 - (h_size/2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate,
                (x_start, y_start, x_end, y_end), (w_size, h_size))
        newdata.append(data)

    else:
        newdata = False

    return newdata


if __name__ == "__main__":
    # Multiple detection image test
    # table = '/home/saggi/Documents/saggi/prabin/darknet/data/26.jpg'
    # Single detection image test
    table = '/home/saggi/Documents/saggi/prabin/darknet/data/1.jpg'
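    detections = detect(table)
    if not detections:  # detect() returns False when nothing is found
        raise SystemExit('No objects detected')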

    # Multiple detection
    if len(detections) > 1:
        for detection in detections:
            print(' ')
            print('========================================================')
            print(' ')
            print('All Parameter of Detection: ', detection)

            print(' ')
            print('========================================================')
            print(' ')
            print('Detected label: ', detection[0])

            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object Confidence: ', detection[1])

            x1, y1, x2, y2 = detection[2]
            print(' ')
            print('========================================================')
            print(' ')
            print(
                'Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
            print('x1: ', x1)
            print('y1: ', y1)
            print('x2: ', x2)
            print('y2: ', y2)

            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object width and height: ', detection[3])
            b_width, b_height = detection[3]
            print('Width of bounding box: ', math.ceil(b_width))
            print('Height of bounding box: ', math.ceil(b_height))
            print(' ')
            print('========================================================')

    # Single detection
    else:
        print(' ')
        print('========================================================')
        print(' ')
        print('All Parameter of Detection: ', detections)

        print(' ')
        print('========================================================')
        print(' ')
        print('Detected label: ', detections[0][0])

        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object Confidence: ', detections[0][1])

        x1, y1, x2, y2 = detections[0][2]
        print(' ')
        print('========================================================')
        print(' ')
        print(
            'Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
        print('x1: ', x1)
        print('y1: ', y1)
        print('x2: ', x2)
        print('y2: ', y2)

        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object width and height: ', detections[0][3])
        b_width, b_height = detections[0][3]
        print('Width of bounding box: ', math.ceil(b_width))
        print('Height of bounding box: ', math.ceil(b_height))
        print(' ')
        print('========================================================')

# Single detections output:
# test value  [('movie_name', 0.9223029017448425, (206.79859924316406, 245.4672393798828, 384.83673095703125, 72.8630142211914))]

# Multiple detections output:
# test value  [('movie_name', 0.9225175976753235, (92.47076416015625, 224.9121551513672, 147.2491912841797, 42.063255310058594)),
#  ('movie_name', 0.4900225102901459, (90.5261459350586, 12.4061279296875, 182.5990447998047, 21.261077880859375))]
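The `__main__` block above repeats the same prints in two branches. Because `detect()` always returns a list (or False), a single loop covers both cases. A compact sketch, assuming each entry has the `(label, confidence, (x1, y1, x2, y2), (w, h))` shape produced by `detect()` above; `format_detection` and `print_detections` are hypothetical helper names.

```python
import math


def format_detection(detection):
    """Build the report printed for one entry returned by detect()."""
    label, confidence, (x1, y1, x2, y2), (b_width, b_height) = detection
    return "\n".join([
        f"Detected label: {label}",
        f"Detected object confidence: {confidence}",
        f"Top-left and bottom-right coordinates (x1, y1, x2, y2): {x1}, {y1}, {x2}, {y2}",
        f"Width of bounding box: {math.ceil(b_width)}",
        f"Height of bounding box: {math.ceil(b_height)}",
    ])


def print_detections(detections):
    # detect() returns False when nothing was found, so guard before looping;
    # one loop then handles both the single and the multiple case.
    if not detections:
        print("No objects detected")
        return
    for detection in detections:
        print("=" * 56)
        print(format_detection(detection))
```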

【Comments】:

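How come you don't use anchors? / @Pe Dro, please read the relevant part of my answer above. It explains how this works: it still uses anchors, through the binding. To make it work properly, some configuration is needed, which I have already explained in my answer...

【Answer 5】: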

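If you plan to implement this in Python, I created this small Python wrapper here. Follow the ReadMe file to install it; installation is easy.

After that, follow the example code to see how to detect objects. If your detection is det: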

top_left_x = det.bbox.x
top_left_y = det.bbox.y
width = det.bbox.w
height = det.bbox.h

If needed, you can get the midpoint as follows:

mid_x, mid_y = det.bbox.get_point(pyyolo.BBox.Location.MID)

Hope this helps.
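Assuming, as described above, that `det.bbox.x`/`det.bbox.y` are the top-left corner and `det.bbox.w`/`det.bbox.h` the box size, the bottom-right corner and midpoint follow directly. The helper below is a plain function written for illustration (the name `corners_and_midpoint` is hypothetical); it is not part of pyyolo.

```python
def corners_and_midpoint(x, y, w, h):
    """Derive corner and centre representations from a top-left (x, y) plus width/height.

    Independent of any YOLO wrapper; pass e.g. det.bbox.x, det.bbox.y, det.bbox.w, det.bbox.h.
    """
    x2, y2 = x + w, y + h          # bottom-right corner
    mid = (x + w / 2, y + h / 2)   # centre of the box
    return (x, y, x2, y2), mid
```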

【Comments】:

【Answer 6】:

A quick solution is to modify the image.c file to print out the bounding box information:

...
if(bot > im.h-1) bot = im.h-1;

// Print bounding box values 
printf("Bounding Box: Left=%d, Top=%d, Right=%d, Bottom=%d\n", left, top, right, bot); 
draw_box_width(im, left, top, right, bot, width, red, green, blue);
...

【Comments】:

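Seriously, thank you so much for pointing to image.c. It helped me solve a completely different problem: when running YOLO in Python (via OpenCV-DNN), the detections come back as floats. Literally every article I've seen uses the wrong math to convert the YOLO floats (center X/Y and width/height) to pixel coordinates. But the official image.c has the math, right here: github.com/pjreddie/darknet/blob/… - I just had to port it to Python. :-) / @Brian O'Donnell How do I modify image.c to get only the four numbers of the bounding box coordinates (without any extra text)? / You only want the numbers? If so, you'd want: printf("%d,%d,%d,%d\n", left, top, right, bot); / @MitchMcMabers Do you know why you need to multiply by the width and height?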
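On the last question: darknet reports each box as a centre (x, y) and size (w, h) expressed as fractions of the image size, so multiplying by the image width or height converts them to pixels. A sketch of a Python port of that image.c math (the name `yolo_to_pixels` is hypothetical). Per the comment above, the first four values of each YOLO output row from OpenCV-DNN are assumed to be in the same normalized centre format, so they could be passed in the same way.

```python
def yolo_to_pixels(x, y, w, h, img_w, img_h):
    """Port of the box math in pjreddie darknet's image.c.

    (x, y) is the box centre and (w, h) the box size, all as fractions of
    the image size, which is why each is multiplied by img_w or img_h.
    """
    # int() truncates toward zero, like the implicit float -> int cast in C.
    left = int((x - w / 2.0) * img_w)
    right = int((x + w / 2.0) * img_w)
    top = int((y - h / 2.0) * img_h)
    bot = int((y + h / 2.0) * img_h)

    # Clamp the box to the image, as image.c does before drawing.
    left = max(left, 0)
    right = min(right, img_w - 1)
    top = max(top, 0)
    bot = min(bot, img_h - 1)
    return left, top, right, bot
```

With `printf("%d,%d,%d,%d\n", left, top, right, bot);` in image.c you get the same four numbers from the C side.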
