YOLOMac上YOLO: Real-Time Object Detection

Posted 2022-11-25 Taily老段

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了YOLOMac上YOLO: Real-Time Object Detection相关的知识，希望对你有一定的参考价值。

官网链接：https://pjreddie.com/darknet/yolov2/

YOLO网络详解：https://zhuanlan.zhihu.com/p/25236464（知乎）

YOLO是一名叫做Joseph Chet Redmon的大神与他的几个小伙伴做的一个开源实时物体检测系统。Joseph的几个小伙伴来自于华盛顿大学以及伯克利大学等知名院校。

YOLO是基于他们开发的Darknet（基于c语言的神经网络开源框架）上的一个应用系统，不同于前置检测系统将一副图像的不同位置以及维度分别进行分类预测，YOLO将整幅图像输入进单一的神经网络进行分类预测。这使得YOLO相对于其它的物体检测网络更加的快速。

其在 VOC 2007（Visual Object Class Challenge 2007）上的 mAP（Mean Average Precision）为78.6%，在 COCO test-dev 上为 48.1%

You only look once (YOLO) is a state-of-the-art, real-time object detection system. On a Titan X it processes images at 40-90 FPS and has a mAP on VOC 2007 of 78.6% and a mAP of 48.1% on COCO test-dev.

神经网络

Tiny YOLO 的架构是很简单的，它就是一个卷积神经网络：（见下面Tiny YOLO）

Layer         kernel  stride  output shape
---------------------------------------------
Input                          (416, 416, 3)
Convolution    3×3      1      (416, 416, 16)
MaxPooling     2×2      2      (208, 208, 16)
Convolution    3×3      1      (208, 208, 32)
MaxPooling     2×2      2      (104, 104, 32)
Convolution    3×3      1      (104, 104, 64)
MaxPooling     2×2      2      (52, 52, 64)
Convolution    3×3      1      (52, 52, 128)
MaxPooling     2×2      2      (26, 26, 128)
Convolution    3×3      1      (26, 26, 256)
MaxPooling     2×2      2      (13, 13, 256)
Convolution    3×3      1      (13, 13, 512)
MaxPooling     2×2      1      (13, 13, 512)
Convolution    3×3      1      (13, 13, 1024)
Convolution    3×3      1      (13, 13, 1024)
Convolution    1×1      1      (13, 13, 125)
---------------------------------------------

这种神经网络只使用了标准的层类型：3x3 核心的卷积层和 2x2 的最大值池化层，没有复杂的事务。YOLOv2 中没有全连接层。

YOLO与R-CNN系列的对比：

YOLO vs SSD

Darknet

https://pjreddie.com/darknet/install/

git clone https://github.com/pjreddie/darknet
cd darknet
make

You already have the config file for YOLO in the cfg/ subdirectory. You will have to download the pre-trained weight file here (258 MB). Or just run this:

wget https://pjreddie.com/media/files/yolov2.weights

Then run the detector!

./darknet detect cfg/yolov2.cfg yolov2.weights data/dog.jpg

You will see some output like this:

bogon:darknet taily$ ./darknet detect cfg/yolov2.cfg yolov2.weights data/dog.jpg
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   608 x 608 x   3   ->   608 x 608 x  32  0.639 BFLOPs
    1 max          2 x 2 / 2   608 x 608 x  32   ->   304 x 304 x  32
    2 conv     64  3 x 3 / 1   304 x 304 x  32   ->   304 x 304 x  64  3.407 BFLOPs
    3 max          2 x 2 / 2   304 x 304 x  64   ->   152 x 152 x  64
    4 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128  3.407 BFLOPs
    5 conv     64  1 x 1 / 1   152 x 152 x 128   ->   152 x 152 x  64  0.379 BFLOPs
    6 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128  3.407 BFLOPs
    7 max          2 x 2 / 2   152 x 152 x 128   ->    76 x  76 x 128
    8 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
    9 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   10 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   11 max          2 x 2 / 2    76 x  76 x 256   ->    38 x  38 x 256
   12 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   13 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   14 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   15 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   16 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   17 max          2 x 2 / 2    38 x  38 x 512   ->    19 x  19 x 512
   18 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   19 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   20 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   21 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   22 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   23 conv   1024  3 x 3 / 1    19 x  19 x1024   ->    19 x  19 x1024  6.814 BFLOPs
   24 conv   1024  3 x 3 / 1    19 x  19 x1024   ->    19 x  19 x1024  6.814 BFLOPs
   25 route  16
   26 conv     64  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x  64  0.095 BFLOPs
   27 reorg              / 2    38 x  38 x  64   ->    19 x  19 x 256
   28 route  27 24
   29 conv   1024  3 x 3 / 1    19 x  19 x1280   ->    19 x  19 x1024  8.517 BFLOPs
   30 conv    425  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 425  0.314 BFLOPs
   31 detection
mask_scale: Using default '1.000000'
Loading weights from yolov2.weights...Done!
data/dog.jpg: Predicted in 10.404302 seconds.
dog: 82%
truck: 64%
bicycle: 85%

data/person.jpg: Predicted in 10.458536 seconds.
horse: 82%
person: 86%
dog: 86%

Tiny YOLO

Tiny YOLO is based off of the Darknet reference network and is much faster but less accurate than the normal YOLO model. To use the version trained on VOC:

wget https://pjreddie.com/media/files/yolov2-tiny-voc.weights
./darknet detector test cfg/voc.data cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights data/dog.jpg

Which, ok, it's not perfect, but boy it sure is fast. On GPU it runs at >200 FPS.

Tiny YOLO在精度上降低了，但速度提高了很多；

bogon:darknet taily$ ./darknet detector test cfg/voc.data cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights data/dog.jpg
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024  3.190 BFLOPs
   14 conv    125  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 125  0.043 BFLOPs
   15 detection
mask_scale: Using default '1.000000'
Loading weights from yolov2-tiny-voc.weights...Done!
data/dog.jpg: Predicted in 1.120719 seconds.
dog: 78%
car: 55%
car: 50%

YOLO V3

获取图像检测训练模型 
git clone https://github.com/pjreddie/darknet
编译 
cd darknet 
make
获取训练模型权重 
wget https://pjreddie.com/media/files/yolov3.weights
测试 
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

以上是关于YOLOMac上YOLO: Real-Time Object Detection的主要内容，如果未能解决你的问题，请参考以下文章

YOLO(You Only Look Once):Real-Time Object Detection

【目标检测】YOLO论文详解（You Only Look Once: Unified, Real-Time Object Detection）

第三十五节，目标检测之YOLO算法详解

YOLO 检测多张图片并保存标签信息

论文阅读：You Only Look Once: Unified, Real-Time Object Detection

使用 GPU=1 编译 Yolo (Darknet) 时出现错误 127 -(obj/convolutioanl_kernels.o)