YOLOMac上YOLO: Real-Time Object Detection
Posted Taily老段
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了YOLOMac上YOLO: Real-Time Object Detection相关的知识,希望对你有一定的参考价值。
官网链接:https://pjreddie.com/darknet/yolov2/
YOLO网络详解:https://zhuanlan.zhihu.com/p/25236464(知乎)
YOLO是一名叫做Joseph Chet Redmon的大神与他的几个小伙伴做的一个开源实时物体检测系统。Joseph的几个小伙伴来自于华盛顿大学以及伯克利大学等知名院校。
YOLO是基于他们开发的Darknet(基于c语言的神经网络开源框架)上的一个应用系统,不同于前置检测系统将一副图像的不同位置以及维度分别进行分类预测,YOLO将整幅图像输入进单一的神经网络进行分类预测。这使得YOLO相对于其它的物体检测网络更加的快速。
其在 VOC 2007(Visual Object Class Challenge 2007)上的 mAP(Mean Average Precision)为78.6%,在 COCO test-dev 上为 48.1%
You only look once (YOLO) is a state-of-the-art, real-time object detection system. On a Titan X it processes images at 40-90 FPS and has a mAP on VOC 2007 of 78.6% and a mAP of 48.1% on COCO test-dev.
神经网络
Tiny YOLO 的架构是很简单的,它就是一个卷积神经网络:(见下面Tiny YOLO)
Layer kernel stride output shape
---------------------------------------------
Input (416, 416, 3)
Convolution 3×3 1 (416, 416, 16)
MaxPooling 2×2 2 (208, 208, 16)
Convolution 3×3 1 (208, 208, 32)
MaxPooling 2×2 2 (104, 104, 32)
Convolution 3×3 1 (104, 104, 64)
MaxPooling 2×2 2 (52, 52, 64)
Convolution 3×3 1 (52, 52, 128)
MaxPooling 2×2 2 (26, 26, 128)
Convolution 3×3 1 (26, 26, 256)
MaxPooling 2×2 2 (13, 13, 256)
Convolution 3×3 1 (13, 13, 512)
MaxPooling 2×2 1 (13, 13, 512)
Convolution 3×3 1 (13, 13, 1024)
Convolution 3×3 1 (13, 13, 1024)
Convolution 1×1 1 (13, 13, 125)
---------------------------------------------
这种神经网络只使用了标准的层类型:3x3 核心的卷积层和 2x2 的最大值池化层,没有复杂的事务。YOLOv2 中没有全连接层。
YOLO与R-CNN系列的对比:
YOLO vs SSD
Darknet
https://pjreddie.com/darknet/install/
git clone https://github.com/pjreddie/darknet
cd darknet
make
You already have the config file for YOLO in the cfg/
subdirectory. You will have to download the pre-trained weight file here (258 MB). Or just run this:
wget https://pjreddie.com/media/files/yolov2.weights
Then run the detector!
./darknet detect cfg/yolov2.cfg yolov2.weights data/dog.jpg
You will see some output like this:
bogon:darknet taily$ ./darknet detect cfg/yolov2.cfg yolov2.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
1 max 2 x 2 / 2 608 x 608 x 32 -> 304 x 304 x 32
2 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64 3.407 BFLOPs
3 max 2 x 2 / 2 304 x 304 x 64 -> 152 x 152 x 64
4 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128 3.407 BFLOPs
5 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64 0.379 BFLOPs
6 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128 3.407 BFLOPs
7 max 2 x 2 / 2 152 x 152 x 128 -> 76 x 76 x 128
8 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
9 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
10 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
11 max 2 x 2 / 2 76 x 76 x 256 -> 38 x 38 x 256
12 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
13 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
14 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
15 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
16 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
17 max 2 x 2 / 2 38 x 38 x 512 -> 19 x 19 x 512
18 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
19 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
20 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
21 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
22 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
23 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024 6.814 BFLOPs
24 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024 6.814 BFLOPs
25 route 16
26 conv 64 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 64 0.095 BFLOPs
27 reorg / 2 38 x 38 x 64 -> 19 x 19 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 19 x 19 x1280 -> 19 x 19 x1024 8.517 BFLOPs
30 conv 425 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 425 0.314 BFLOPs
31 detection
mask_scale: Using default '1.000000'
Loading weights from yolov2.weights...Done!
data/dog.jpg: Predicted in 10.404302 seconds.
dog: 82%
truck: 64%
bicycle: 85%
data/person.jpg: Predicted in 10.458536 seconds.
horse: 82%
person: 86%
dog: 86%
Tiny YOLO
Tiny YOLO is based off of the Darknet reference network and is much faster but less accurate than the normal YOLO model. To use the version trained on VOC:
wget https://pjreddie.com/media/files/yolov2-tiny-voc.weights
./darknet detector test cfg/voc.data cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights data/dog.jpg
Which, ok, it's not perfect, but boy it sure is fast. On GPU it runs at >200 FPS.
Tiny YOLO在精度上降低了,但速度提高了很多;
bogon:darknet taily$ ./darknet detector test cfg/voc.data cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights data/dog.jpg
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
14 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125 0.043 BFLOPs
15 detection
mask_scale: Using default '1.000000'
Loading weights from yolov2-tiny-voc.weights...Done!
data/dog.jpg: Predicted in 1.120719 seconds.
dog: 78%
car: 55%
car: 50%
YOLO V3
获取图像检测训练模型
git clone https://github.com/pjreddie/darknet
编译
cd darknet
make
获取训练模型权重
wget https://pjreddie.com/media/files/yolov3.weights
测试
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
以上是关于YOLOMac上YOLO: Real-Time Object Detection的主要内容,如果未能解决你的问题,请参考以下文章
YOLO(You Only Look Once):Real-Time Object Detection
【目标检测】YOLO论文详解(You Only Look Once: Unified, Real-Time Object Detection)
论文阅读:You Only Look Once: Unified, Real-Time Object Detection
使用 GPU=1 编译 Yolo (Darknet) 时出现错误 127 -(obj/convolutioanl_kernels.o)