Progress and Prospect of target detection technology based on deep learning

Posted 2020-12-27 hugeng007

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了Progress and Prospect of target detection technology based on deep learning相关的知识，希望对你有一定的参考价值。

Viola-Jones face detector

技术分享图片

One of the more successful examples of object detection in the whole computer field is the Viola-Jones face detector that appeared around 2000,which makes it a more mature technique compared to the object detection.The basic idea of this method is sliding window type,using a fixed size window to slide in the input image,and the window frame will be sent to the classifier to judge whether it is a face window or a non human face window.The size of the sliding window is fixed,but the size of the face is varied.In size of the sliding is fixed,but the size of the face is varied.In order to detect the different size of the face,it is necessary to scale the input image to different sizes,so that the face of different sizes,so that the face of different sizes can match the size of the window at a certain scale.One obvious problem with this sliding window approach is that there are too many problem with this sliding window approach is that there are too many places to check to determine whether the face is human or not.
Judging whether it is a human face,this is the two classification problem.In 2000,the AdaBoost classifier was used.When classifying,the input of the classifying,the input of the classifier is a Haar feature,which is a very simple feature that can see a lot of small black and white blocks on the graph.The Haar feature is the sum of all pixel values of the black region minus the sum of all the pixels in the white area,with this difference as a feature,black block and white.Blocks have different sizes and relative positions,which form many different Haar features.AdaBoost classifier isa strong classifier which is composed of multiple weak classifiers.The Viola-Jones detector is composed of multiple AdaBoost classifiers.This cascade is an important function of acceleration.
When face detection technology is mature in 2000,some practital applications have appeared,such as the function of face focusing in digital cameras.When the camera is photographed,the camera will automatically detect the face,and the adjust the focal length to better according to the position of the face.

Deformable component model
After Viola-Jones face detector,another important method appeared in 2009:Deformable part model(DPM),that is,deformable part model.As far as face detection is concerned,a face can be roughly regarded as a rigid body,usually without very large deformation,such as the position of the mouth to the nose.But for other objects,such as the human body,people can lift their arms up and turn their legs up,which makes the body have a very large,very large non rigid transformation,and DPM can better deal with the transformation by modeling the components.At the beginning,people tried to try to test pedestrians with the Haar feature +AdaBoost classifier ,but find the results were not good.By 2009,there was a DPM to model different parts,such as the head with arms and knees,and then classifying the parts based on the local components and the whole.That‘s a lot better.DPM is relatively complex and the detection speed is relatively slow,but it has achieved some results in the task of face detection and pedestrian and vehicle detection.Later,some methods of speeding up DPM were introduced to improve the detection speed.DPM gets the modeling of components,which is a good method,but is covered by the light of the deep learning.Deep learning brought a great improvement in the accuracy of detection.so some people studying DPM are also rapidly moving to deep learning.

R-CNN series
For object detection methods based on deep learning,one of those is R-CNN series,another is the combination of traditional methods and deep learning methods.

The so-called R-CNN,is based on a very simple thinking,for the input image,through selective search and other methods,we first identify 2000 windows that are most likely to contain objects,and for these 2000 windows,we want it to be able to contain objects,and for these 2000 windows,we want it to be able to achieve a very high recall of the detected object.Then each of these 2000 is used to extract and classify the features from CNN.For these 2000 areas to run a CNN,then it is very slow,and even if it takes only 0.5 seconds each time,20000 windows will take 1000 seconds.In order to SPP-net,which is to run a CNN on the whole map,without doing each window alone,but there is a small difficulty is that the size of each of the 2000 candidate windows is different.In order to solve to solve this problem,SPP-net designs the spatial pyramid pooling,which makes the different large the large windows characteristic of the same dimension.This method makes it unnecessary to compute the convolution of each candidate window,but it is still not fast enough.It still takes several seconds to detect second to detect an image.
技术分享图片

Fast R-CNN draws on the practice of SPP-net, convolutions in the entire graph, and then uses ROI-pooling to get the fixed feature vectors, such as the size of the window, converted to 7x7 so large. Fast R-CNN also introduces an important strategy to classify the window and return the frame of the object to make the detection box more accurate. In front of us, we say that the candidate window will have a very high recall rate, but the possible location of the frame is not very accurate. For example, a person‘s body frame may be a lack of arm and leg, then the detection frame can be calibrated by regression, and the initial position is refined. Fast R-CNN put classification and regression together and adopted a multi task collaborative learning approach.
Faster R-CNN brings a bigger change than Fast R-CNN, and the step that will generate the candidate window is also done with the depth network, and the network is shared with the Fast R-CNN classification network. The network that produces the candidate window is called the RPN, which is the core of Faster R-CNN. RPN replaced the very slow Selective Search before, and usually the number of candidate windows used is relatively small, only 300 is enough, which makes the back classification faster. In order to detect a variety of objects, RPN introduces the design of the so-called anchor box. Specifically, on the feature graph of the last coiling layer output, RPN is first used to obtain the eigenvectors of each position with the convolution of 3X3, and then based on the eigenvector to return to 9 windows of different sizes and the ratio of length to width, if the size of the feature graph is size. It‘s 40x60, so there will be about more than 20 thousand windows in total, sorting these windows by credibility, and taking the first 300 as a candidate window to do the final classification. By replacing Selective Search with RPN and using a shared coiling layer and reducing the number of candidate windows at the same time, Faster R-CNN has been significantly improved in speed, and the speed of 5fps can be reached on GPU.

The Unknown Word

The First Column	The Second Column
execution	[eksi‘kjution]执行
resume	恢复
resume the script exection	恢复脚本执行
pedestrain	[pe‘destrien]行人
vehicle	车辆[vi:ekl]
deformable	[di‘fo:mebel]可变形的
graphics	图像[‘graefiks]
transactions	处理
graphics transactions	图像处理
spatial	空间的[‘speishel]
pyramid	金字塔[‘piremid]

以上是关于Progress and Prospect of target detection technology based on deep learning的主要内容，如果未能解决你的问题，请参考以下文章

hadoop错误org.apache.hadoop.mapred.TaskAttemptListenerImpl Progress of TaskAttempt

《Neural Machine Translation: Challenges, Progress and Future》译文分享

叉车机器人托盘定位技术：近期进展回顾

Codeforces 933 C. A Colourful Prospect （平面图，欧拉公式）

R语言使用econocharts包创建微观经济或宏观经济图ptvalue函数可视化前景理论价值函数（Prospect theory value function）

NLP-Progress记录NLP最新数据集论文和代码: 助你紧跟NLP前沿

Progress and Prospect of target detection technology based on deep learning

Viola-Jones face detector

Deformable component model

R-CNN series

The Unknown Word