How to implement face detection and tracking with OpenCV
A while back I found Face Detect on the processing.org site (the link may require a proxy to open), a library that adds face recognition to Processing. Having finally sat down and read its documentation carefully, here is a summary of the basic setup. First download pFaceDetect.zip, create a folder named pFaceDetect inside Processing's libraries folder, and copy the library folder from the archive into it. Second, Face Detect depends on OpenCV, but OpenCV does not have to be installed on the machine; it is enough to copy the three OpenCV files cxcore100.dll, cv100.dll and libguide40.dll into the Processing root directory. pFaceDetect runs detection on a PImage object; the example below grabs video with JMyron and copies the pixel data into a PImage, so JMyron must also be installed to run it. Restart Processing, create and save a new sketch, and copy the data folder from the archive into the sketch directory. Don't forget to connect the camera.

OpenCV face detection and tracking
OpenCV's face detection is built on a cascade classifier of Haar-like features. The classifier is initialized from an XML file, and different XML files define different detection targets; here we use them to detect frontal faces and profile faces. These files were produced by a machine-learning algorithm trained on hundreds or thousands of images with and without faces; the training extracts the discriminative features of a face and stores them in the XML file. Several of these XML files have already been copied from the OpenCV source tree into the rbx1_vision/data/haar_detectors directory.
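Before looking at the ROS node, a minimal standalone sketch of the same idea in plain OpenCV Python may help; the cascade file name and the test image path below are assumptions, so substitute whatever cascade XML and image you actually have:

    import cv2

    # Minimal sketch (not part of rbx1): load one frontal cascade and run it on a still image.
    cascade = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")   # hypothetical path
    image = cv2.imread("test_face.jpg")                                  # hypothetical path
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    grey = cv2.equalizeHist(grey)

    # Search at multiple scales; faces is a list of (x, y, w, h) rectangles.
    faces = cascade.detectMultiScale(grey, scaleFactor=1.3, minNeighbors=3,
                                     minSize=(30, 30), maxSize=(150, 150))
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("faces", image)
    cv2.waitKey(0)

The full ROS node (face_detector.py from the rbx1_vision package) is listed below: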
import rospy
import cv2
import cv2.cv as cv
from rbx1_vision.ros2opencv2 import ROS2OpenCV2

class FaceDetector(ROS2OpenCV2):
    def __init__(self, node_name):
        super(FaceDetector, self).__init__(node_name)

        # Get the paths to the cascade XML files for the Haar detectors.
        # These are set in the launch file.
        cascade_1 = rospy.get_param("~cascade_1", "")
        cascade_2 = rospy.get_param("~cascade_2", "")
        cascade_3 = rospy.get_param("~cascade_3", "")

        # Initialize the Haar detectors using the cascade files
        self.cascade_1 = cv2.CascadeClassifier(cascade_1)
        self.cascade_2 = cv2.CascadeClassifier(cascade_2)
        self.cascade_3 = cv2.CascadeClassifier(cascade_3)

        # Set cascade parameters that tend to work well for faces.
        # Can be overridden in the launch file.
        self.haar_scaleFactor = rospy.get_param("~haar_scaleFactor", 1.3)
        self.haar_minNeighbors = rospy.get_param("~haar_minNeighbors", 3)
        self.haar_minSize = rospy.get_param("~haar_minSize", 30)
        self.haar_maxSize = rospy.get_param("~haar_maxSize", 150)

        # Store all parameters together for passing to the detector
        self.haar_params = dict(scaleFactor = self.haar_scaleFactor,
                                minNeighbors = self.haar_minNeighbors,
                                flags = cv.CV_HAAR_DO_CANNY_PRUNING,
                                minSize = (self.haar_minSize, self.haar_minSize),
                                maxSize = (self.haar_maxSize, self.haar_maxSize)
                                )

        # Should we show text on the display?
        self.show_text = rospy.get_param("~show_text", True)

        # Initialize the detection box
        self.detect_box = None

        # Track the number of hits and misses
        self.hits = 0
        self.misses = 0
        self.hit_rate = 0

    def process_image(self, cv_image):
        try:
            # Create a greyscale version of the image
            grey = cv2.cvtColor(cv_image, cv2.COLOR_BGR2GRAY)

            # Equalize the histogram to reduce lighting effects
            grey = cv2.equalizeHist(grey)

            # Attempt to detect a face
            self.detect_box = self.detect_face(grey)

            # Did we find one?
            if self.detect_box is not None:
                self.hits += 1
            else:
                self.misses += 1

            # Keep tabs on the hit rate so far
            self.hit_rate = float(self.hits) / (self.hits + self.misses)
        except:
            pass

        return cv_image

    def detect_face(self, input_image):
        # First check one of the frontal templates
        if self.cascade_1:
            faces = self.cascade_1.detectMultiScale(input_image, **self.haar_params)

        # If that fails, check the profile template
        if len(faces) == 0 and self.cascade_3:
            faces = self.cascade_3.detectMultiScale(input_image, **self.haar_params)

        # If that also fails, check the other frontal template
        if len(faces) == 0 and self.cascade_2:
            faces = self.cascade_2.detectMultiScale(input_image, **self.haar_params)

        # The faces variable holds a list of face boxes.
        # If one or more faces are detected, return the first one.
        if len(faces) > 0:
            face_box = faces[0]
        else:
            # If no faces were detected, print the "LOST FACE" message on the screen
            if self.show_text:
                font_face = cv2.FONT_HERSHEY_SIMPLEX
                font_scale = 0.5
                cv2.putText(self.marker_image, "LOST FACE!",
                            (int(self.frame_size[0] * 0.65), int(self.frame_size[1] * 0.9)),
                            font_face, font_scale, cv.RGB(255, 50, 50))
            face_box = None

        # Display the hit rate so far
        if self.show_text:
            font_face = cv2.FONT_HERSHEY_SIMPLEX
            font_scale = 0.5
            cv2.putText(self.marker_image, "Hit Rate: " +
                        str(trunc(self.hit_rate, 2)),
                        (20, int(self.frame_size[1] * 0.9)),
                        font_face, font_scale, cv.RGB(255, 255, 0))

        return face_box

def trunc(f, n):
    '''Truncates/pads a float f to n decimal places without rounding'''
    slen = len('%.*f' % (n, f))
    return float(str(f)[:slen])

if __name__ == '__main__':
    try:
        node_name = "face_detector"
        FaceDetector(node_name)
        rospy.spin()
    except KeyboardInterrupt:
        print "Shutting down face detector node."
        cv2.destroyAllWindows()
Walking through the code: the node imports rospy, the OpenCV (cv2) bindings, and the ROS2OpenCV2 class we created earlier. The face-detection node is defined as a class that extends ROS2OpenCV2, so it inherits all of the base processing functions and variables, such as letting the user select a region with the mouse and drawing a box around the current ROI. Whenever we extend a class we must also initialize the parent class, which is done with Python's super(), as shown above.

Next, three parameters store the paths to the Haar cascade XML files. These paths are set in the launch file, and the files themselves are copied from the OpenCV source tree; of the three, one detects profile faces and the other two detect frontal faces.

We then define the parameters that control the cascade classifier's speed and detection accuracy. In particular, minSize and maxSize set the smallest and largest face (in pixels) that will be accepted, and scaleFactor controls how much the image is rescaled between detection passes: the smaller the value, the finer (and slower) the search.

For convenience all of these parameters are collected into a single Python dictionary (haar_params). We also read the show_text parameter and initialize counters for the number of tracking hits and misses. In process_image() the frame is converted to greyscale and its histogram is equalized to reduce the effect of overall lighting changes, then the preprocessed image is passed to detect_face(). If a face is detected, its bounding box is returned in self.detect_box, which the ROS2OpenCV2 base class then draws onto the image. The core of the script is detect_face(): the input image is first run through the first XML cascade detector, whose detectMultiScale() searches for objects at multiple scales and returns the detected faces as a list of rectangles. If the first cascade finds nothing, the second one is tried, and if necessary the third. If one or more faces are found, the faces variable holds the list of face boxes; we only track the first face found, so we set face_box = faces[0]. Either way, the result is returned to the caller, process_image().
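The same fallback logic can be exercised outside ROS. A rough sketch follows (the cascade file names and camera index are assumptions) that prefers a frontal hit and only falls back to the profile detector when the frontal one returns nothing, while keeping the same hit/miss bookkeeping:

    import cv2
    import cv2.cv as cv

    # Hypothetical cascade paths -- substitute the XML files you actually have.
    frontal = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")
    profile = cv2.CascadeClassifier("haarcascade_profileface.xml")

    haar_params = dict(scaleFactor=1.3, minNeighbors=3,
                       flags=cv.CV_HAAR_DO_CANNY_PRUNING,
                       minSize=(30, 30), maxSize=(150, 150))

    def detect_face(grey):
        # Try the frontal detector first, then fall back to the profile detector.
        faces = frontal.detectMultiScale(grey, **haar_params)
        if len(faces) == 0:
            faces = profile.detectMultiScale(grey, **haar_params)
        return faces[0] if len(faces) > 0 else None

    capture = cv2.VideoCapture(0)
    hits = misses = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        grey = cv2.equalizeHist(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        box = detect_face(grey)
        if box is not None:
            hits += 1
            x, y, w, h = box
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        else:
            misses += 1
        cv2.imshow("face", frame)
        if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
            break
    print "hit rate: %.2f" % (float(hits) / max(hits + misses, 1))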
2. Keypoint detection with goodFeaturesToTrack. OpenCV offers several keypoint detectors, including goodFeaturesToTrack, cornerHarris and SURF; here we use goodFeaturesToTrack() to return the keypoints. The code is as follows:
import rospy
import cv2
import cv2.cv as cv
from rbx1_vision.ros2opencv2 import ROS2OpenCV2
import numpy as np

class GoodFeatures(ROS2OpenCV2):
    def __init__(self, node_name):
        super(GoodFeatures, self).__init__(node_name)

        # Do we show text on the display?
        self.show_text = rospy.get_param("~show_text", True)

        # How big should the feature points be (in pixels)?
        self.feature_size = rospy.get_param("~feature_size", 1)

        # Good features parameters
        self.gf_maxCorners = rospy.get_param("~gf_maxCorners", 200)
        self.gf_qualityLevel = rospy.get_param("~gf_qualityLevel", 0.02)
        self.gf_minDistance = rospy.get_param("~gf_minDistance", 7)
        self.gf_blockSize = rospy.get_param("~gf_blockSize", 10)
        self.gf_useHarrisDetector = rospy.get_param("~gf_useHarrisDetector", True)
        self.gf_k = rospy.get_param("~gf_k", 0.04)

        # Store all parameters together for passing to the detector
        self.gf_params = dict(maxCorners = self.gf_maxCorners,
                              qualityLevel = self.gf_qualityLevel,
                              minDistance = self.gf_minDistance,
                              blockSize = self.gf_blockSize,
                              useHarrisDetector = self.gf_useHarrisDetector,
                              k = self.gf_k)

        # Initialize key variables
        self.keypoints = list()
        self.detect_box = None
        self.mask = None

    def process_image(self, cv_image):
        try:
            # If the user has not selected a region, just return the image
            if not self.detect_box:
                return cv_image

            # Create a greyscale version of the image
            grey = cv2.cvtColor(cv_image, cv2.COLOR_BGR2GRAY)

            # Equalize the histogram to reduce lighting effects
            grey = cv2.equalizeHist(grey)

            # Get the good feature keypoints in the selected region
            keypoints = self.get_keypoints(grey, self.detect_box)

            # If we have points, display them
            if keypoints is not None and len(keypoints) > 0:
                for x, y in keypoints:
                    cv2.circle(self.marker_image, (x, y), self.feature_size, (0, 255, 0, 0), cv.CV_FILLED, 8, 0)

            # Process any special keyboard commands
            if self.keystroke != -1:
                try:
                    cc = chr(self.keystroke & 255).lower()
                    if cc == 'c':
                        # Clear the current keypoints
                        keypoints = list()
                        self.detect_box = None
                except:
                    pass
        except:
            pass

        return cv_image

    def get_keypoints(self, input_image, detect_box):
        # Initialize the mask with all black pixels
        self.mask = np.zeros_like(input_image)

        # Get the coordinates and dimensions of the detect_box
        try:
            x, y, w, h = detect_box
        except:
            return None

        # Set the selected rectangle within the mask to white
        self.mask[y:y+h, x:x+w] = 255

        # Compute the good feature keypoints within the selected region
        keypoints = list()
        kp = cv2.goodFeaturesToTrack(input_image, mask = self.mask, **self.gf_params)
        if kp is not None and len(kp) > 0:
            for x, y in np.float32(kp).reshape(-1, 2):
                keypoints.append((x, y))

        return keypoints

if __name__ == '__main__':
    try:
        node_name = "good_features"
        GoodFeatures(node_name)
        rospy.spin()
    except KeyboardInterrupt:
        print "Shutting down the Good Features node."
        cv.DestroyAllWindows()
The structure is much the same as before: the GoodFeatures class, again an extension of ROS2OpenCV2, is initialized, and a process_image() function does most of the work. Its steps mirror the comments in the code: initialize the key variables, create a greyscale version of the image, equalize the histogram to reduce lighting effects, get the good-feature keypoints in the selected region, display them if any were found, and process any special keyboard commands.
get_keypoints() is the keypoint detector. It first fills a mask with zeros (black) covering the whole image, then takes the coordinates and dimensions of detect_box and sets the selected rectangle inside the mask to white. cv2.goodFeaturesToTrack() then returns the keypoint coordinates as an array, which is reshaped into a list of (x, y) tuples. The result is returned to process_image(), which draws the points onto the image.
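A condensed sketch of the same masking trick outside ROS (the image path and the region of interest are assumptions):

    import cv2
    import numpy as np

    # Hypothetical input image and region of interest.
    image = cv2.imread("test_scene.jpg")
    grey = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
    x, y, w, h = 100, 80, 200, 150   # plays the role of detect_box

    # Black mask everywhere except the selected rectangle, which is set to white.
    mask = np.zeros_like(grey)
    mask[y:y+h, x:x+w] = 255

    gf_params = dict(maxCorners=200, qualityLevel=0.02, minDistance=7,
                     blockSize=10, useHarrisDetector=True, k=0.04)
    kp = cv2.goodFeaturesToTrack(grey, mask=mask, **gf_params)

    keypoints = []
    if kp is not None:
        # kp has shape (N, 1, 2); flatten it into a plain list of (x, y) tuples.
        keypoints = [(px, py) for px, py in np.float32(kp).reshape(-1, 2)]
        for px, py in keypoints:
            cv2.circle(image, (int(px), int(py)), 2, (0, 255, 0), -1)
    print "found %d keypoints" % len(keypoints)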
An annotated (and abridged) excerpt of the C++ implementation of goodFeaturesToTrack from the OpenCV source looks like this:
void cv::goodFeaturesToTrack( InputArray _image, OutputArray _corners,
                              int maxCorners, double qualityLevel, double minDistance,
                              InputArray _mask, int blockSize,
                              bool useHarrisDetector, double harrisK )
{
    // To process the whole image, pass cv::Mat() as _mask; otherwise pass the region of interest
    Mat image = _image.getMat(), mask = _mask.getMat();

    CV_Assert( qualityLevel > 0 && minDistance >= 0 && maxCorners >= 0 );  // basic sanity checks on the parameters
    CV_Assert( mask.empty() || (mask.type() == CV_8UC1 && mask.size() == image.size()) );

    Mat eig, tmp;  // eig stores the per-pixel corner response (min eigenvalue or Harris score); tmp holds the dilated copy of eig

    if( useHarrisDetector )
        cornerHarris( image, eig, blockSize, 3, harrisK );  // blockSize is the window used for each 2x2 covariance matrix, the Sobel aperture is 3, and harrisK is the k used by the Harris response
    else
        cornerMinEigenVal( image, eig, blockSize, 3 );  // compute the minimum eigenvalue of each pixel's covariance matrix and store it in eig

    double maxVal = 0;
    minMaxLoc( eig, 0, &maxVal, 0, 0, mask );  // maxVal now holds the largest value in eig
    threshold( eig, eig, maxVal*qualityLevel, 0, THRESH_TOZERO );  // threshold at maxVal*qualityLevel: values above it are kept, everything below is set to 0

    // Dilate with the default 3x3 kernel: after dilation only local maxima keep their original value;
    // every other pixel is replaced by the maximum of its 3x3 neighbourhood (see how greyscale dilation works)
    dilate( eig, tmp, Mat() );  // tmp now holds the dilated eig

    Size imgsize = image.size();

    std::vector<const float*> tmpCorners;  // declaration restored from the OpenCV source; the original excerpt omitted it

    // collect list of pointers to features - put them into temporary image
    for( int y = 1; y < imgsize.height - 1; y++ )
    {
        // row pointers (restored from the OpenCV source)
        const float* eig_data = (const float*)eig.ptr(y);
        const float* tmp_data = (const float*)tmp.ptr(y);
        const uchar* mask_data = mask.data ? mask.ptr(y) : 0;

        for( int x = 1; x < imgsize.width - 1; x++ )
        {
            float val = eig_data[x];
            if( val != 0 && val == tmp_data[x] && (!mask_data || mask_data[x]) )  // val == tmp_data[x] means this pixel is a local maximum
                tmpCorners.push_back(eig_data + x);  // remember its location
        }
    }

    //----------- Above this line: corners coarsely selected by their response value ("weak" corners) -----------//
    //----------- Below this line: corners are further filtered by minDistance; the survivors are the "strong" corners -----------//
    sort( tmpCorners, greaterThanPtr<float>() );  // sort by response in descending order; note that much of the logic below relies on this ordering
    vector<Point2f> corners;
    size_t i, j, total = tmpCorners.size(), ncorners = 0;

    // The next part takes a little thought
    if( minDistance >= 1 )
    {
        // Partition the image into larger grids
        int w = image.cols;
        int h = image.rows;

        const int cell_size = cvRound(minDistance);  // round to the nearest integer

        // grid is effectively a grid_width x grid_height window over the image (even though it is declared
        // as a vector of vectors): dividing by cell_size means one cell in grid spans minDistance pixels in
        // _image; the reason for adding cell_size - 1 is explained below
        const int grid_width = (w + cell_size - 1) / cell_size;
        const int grid_height = (h + cell_size - 1) / cell_size;

        std::vector<std::vector<Point2f> > grid(grid_width*grid_height);  // grid stores the coordinates of the strong corners kept so far

        minDistance *= minDistance;  // square it so squared distances can be compared directly, avoiding the square root

        for( i = 0; i < total; i++ )  // every weak corner from the coarse pass is re-examined here
        {
            int ofs = (int)((const uchar*)tmpCorners[i] - eig.data);  // offset computation restored from the OpenCV source
            int y = (int)(ofs / eig.step);  // row of the corner in the original image
            int x = (int)((ofs - y*eig.step)/sizeof(float));  // column in the original image

            bool good = true;  // assume for now that the current corner survives

            int x_cell = x / cell_size;  // (x_cell, y_cell) is the corner's cell in grid
            int y_cell = y / cell_size;

            int x1 = x_cell - 1;  // the neighbouring cells around (y_cell, x_cell)
            int y1 = y_cell - 1;  // this is also why cell_size - 1 was added when computing grid_width: it keeps the neighbouring cells of (y, x) inside the grid, so (y_cell, x_cell) is never a border cell
            int x2 = x_cell + 1;
            int y2 = y_cell + 1;

            // boundary check: make sure x1, y1, x2 and y2 stay inside the grid
            x1 = std::max(0, x1);  // take the larger of 0 and x1
            y1 = std::max(0, y1);
            x2 = std::min(grid_width-1, x2);
            y2 = std::min(grid_height-1, y2);

            // remember: one cell in grid corresponds to minDistance pixels in _image
            for( int yy = y1; yy <= y2; yy++ )  // rows
            {
                for( int xx = x1; xx <= x2; xx++ )  // columns
                {
                    vector<Point2f> &m = grid[yy*grid_width + xx];  // reference

                    if( m.size() )  // the cell already contains strong corners, i.e. kept corners within the minDistance neighbourhood of (y, x)
                    {
                        for( j = 0; j < m.size(); j++ )  // compare the current corner against every strong corner already stored nearby
                        {
                            float dx = x - m[j].x;
                            float dy = y - m[j].y;
                            // Note: any strong corner already stored near (y, x) was tested before the current one, and since
                            // tmpCorners is sorted by descending response (a larger response means a better corner), the stored
                            // corner is at least as good as the current one. So it is enough to compare distances: if the
                            // distance is within minDistance, the current corner can be discarded immediately.

// ... (the excerpt ends here in the original post; in the OpenCV source the loop goes on to discard the corner
// when dx*dx + dy*dy < minDistance, otherwise stores it in grid and in corners, until maxCorners points are collected)
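For reference (standard definitions, not part of the post): over the blockSize window $W$ the structure tensor is

$$M = \sum_{(u,v)\in W}\begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix},$$

and cornerMinEigenVal scores each pixel with the Shi-Tomasi measure $\min(\lambda_1,\lambda_2)$, while cornerHarris uses $\det(M) - k\,(\operatorname{trace} M)^2$ with $k$ the harrisK (gf_k) parameter. qualityLevel then discards every pixel whose score is below qualityLevel times the best score in the image, and minDistance enforces the spatial separation walked through above.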
Optical flow: starting from the current frame and the keypoints already extracted, each keypoint has a position and a small neighbourhood of surrounding pixels. For the next frame, a least-squares fit solves for a constant-velocity displacement that maps the selected neighbourhood in the previous image onto the next one. If the residual of the fit stays below a threshold, the neighbourhood is taken to be the same as in the first frame and the keypoint is assigned to the new position; otherwise it is discarded. As the object moves in front of the camera, features therefore only need to be extracted from the first frame and are then tracked from frame to frame. When the tracking error between frames becomes too high, the offending keypoints are dropped and nearby keypoints take their place, because the error has been detected.
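In formula form (the standard pyramidal Lucas-Kanade setup, stated here only for context): for a keypoint with window $W$, the displacement $\mathbf{d}$ between the previous frame $I$ and the current frame $J$ minimises

$$E(\mathbf{d}) = \sum_{\mathbf{x}\in W}\bigl(I(\mathbf{x}) - J(\mathbf{x}+\mathbf{d})\bigr)^2,$$

which, after linearising $J$, reduces to the normal equations $M\mathbf{d} = \mathbf{b}$, with $M$ the same structure tensor as above and $\mathbf{b}$ built from the temporal image difference. cv2.calcOpticalFlowPyrLK() solves this over an image pyramid (lk_maxLevel levels) with the iteration and accuracy limits given in lk_criteria. The full LK tracking node follows: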
import rospy
import cv2
import cv2.cv as cv
import numpy as np
from rbx1_vision.good_features import GoodFeatures

class LKTracker(GoodFeatures):
    def __init__(self, node_name):
        super(LKTracker, self).__init__(node_name)

        self.show_text = rospy.get_param("~show_text", True)
        self.feature_size = rospy.get_param("~feature_size", 1)

        # LK parameters
        self.lk_winSize = rospy.get_param("~lk_winSize", (10, 10))
        self.lk_maxLevel = rospy.get_param("~lk_maxLevel", 2)
        self.lk_criteria = rospy.get_param("~lk_criteria", (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.01))

        self.lk_params = dict( winSize  = self.lk_winSize,
                               maxLevel = self.lk_maxLevel,
                               criteria = self.lk_criteria)

        self.detect_interval = 1
        self.keypoints = None

        self.detect_box = None
        self.track_box = None
        self.mask = None
        self.grey = None
        self.prev_grey = None

    def process_image(self, cv_image):
        try:
            # If we don't yet have a detection box (drawn by the user
            # with the mouse), keep waiting
            if self.detect_box is None:
                return cv_image

            # Create a greyscale version of the image
            self.grey = cv2.cvtColor(cv_image, cv2.COLOR_BGR2GRAY)

            # Equalize the grey histogram to minimize lighting effects
            self.grey = cv2.equalizeHist(self.grey)

            # If we haven't yet started tracking, set the track box to the
            # detect box and extract the keypoints within it
            if self.track_box is None or not self.is_rect_nonzero(self.track_box):
                self.track_box = self.detect_box
                self.keypoints = self.get_keypoints(self.grey, self.track_box)
            else:
                if self.prev_grey is None:
                    self.prev_grey = self.grey

                # Now that we have keypoints, track them to the next frame
                # using optical flow
                self.track_box = self.track_keypoints(self.grey, self.prev_grey)

            # Process any special keyboard commands for this module
            if self.keystroke != -1:
                try:
                    cc = chr(self.keystroke & 255).lower()
                    if cc == 'c':
                        # Clear the current keypoints
                        self.keypoints = None
                        self.track_box = None
                        self.detect_box = None
                except:
                    pass

            self.prev_grey = self.grey
        except:
            pass

        return cv_image

    def track_keypoints(self, grey, prev_grey):
        # We are tracking points between the previous frame and the
        # current frame
        img0, img1 = prev_grey, grey

        # Reshape the current keypoints into a numpy array required
        # by calcOpticalFlowPyrLK()
        p0 = np.float32([p for p in self.keypoints]).reshape(-1, 1, 2)

        # Calculate the optical flow from the previous frame to the current frame
        p1, st, err = cv2.calcOpticalFlowPyrLK(img0, img1, p0, None, **self.lk_params)

        # Do the reverse calculation: from the current frame to the previous frame
        try:
            p0r, st, err = cv2.calcOpticalFlowPyrLK(img1, img0, p1, None, **self.lk_params)

            # Compute the distance between corresponding points in the two flows
            d = abs(p0-p0r).reshape(-1, 2).max(-1)

            # If the distance between pairs of points is < 1 pixel, set
            # a value in the "good" array to True, otherwise False
            good = d < 1

            # Initialize a list to hold new keypoints
            new_keypoints = list()

            # Cycle through all current and new keypoints and only keep
            # those that satisfy the "good" condition above
            for (x, y), good_flag in zip(p1.reshape(-1, 2), good):
                if not good_flag:
                    continue
                new_keypoints.append((x, y))

                # Draw the keypoint on the image
                cv2.circle(self.marker_image, (x, y), self.feature_size, (0, 255, 0, 0), cv.CV_FILLED, 8, 0)

            # Set the global keypoint list to the new list
            self.keypoints = new_keypoints

            # Convert the keypoints list to a numpy array
            keypoints_array = np.float32([p for p in self.keypoints]).reshape(-1, 1, 2)

            # If we have enough points, find the best fit ellipse around them
            if len(self.keypoints) > 6:
                track_box = cv2.fitEllipse(keypoints_array)
            else:
                # Otherwise, find the best fitting rectangle
                track_box = cv2.boundingRect(keypoints_array)
        except:
            track_box = None

        return track_box

if __name__ == '__main__':
    try:
        node_name = "lk_tracker"
        LKTracker(node_name)
        rospy.spin()
    except KeyboardInterrupt:
        print "Shutting down LK Tracking node."
        cv.DestroyAllWindows()
from rbx1_vision.good_features import GoodFeatures

class LKTracker(GoodFeatures):
    def __init__(self, node_name):
        super(LKTracker, self).__init__(node_name)
The overall structure of the script is similar to the face_detector.py node. This time we import from the good_features node and define the LKTracker class as an extension of the GoodFeatures class, since the points we track are exactly the keypoints it detects; the LK parameters are then defined.

If tracking has not started yet, the track box is set to the detect box (the region selected for tracking) and the keypoints inside it are extracted. In track_keypoints() the points are tracked between the previous frame and the current frame: the current keypoints are first reshaped into the numpy array format required by calcOpticalFlowPyrLK().

From the current keypoints and the two greyscale images we predict the keypoint positions in the next frame; then, using those just-computed points, the flow is run backwards to predict the previous points again, and we check whether the original points agree with these back-predicted ones. The distance between each pair is stored in an array; entries whose distance is less than 1 pixel are marked True in the "good" array, otherwise False, and only the good points are kept.
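Putting the pieces together outside ROS, a rough end-to-end sketch (the webcam index and parameter values are assumptions) that detects good features once on the first frame and then tracks them with the same forward-backward check:

    import cv2
    import numpy as np

    lk_params = dict(winSize=(10, 10), maxLevel=2,
                     criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.01))
    gf_params = dict(maxCorners=200, qualityLevel=0.02, minDistance=7, blockSize=10)

    capture = cv2.VideoCapture(0)
    ok, frame = capture.read()
    prev_grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect the initial keypoints once, on the first frame.
    p0 = cv2.goodFeaturesToTrack(prev_grey, mask=None, **gf_params)

    while ok and p0 is not None and len(p0) > 0:
        ok, frame = capture.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Forward flow: previous frame -> current frame.
        p1, st, err = cv2.calcOpticalFlowPyrLK(prev_grey, grey, p0, None, **lk_params)
        # Backward flow: current frame -> previous frame.
        p0r, st, err = cv2.calcOpticalFlowPyrLK(grey, prev_grey, p1, None, **lk_params)

        # Keep only points whose back-prediction lands within 1 pixel of the original.
        d = abs(p0 - p0r).reshape(-1, 2).max(-1)
        good = d < 1
        p0 = p1[good].reshape(-1, 1, 2)

        for x, y in p0.reshape(-1, 2):
            cv2.circle(frame, (int(x), int(y)), 2, (0, 255, 0), -1)

        cv2.imshow("lk_tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:   # Esc to quit
            break
        prev_grey = grey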