Face verification (images/video) with the TensorFlow and PyTorch frameworks, the dlib library (face_recognition), and OpenCV: detailed steps and runnable code

Posted 2023-04-05 繁星蓝雨

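Contents

0 Background and results
1 Background knowledge
2 Face recognition with TensorFlow (AlexNet, video/images)
3 Face recognition with PyTorch (Faster R-CNN, images)
4 Face recognition with the dlib library (face_recognition)
5 Baidu face search
6 Project source code
References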

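0 Background and results

In the previous article we set up the face recognition environment; here we use the installed frameworks and libraries in a practical project.

The first two models in this article are excerpted and summarized from online sources, and all of the model code attached at the end of the article runs. Face recognition is not my field, but a project required it, so I wrote this article after only a few days of study. If anything here is wrong, corrections are welcome. The models can perform face verification on both images and video.

The recognition results are shown below: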

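1 Background knowledge

Face recognition takes four steps: face detection, face alignment, face representation, and face matching.

Face detection: extract the face region from the image;
Face alignment: use facial landmarks to straighten tilted or side-facing faces;
Face representation: convert the pixel values of the face image into a compact, discriminative feature vector or template;
Face matching: compare the similarity of two face feature vectors to decide whether they belong to the same person.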
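
The matching step can be illustrated in a few lines. This is a minimal sketch and not code from the project: the function names, the toy vectors, and the 0.6 threshold are all assumptions chosen for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_person(a, b, threshold=0.6):
    """Hypothetical rule: same person if the distance is below the threshold."""
    return euclidean_distance(a, b) < threshold


# Toy 4-dimensional "face encodings" (real encodings are much longer).
known = [0.1, 0.2, 0.3, 0.4]
probe_same = [0.1, 0.25, 0.3, 0.35]
probe_other = [0.9, -0.4, 0.0, 1.2]

print(is_same_person(known, probe_same))   # True
print(is_same_person(known, probe_other))  # False
```

A fixed threshold is the simplest possible decision rule; a suitable value depends on how the encodings were trained.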

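2 Face recognition with TensorFlow (AlexNet, video/images)

The project code is adapted from a referenced project whose source uses sklearn, keras, and other libraries. It was later rewritten in a referenced blog post, and this article makes a few more small changes so that the program runs.

Model characteristics:

Face detection: uses OpenCV's Haar-feature cascade classifier haarcascade_frontalface_alt2.xml to detect faces;
Face alignment: none; the face image is only padded to a square (to avoid distortion when scaling) and then scaled to 64 x 64 to form the training data set;
Face representation: a convolutional neural network (three convolutional layers, one fully connected layer, and one output layer) is trained on the input data set and stored as the known-face database. The source labels this network AlexNet, the ImageNet 2012 winner, although the original AlexNet is considerably deeper, with five convolutional layers and three fully connected layers;
Face matching: load the known-face database, detect the face with the OpenCV classifier, resize and scale the image, then run the model prediction with TensorFlow's sess.run.

Key code of the model:

Face detection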

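import cv2

# Path to the face detection cascade classifier
PATH_CLASSFIER_CV2_FRONTALFACE_ALT2 = "/Users/mac/PycharmProjects/tensorflowTest/src/haarcascade_frontalface_alt2.xml"

# Load the cascade classifier with OpenCV
classfier = cv2.CascadeClassifier(PATH_CLASSFIER_CV2_FRONTALFACE_ALT2)

# Detect faces in the grayscale image `grey`. scaleFactor is the scale step between image pyramid levels;
# minNeighbors is how many neighboring detections a candidate needs in order to be kept
face_rects = classfier.detectMultiScale(grey, scaleFactor=1.2, minNeighbors=3, minSize=(32, 32))

# Each detection is (x, y, w, h): the top-left corner plus the width and height
x, y, w, h = face_rects[0]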
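
detectMultiScale can return several rectangles, or none at all, so indexing the first result directly is fragile. The sketch below shows one way to pick the largest face and convert it to corner coordinates. The helper names and the sample rectangles are hypothetical; only the (x, y, w, h) layout comes from OpenCV.

```python
def largest_face(face_rects):
    """Return the (x, y, w, h) rectangle with the largest area, or None."""
    if len(face_rects) == 0:
        return None
    return max(face_rects, key=lambda r: r[2] * r[3])


def rect_to_corners(rect):
    """Convert (x, y, w, h) to (left, top, right, bottom) pixel coordinates."""
    x, y, w, h = rect
    return x, y, x + w, y + h


# Example rectangles in the (x, y, w, h) layout used by detectMultiScale.
rects = [(10, 20, 40, 40), (100, 50, 80, 80)]
face = largest_face(rects)
corners = rect_to_corners(face)
print(face)     # (100, 50, 80, 80)
print(corners)  # (100, 50, 180, 130)
```

The corner form is convenient for slicing the face out of the frame, for example image[top:bottom, left:right].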

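CNN model:

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, sess.run)

SIZE = 64  # input images are scaled to 64 x 64

# Input images: [batch, height, width, channels]
x_data = tf.placeholder(tf.float32, [None, SIZE, SIZE, 3])

# Labels: [batch, number of classes]
y_data = tf.placeholder(tf.float32, [None, None])

# Dropout keep probabilities, fed at run time
keep_prob_5 = tf.placeholder(tf.float32)
keep_prob_75 = tf.placeholder(tf.float32)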


def weightVariable(shape):
    '''Create a weight variable of the given shape, initialized from a normal distribution with tf.random_normal.'''
    init = tf.random_normal(shape, stddev=0.01)
    #init = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(init)

def biasVariable(shape):
    '''Create a bias variable of the given shape.'''
    init = tf.random_normal(shape)
    #init = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(init)
    
    
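def conv2d(x, W):
    '''
    2-D convolution. tf.nn.conv2d is TensorFlow's two-dimensional convolution function; x is the input image batch and W holds the layer's kernel weights. In strides=[1, 1, 1, 1], strides[0] and strides[3] are 1 because the convolution does not skip over samples or channels, and the middle two values move the kernel one step at a time in the x and y directions. padding='SAME' zero-pads the input.
    '''
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def maxPool(x):
    '''Pooling. To keep as much image information as possible, the convolution moves one step at a time (strides[1] = strides[2] = 1), so the feature map keeps its size. To shrink the feature map, and with it the number of parameters in later layers and the complexity of the model, we downsample with pooling; this is the subsampling layer of a CNN.'''
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def dropout(x, keep):
    '''Apply dropout to reduce overfitting.'''
    return tf.nn.dropout(x, keep)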
    

def cnnLayer(classnum):
    '''Build the CNN: three convolutional layers, one fully connected layer, and the output layer.'''
    # Layer 1
    W1 = weightVariable([3, 3, 3, 32]) # kernel size (3, 3), input channels (3), output channels (32)
    b1 = biasVariable([32])
    conv1 = tf.nn.relu(conv2d(x_data, W1) + b1)
    pool1 = maxPool(conv1)
    # Reduce overfitting by randomly dropping activations during training
    drop1 = dropout(pool1, keep_prob_5) # 32 * 32 * 32; each filter sums over all input channels

    # Layer 2
    W2 = weightVariable([3, 3, 32, 64])
    b2 = biasVariable([64])
    conv2 = tf.nn.relu(conv2d(drop1, W2) + b2)
    pool2 = maxPool(conv2)
    drop2 = dropout(pool2, keep_prob_5) # 16 * 16 * 64

    # Layer 3
    W3 = weightVariable([3, 3, 64, 64])
    b3 = biasVariable([64])
    conv3 = tf.nn.relu(conv2d(drop2, W3) + b3)
    pool3 = maxPool(conv3)
    drop3 = dropout(pool3, keep_prob_5) # 8 * 8 * 64

    # Fully connected layer; the flattened 8 * 8 * 64 feature map has 4096 values
    Wf = weightVariable([8*8*64, 512])
    bf = biasVariable([512])
    drop3_flat = tf.reshape(drop3, [-1, 8*8*64])
    dense = tf.nn.relu(tf.matmul(drop3_flat, Wf) + bf)
    dropf = dropout(dense, keep_prob_75)

    # Output layer
    Wout = weightVariable([512, classnum])
    bout = weightVariable([classnum])
    resMat = tf.matmul(dropf, Wout)

    # out = tf.add(tf.matmul(dropf, Wout), bout) # raw, unnormalized output
    # Normalize the output layer
    # Sigmoid suits multi-label problems; softmax suits single-label problems
    # out = tf.add(tf.sigmoid(resMat), bout) # [array([[0.0795017 , 0.03605248, 0.9799969 ]]
    # Note: bout is added after the softmax here, so the outputs do not sum exactly to 1;
    # the conventional form is tf.nn.softmax(resMat + bout)
    out = tf.add(tf.nn.softmax(resMat), bout) # [array([[0.00744988, 0.03517907, 0.979998  ]]

    print(f'{tf.matmul(dropf, Wout)}:{tf.sigmoid(tf.matmul(dropf, Wout))}')
    return out
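
The shape comments in the listing can be checked with a little arithmetic. The sketch below is an illustration only, with hypothetical helper names; it assumes a 64 x 64 input, stride-1 SAME convolutions, and 2 x 2 SAME max pooling as in the code above, and shows why the fully connected layer receives 4096 values.

```python
import math


def same_pool_output(size, stride=2):
    """Spatial size after a SAME-padded pooling layer: ceil(size / stride)."""
    return math.ceil(size / stride)


def flattened_size(input_size, channels_per_layer):
    """Track the feature-map shape through conv (stride 1, SAME) + 2x2 max pool."""
    size = input_size
    shapes = []
    for channels in channels_per_layer:
        # A stride-1 SAME convolution keeps the spatial size unchanged,
        # so only the pooling step shrinks the map.
        size = same_pool_output(size)
        shapes.append((size, size, channels))
    last = shapes[-1]
    return shapes, last[0] * last[1] * last[2]


shapes, flat = flattened_size(64, [32, 64, 64])
print(shapes)  # [(32, 32, 32), (16, 16, 64), (8, 8, 64)]
print(flat)    # 4096
```

If the input size or the number of pooling layers changes, the first dimension of the fully connected weight matrix has to change with it.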

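Pad the image so that scaling does not distort it:

def resizeImage(image, height, width):
    '''
    Resize an image to the given size.
    If the image is not square, pad its shorter side until it is square,
    so that the subsequent cv2.resize() call scales it proportionally.
    The target size is 64 x 64, so the image must be square before scaling to avoid distortion.
    '''
    # Border widths in each direction
    top, bottom, left, right = (0, 0, 0, 0)
    # Get the image size
    h, w, _ = image.shape
    # For images whose height and width differ, find the longest edge
    longest_edge = max(h, w)
    # Work out how many pixels the shorter side needs to match the longer side
    if h < longest_edge: # pad top and bottom
        dh = longest_edge - h
        top = dh // 2 # the padding is split between top and bottom, so floor-divide by 2
        bottom = dh - top
    elif w < longest_edge: # pad left and right
        dw = longest_edge - w
        left = dw // 2
        right = dw - left
    else:
        pass

    print(top, bottom, left, right)
    # Pad the image to a square; cv2.BORDER_CONSTANT fills the border with the color given by value
    # (RESIZE_FILL_COLOR is a module-level constant defined elsewhere in the project)
    constant = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=RESIZE_FILL_COLOR)
    # cv2.imshow('constant',constant),cv2.waitKey(0),cv2.destroyAllWindows()

    # Resize and return; cv2.resize expects dsize as (width, height)
    return cv2.resize(constant, (width, height))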
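
The border computation in resizeImage can be isolated and checked without OpenCV. This is a small sketch with a hypothetical function name; it returns the four border widths that would be passed to cv2.copyMakeBorder.

```python
def square_padding(h, w):
    """Return (top, bottom, left, right) borders that make an h x w image square."""
    longest_edge = max(h, w)
    dh = longest_edge - h
    dw = longest_edge - w
    top = dh // 2    # any odd leftover pixel goes to the bottom
    left = dw // 2   # any odd leftover pixel goes to the right
    return top, dh - top, left, dw - left


print(square_padding(100, 60))  # (0, 0, 20, 20)
print(square_padding(45, 64))   # (9, 10, 0, 0)
```

Because at most one of dh and dw is non-zero, this version needs no if/elif branch.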

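Possible improvements:

1. Face detection relies on OpenCV's built-in Haar cascade classifier, whose accuracy could be improved;
2. The model is a fairly basic CNN (labeled AlexNet in the source). To improve results it could later be replaced with another CNN such as VGGNet or ResNet; on mobile devices a lightweight CNN such as MobileNet, SqueezeNet, or ShuffleNet could be used.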

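3 Face recognition with PyTorch (Faster R-CNN, images)

The source code is adapted from a referenced blog. Its author has written a whole series of very detailed machine learning articles, which are well worth studying closely.

Model characteristics:

Face detection: uses a trained Faster R-CNN model to detect faces;
Face alignment: none;
Face representation: builds a library of face encodings from the known face images;
Face matching: load the known-face database, detect the face with the Faster R-CNN model, then apply the face verification model to the Euclidean distance or cosine similarity between the encoding of the unknown face and the encodings in the known-face library to decide whether they belong to the same person.

The encoding distance is not compared directly against a threshold to decide whether two faces belong to the same person, because such a threshold is generally hard to choose.

Face verification model: the model is a single linear layer. It multiplies each component of its input (the squared differences between two encodings, according to the model's docstring) by a coefficient, adds a bias, and passes the result through a sigmoid to obtain a value between 0 and 1, where 0 means different people and 1 means the same person.

Key code:

Face recognition and face verification models: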
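
The single-layer verification model described above can be written out by hand. The sketch below is an illustration with made-up weights, not the trained model; it assumes, as the model's docstring in the listing states, that the input is the squared difference between two encodings.

```python
import math


def verify(encoding_a, encoding_b, weights, bias):
    """Sigmoid of a linear function of the squared per-dimension differences."""
    squared_diff = [(a - b) ** 2 for a, b in zip(encoding_a, encoding_b)]
    z = sum(w * d for w, d in zip(weights, squared_diff)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: negative weights and a positive bias, so that
# a larger difference pushes the score towards 0.
weights = [-4.0, -4.0, -4.0]
bias = 2.0

same = verify([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], weights, bias)
different = verify([0.1, 0.2, 0.3], [1.1, -0.8, 1.3], weights, bias)
print(same > 0.5, different < 0.5)  # True True
```

Learning the weights and bias from data takes the place of choosing a distance threshold by hand.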

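import torch
import torchvision
from torch import nn

# USE_GRAYSCALE, NEGATIVE_SAMPLES, and device are module-level settings defined elsewhere in the project

class FaceRecognitionModel(nn.Module):
    """Face recognition model: computes the encoding used to find the closest face (a ResNet variant)"""
    # Encoding length
    EmbeddedSize = 32
    # Required margin between the encodings of different people (sum of squared differences)
    ExclusiveMargin = 0.2

    def __init__(self):
        super().__init__()
        # ResNet implementation
        self.resnet = torchvision.models.resnet18(num_classes=256)
        # Support grayscale images
        if USE_GRAYSCALE:
            self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Linear model that produces the final encoding
        # torchvision's resnet already ends with a Linear layer, so the first Linear is omitted here
        self.encode_model = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, FaceRecognitionModel.EmbeddedSize))

    def forward(self, x):
        tmp = self.resnet(x)
        y = self.encode_model(tmp)
        return y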

    @staticmethod
    def loss_function(predicted):
        """Loss function"""
        losses = []
        verify_positive = torch.ones(1).to(device)
        verify_negative = torch.zeros(NEGATIVE_SAMPLES).to(device)
        for index in range(0, predicted.shape[0], 2 + NEGATIVE_SAMPLES):
            a = predicted[index]   # encoding of the base person
            b = predicted[index+1] # encoding of the base person (another image)
            c = predicted[index+2:index+2+NEGATIVE_SAMPLES] # encodings of the comparison people
            # Compute the encoding differences
            diff_positive = (a - b).pow(2).sum()
            diff_negative = (a - c).pow(2).sum(dim=1)
            # Compute the loss
            # Triplet Loss: the same-person distance must be smaller than the different-person distance by at least ExclusiveMargin
            loss = nn.functional.relu(
                diff_positive - diff_negative + FaceRecognitionModel.ExclusiveMargin).sum()
            losses.append(loss)
        loss_total = torch.stack(losses).mean()
        return loss_total

    @staticmethod
    def calc_accuracy(predicted):
        """Accuracy calculation"""
        total_count = 0
        correct_count = 0
        for index in range(0, predicted.shape[0], 2 + NEGATIVE_SAMPLES):
            a = predicted[index]   # encoding of the base person
            b = predicted[index+1] # encoding of the base person (another image)
            c = predicted[index+2:index+2+NEGATIVE_SAMPLES] # encodings of the comparison people
            # Check that the same-person distance is smaller than every different-person distance
            diff_positive = (a - b).pow(2).sum()
            diff_negative = (a - c).pow(2).sum(dim=1)
            if (diff_positive < diff_negative).sum() == diff_negative.shape[0]:
                correct_count += 1
            total_count += 1
        return correct_count / total_count

class FaceVerificationModel(nn.Module):
    """Face verification model: decides whether two faces are the same person; the input is the squared difference between encodings"""
    # Threshold for deciding that two faces are the same person; a higher value can be used in production to avoid false matches
    VerifyThreshold = 0.5

    def __init__(self):
        super().__init__()
        # Linear model that decides whether it is the same person
        self.verify_model = nn.Sequential(
            nn.Linear(FaceRecognitionModel.EmbeddedSize, 1),
            nn.Sigmoid())

    def forward(self, x):
        # After training, the weights should be negative and the bias positive
        y = self.verify_model(x)
        return y.view(-1)

    @staticmethod
    def loss_function(predicted):
        """Loss function"""
        # The output should be [ same person, different, different, ..., same person, different, different, ... ]
        # Positive and negative losses are computed separately; otherwise the majority of negative samples would push the bias negative
        positive_indexes = []
        negative_indexes = []
        for index in list(range(0, predicted.shape[0], 1+NEGATIVE_SAMPLES)):
            positive_indexes.append(index)
            negative_indexes += list(range(index+1, index+1+NEGATIVE_SAMPLES))
        positive_loss = nn.functional.mse_loss(
            predicted[positive_indexes], torch.ones(len(positive_indexes)).to(device))
        negative_loss = nn.functional.mse_loss(
            predicted[negative_indexes], torch.zeros(len(negative_indexes)).to(device))
        return (positive_loss + negative_loss) / 2

    @staticmethod
    def calc_accuracy(predicted):
        """Accuracy calculation"""
        positive_correct = 0
        positive_total = 0
        negative_correct = 0
        negative_total = 0
        for index in range(0, predicted.shape[0], 1+NEGATIVE_SAMPLES):
            positive_correct += (predicted[index] >=
                                 FaceVerificationModel.VerifyThreshold).sum().item()
            negative_correct += (predicted[index+1:index+1+NEGATIVE_SAMPLES] <
                                 FaceVerificationModel.VerifyThreshold).sum().item()
            positive_total += 1
            negative_total += NEGATIVE_SAMPLES
        # Negative samples are the majority, so return the mean of the positive and negative accuracies
        return (positive_correct / positive_total + negative_correct / negative_total) / 2
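
The Triplet Loss used in FaceRecognitionModel.loss_function can be worked through on toy numbers. The sketch below is a plain-Python illustration with hypothetical 2-dimensional encodings, not the project's training code; the margin of 0.2 matches ExclusiveMargin in the listing.

```python
def squared_distance(a, b):
    """Sum of squared differences between two encodings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negatives, margin=0.2):
    """Sum over negatives of max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = squared_distance(anchor, positive)
    return sum(max(0.0, d_pos - squared_distance(anchor, n) + margin)
               for n in negatives)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]        # same person, close to the anchor
far_negative = [1.0, 0.0]    # different person, already far enough away
near_negative = [0.3, 0.0]   # different person, still too close

loss_far = triplet_loss(anchor, positive, [far_negative])
loss_near = triplet_loss(anchor, positive, [near_negative])
print(loss_far)             # 0.0
print(round(loss_near, 6))  # 0.12
```

A negative that is already further away than the positive by more than the margin contributes no loss, so training concentrates on the negatives that are still too close.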

The Faster R-CNN model is fairly long, so only part of the code is shown here:

class MyModel(nn.Module):
    """Faster-RCNN (a ResNet variant)"""
    Anchors = None # anchor list, covering (number of anchor points * number of shapes) regions
    AnchorSpan = 16 # distance between anchor points; should equal the original width/height divided by the resnet output width/height
    AnchorScales = (1, 2, 4, 6, 8) # scale factors for the region at each anchor point
    AnchorAspects = ((1, 1),) # aspect ratios for the region at each anchor point
    AnchorBoxes = len(AnchorScales) * len(AnchorAspects) # number of shapes per anchor point

    def __init__(self):
        super().__init__()
        # ResNet that extracts features from each image region (without AvgPool and the fully connected layer)
        # Unlike the Fast-RCNN example, the output width and height are 1/16 of the original; regions are later cropped using the anchors and affine_grid
        # In addition, the number of channels is reduced so that the model can run within 4GB of GPU memory
        # Note:
        # The RPN and the label classifier need separate models; otherwise learning fails (the RPN always outputs negative)
        self.previous_channels_out = 4
        self.rpn_resnet = nn.Sequential(
            nn.Conv2d(3, self.previous_channels_out, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(self.previous_channels_out),
            nn.ReLU(inplace=True),
            self._make_layer(BasicBlock, channels_out=8, num_blocks=2, stride=1),
            self._make_layer(BasicBlock, channels_out=16, num_blocks=2, stride=2),
            self._make_layer(BasicBlock, channels_out=32, num_blocks=2, stride=2),
            self._make_layer(BasicBlock, channels_out=64, num_blocks=2, stride=2),
            self._make_layer(BasicBlock, channels_out=128, num_blocks=2, stride=2))
        self.previous_channels_out = 4
        self.cls_resnet = nn.Sequential(
            nn.Conv2d(3, self.previous_channels_out, ke
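Face recognition 0: splitting a video into images and combining images into a video

I. Introduction
Goal: this section uses two demos to get familiar with using OpenCV to split a video into images and to combine images into a video
Environment: python3.6+OpenCV3.3.1
II. Examples
Demo1: splitting a video into images
Goals:
1. Read a video file from a given folder
2. Split the video file into images
3. Save the images to a given folder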

# -*-coding:utf-8-*-
#author: lyp time: 2018/8/8
# Split a video into images
import cv2
cap = cv2.VideoCapture('E:/Envs/opencvdemo/one/1.mp4')  # open a video
isOpened = cap.isOpened()  # check whether the video was opened
print(isOpened)
fps = cap.get(cv2.CAP_PROP_FPS)  # frame rate
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))  # frame width
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height
print(fps, width, height)
i = 0
while(isOpened):
    if i == 10:
        break
    else:
        i = i+1
    (flag, frame) = cap.read()  # read one frame. flag: whether the read succeeded; frame: the frame data
    fileName = 'image' + str(i) + '.jpg'
    file = 'E:/Envs/opencvdemo/one/' + fileName   # save to the target folder
    print(fileName)
    # Save the image if the read succeeded
    if flag:
        # Quality control: 100 is the highest quality
        cv2.imwrite(file, frame, [cv2.IMWRITE_JPEG_QUALITY, 100])
cap.release()  # release the video file
print('end!')
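
The loop in Demo1 always runs ten iterations, even if the video has fewer frames. The sketch below shows a variant that stops at the first failed read. It is an illustration only: split_frames and its two callback parameters are hypothetical, and a fake three-frame video stands in for cv2.VideoCapture so that the logic can be run without OpenCV.

```python
def split_frames(read_frame, save_frame, max_frames=10):
    """Read up to max_frames frames and save each one; return the saved names.

    read_frame() must return (ok, frame) like cv2.VideoCapture.read();
    save_frame(name, frame) stands in for cv2.imwrite.
    """
    saved = []
    for i in range(1, max_frames + 1):
        ok, frame = read_frame()
        if not ok:  # end of video or read error: stop instead of looping on
            break
        name = 'image' + str(i) + '.jpg'
        save_frame(name, frame)
        saved.append(name)
    return saved


# Stand-in for a 3-frame video, so the sketch runs without OpenCV.
fake_video = iter(['frame-a', 'frame-b', 'frame-c'])


def fake_read():
    frame = next(fake_video, None)
    return frame is not None, frame


written = {}
names = split_frames(fake_read, written.__setitem__, max_frames=10)
print(names)  # ['image1.jpg', 'image2.jpg', 'image3.jpg']
```

With OpenCV, cap.read would be passed as read_frame, and save_frame would be a small function that calls cv2.imwrite with the target folder prepended to the name.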

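Demo2: combining images into a video
Goals:
1. Select the images in a given folder and read their information
2. Combine the images into a video (the source states that only DIVX can be used on Windows)
3. Save the video to a given folder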
 

# -*-coding:utf-8-*-
#author: lyp time: 2018/8/9
# Combine images into a video
import cv2
img = cv2.imread('E:/Envs/opencvdemo/one/image1.jpg')
imgInfo = img.shape
size = (imgInfo[1], imgInfo[0])  # (width, height)
print(size, imgInfo)

# Use DIVX on Windows
fourcc = cv2.VideoWriter_fourcc(*'DIVX')
# Arguments: output file, codec, frame rate (5 fps), frame size
videoWrite = cv2.VideoWriter('E:/Envs/opencvdemo/one/2.avi', fourcc, 5, size)
for i in range(1,11):
    fileName = 'image'+str(i)+'.jpg'
    file = 'E:/Envs/opencvdemo/one/' + fileName
    img = cv2.imread(file)
    videoWrite.write(img)
videoWrite.release()  # finalize the video file
print('end!')
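
Two details of Demo2 are easy to get wrong: img.shape is (height, width, channels) while VideoWriter expects (width, height), and the length of the resulting video follows from the frame count and the frame rate. The helpers below are a hypothetical sketch of both calculations; the 480 x 640 shape is an example value, not taken from the demo.

```python
def frame_size(shape):
    """Convert an image shape (height, width, channels) to (width, height)."""
    height, width = shape[0], shape[1]
    return width, height


def video_duration(frame_count, fps):
    """Playback length in seconds of frame_count frames shown at fps."""
    return frame_count / fps


print(frame_size((480, 640, 3)))  # (640, 480)
print(video_duration(10, 5))      # 2.0
```

With the demo's ten images written at 5 frames per second, the output video is therefore two seconds long.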

 