Face Recognition Using a Siamese (One-Shot) Network

Posted 2020-12-01 nndt075


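What is a Siamese network?

A Siamese network is a special type of neural network and one of the simplest and most widely used one-shot learning algorithms.

One-shot learning is a technique that learns from only one training example per class.

Siamese networks are mainly used in applications where only a few data points are available for each class.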
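To make the one-example-per-class idea concrete, here is a minimal sketch of one-shot classification by nearest reference. It assumes each class is already represented by a single feature vector (an embedding); the function name `one_shot_classify`, the labels, and the toy 2-D vectors are illustrative assumptions, not something taken from this article.

```python
import numpy as np

def one_shot_classify(query, references):
    """Return the label of the single reference embedding closest to the query.

    `references` maps a label to exactly one embedding (one example per class).
    """
    labels = list(references)
    dists = [np.linalg.norm(query - references[label]) for label in labels]
    return labels[int(np.argmin(dists))]

# Toy 2-D embeddings (made up for illustration): one example per person.
references = {
    "alice": np.array([0.0, 1.0]),
    "bob": np.array([1.0, 0.0]),
}
query = np.array([0.9, 0.2])
predicted = one_shot_classify(query, references)  # the query lies closest to "bob"
```

Adding a new class in this setup only requires storing one more reference vector; nothing is retrained.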

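Why use a Siamese network?

Suppose we want to build a face recognition model for our organization, where about 500 people work. To build the model from scratch with a convolutional neural network (CNN), we would need many images of each of these 500 people to train the network to good accuracy. We clearly will not have many images for each of them, so building the model with a CNN, or any other deep learning algorithm, is not feasible unless we have enough data points. In such cases we can use a one-shot learning algorithm such as a Siamese network, which can learn from far fewer data points.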

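How does a Siamese network work?

A Siamese network consists of two symmetric neural networks that have the same weights and architecture and are joined at the end by an energy function E. The goal of a Siamese network is to learn whether two inputs are similar or dissimilar. For example, given two images X1 and X2, we want to know whether they are similar or not.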
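The structure described above can be sketched in a few lines of NumPy. This is only a toy illustration under stated assumptions: the "network" is a single linear layer with ReLU, the weight matrix `W` is random rather than trained, and the Euclidean distance is assumed as the energy function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared parameters w: one weight matrix used by both branches.
W = rng.normal(size=(4, 3))

def f_w(x):
    """Toy embedding function: a single linear layer followed by ReLU."""
    return np.maximum(x @ W, 0.0)

def energy(x1, x2):
    """Energy E: Euclidean distance between the two embeddings."""
    return float(np.linalg.norm(f_w(x1) - f_w(x2)))

x1 = rng.normal(size=4)
x2 = rng.normal(size=4)

e_same = energy(x1, x1)   # identical inputs pass through identical weights: zero energy
e_12 = energy(x1, x2)
e_21 = energy(x2, x1)     # swapping the inputs does not change the energy
```

Because both branches call the same `f_w`, there is only one set of weights to train, and the energy is symmetric in its two inputs.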

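Siamese networks are used not only for face recognition but also, more broadly, in applications where we have few data points and need to learn the similarity between two inputs. Applications include signature verification, similar-question retrieval, and object tracking. We will look at Siamese networks in detail in the following sections.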

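Architecture of a Siamese network

As the architecture figure shows, a Siamese network consists of two identical networks that share the same weights and architecture. Suppose we have two inputs, X1 and X2. We feed input X1 to network A, giving fw(X1), and input X2 to network B, giving fw(X2). Note that both networks have the same weights w, and they generate the embeddings for our inputs X1 and X2. We then feed these embeddings to the energy function E, which gives us the similarity between the two inputs.

It can be expressed as follows: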
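Assuming the energy function is the Euclidean distance between the two embeddings (the choice made in the implementation later in this article; other distance measures are possible), the relationship can be written as:

```latex
E_w(X_1, X_2) = \lVert f_w(X_1) - f_w(X_2) \rVert_2
```

A small value of E means the two inputs are judged similar, and a large value means they are judged dissimilar.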

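The input to a Siamese network should be a pair (X1, X2) together with a binary label Y ∈ {0, 1} stating whether the pair is a genuine pair (same) or an impostor pair (different). As you can see in the table below, we use sentences as pairs, and the label indicates whether the sentence pair is genuine (1) or impostor (0):

A Siamese network uses the same architecture for both inputs and learns by finding the similarity between them. It is one of the most commonly used few-shot learning algorithms for tasks that involve computing the similarity between two entities. It is powerful and robust, and it serves as a solution for low-data problems.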

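Face recognition using a Siamese network

We will build a Siamese network by creating a face recognition model. The goal of our network is to learn whether two faces are similar or dissimilar. We use the AT&T Database of Faces, which can be downloaded from: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

After downloading and extracting the archive, you will see the folders s1, s2, and so on up to s40.

Each of these folders contains 10 different images of a single person taken from different angles. Opening folder s1 or folder s13, for example, shows 10 different images of one person.

We take two images at random from the same folder and label them as a genuine pair, and we take one image from each of two different folders and label them as an impostor pair.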
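The pairing rule can be sketched without touching any image files by sampling only folder and image indices. The function name `sample_pair_indices` and its parameters are illustrative assumptions; the defaults (40 people, 10 images each) match the dataset layout described above. Unlike this sketch, the article's own data-generation code uses the same image number from both folders when it builds an impostor pair.

```python
import random

def sample_pair_indices(num_people=40, images_per_person=10, genuine=True, rng=random):
    """Return ((person, image), (person, image), label) using 1-based indices."""
    if genuine:
        person = rng.randint(1, num_people)
        # two distinct images of the same person -> label 1
        img_a, img_b = rng.sample(range(1, images_per_person + 1), 2)
        return (person, img_a), (person, img_b), 1
    # one image from each of two distinct people -> label 0
    person_a, person_b = rng.sample(range(1, num_people + 1), 2)
    img_a = rng.randint(1, images_per_person)
    img_b = rng.randint(1, images_per_person)
    return (person_a, img_a), (person_b, img_b), 0

random.seed(0)
genuine_pair = sample_pair_indices(genuine=True)
impostor_pair = sample_pair_indices(genuine=False)
```

Each returned index pair maps to a file such as `s<person>/<image>.pgm` inside the extracted archive.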

First, we import the required Python libraries:

import re
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split
from keras import backend as K
from keras.layers import Activation
from keras.layers import Input, Lambda, Dense, Dropout, Convolution2D, MaxPooling2D, Flatten
from keras.models import Sequential, Model
from keras.optimizers import RMSprop

Now we define a function for reading the input images. The read_image function takes the path of an image file as input and returns a NumPy array:

def read_image(filename, byteorder='>'):

    # first we read the image, as a raw file, into the buffer
    with open(filename, 'rb') as f:
        buffer = f.read()

    # using a regex, we extract the header, width, height and maxval of the image
    header, width, height, maxval = re.search(
        rb"(^P5\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n]\s)*)", buffer).groups()

    # then we convert the image to a numpy array using np.frombuffer,
    # which interprets the buffer as a one-dimensional array
    return np.frombuffer(buffer,
                         dtype='u1' if int(maxval) < 256 else byteorder + 'u2',
                         count=int(width) * int(height),
                         offset=len(header)
                         ).reshape((int(height), int(width)))
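As a cross-check on the hand-written parser, the same kind of file can also be decoded with Pillow and converted to a NumPy array. The sketch below is self-contained: it writes a tiny 3 x 2 binary PGM to a temporary directory (the file name `tiny.pgm` and the pixel values are made up for illustration) and reads it back.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Build a tiny 3 x 2 binary PGM (P5) file so the example is self-contained.
pixels = bytes([0, 50, 100, 150, 200, 250])
pgm_bytes = b"P5\n3 2\n255\n" + pixels   # magic number, width height, maxval, raw pixels

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny.pgm")
    with open(path, "wb") as f:
        f.write(pgm_bytes)
    # Pillow decodes an 8-bit PGM as a greyscale ("L") image;
    # np.array copies the pixels into a (height, width) array.
    with Image.open(path) as im:
        arr = np.array(im)
```

The header stores width before height, while the resulting array is indexed as (height, width), which is why the 92-pixel-wide, 112-pixel-tall face images end up with shape (112, 92).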

Let's open an image:

Image.open("data/orl_faces/s1/1.pgm")

img = read_image('data/orl_faces/s1/1.pgm')
img.shape

(112, 92)

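Next, we define the get_data function that generates our data. It builds the genuine pairs and the impostor pairs and, finally, concatenates the two sets of pairs into X and their labels into Y: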

size = 2
total_sample_size = 10000

def get_data(size, total_sample_size):
    # read the image
    image = read_image('data/orl_faces/s' + str(1) + '/' + str(1) + '.pgm')
    # reduce the size
    image = image[::size, ::size]
    # get the new size
    dim1 = image.shape[0]
    dim2 = image.shape[1]
    count = 0

    # initialize the numpy array with the shape of [total_sample, no_of_pairs, channels, dim1, dim2]
    x_genuine_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2])  # 2 is for pairs
    y_genuine = np.zeros([total_sample_size, 1])

    for i in range(40):
        for j in range(total_sample_size // 40):
            ind1 = 0
            ind2 = 0

            # read images from the same directory (genuine pair)
            while ind1 == ind2:
                ind1 = np.random.randint(10)
                ind2 = np.random.randint(10)

            # read the two images
            img1 = read_image('data/orl_faces/s' + str(i + 1) + '/' + str(ind1 + 1) + '.pgm')
            img2 = read_image('data/orl_faces/s' + str(i + 1) + '/' + str(ind2 + 1) + '.pgm')

            # reduce the size
            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]

            # store the images in the initialized numpy array
            x_genuine_pair[count, 0, 0, :, :] = img1
            x_genuine_pair[count, 1, 0, :, :] = img2

            # as we are drawing images from the same directory we assign the label 1 (genuine pair)
            y_genuine[count] = 1
            count += 1

    count = 0
    x_impostor_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2])
    y_impostor = np.zeros([total_sample_size, 1])

    for i in range(total_sample_size // 10):
        for j in range(10):

            # read images from different directories (impostor pair)
            while True:
                ind1 = np.random.randint(40)
                ind2 = np.random.randint(40)
                if ind1 != ind2:
                    break

            img1 = read_image('data/orl_faces/s' + str(ind1 + 1) + '/' + str(j + 1) + '.pgm')
            img2 = read_image('data/orl_faces/s' + str(ind2 + 1) + '/' + str(j + 1) + '.pgm')
            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]
            x_impostor_pair[count, 0, 0, :, :] = img1
            x_impostor_pair[count, 1, 0, :, :] = img2
            # as we are drawing images from different directories we assign the label 0 (impostor pair)
            y_impostor[count] = 0
            count += 1

    # now concatenate the genuine pairs and impostor pairs to get the whole data, scaled to [0, 1]
    X = np.concatenate([x_genuine_pair, x_impostor_pair], axis=0) / 255
    Y = np.concatenate([y_genuine, y_impostor], axis=0)
    return X, Y

Now we generate the data and check its size. As you can see, we have 20,000 data points, of which 10,000 are genuine pairs and 10,000 are impostor pairs:

X, Y = get_data(size, total_sample_size)
X.shape
Y.shape

(20000, 2, 1, 56, 46)

(20000, 1)
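The trailing 56 x 46 in the shape comes from the strided slicing with size = 2, which keeps every second row and every second column of a 112 x 92 image. A minimal check on a stand-in array (no image files needed; the array contents are arbitrary):

```python
import numpy as np

size = 2
image = np.arange(112 * 92).reshape(112, 92)  # stand-in for one 112 x 92 face image
small = image[::size, ::size]                 # keep every 2nd row and every 2nd column
```

Here `small` has shape (56, 46). This is plain subsampling without any smoothing, which is cheap but discards three quarters of the pixels.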

Next, we split the data into 75% for training and 25% for testing:

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=.25)

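Now that the data has been generated, we build our Siamese network. First we define the base network, which is a convolutional network used for feature extraction. It has two convolutional layers with ReLU activation and max pooling, followed by a flatten layer and two dense layers: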

def build_base_network(input_shape):

    seq = Sequential()

    nb_filter = [6, 12]
    kernel_size = 3

    # convolutional layer 1 (Keras 2 argument names; the input is channels-first)
    seq.add(Convolution2D(nb_filter[0], (kernel_size, kernel_size), input_shape=input_shape,
                          padding='valid', data_format='channels_first'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))
    seq.add(Dropout(.25))

    # convolutional layer 2
    seq.add(Convolution2D(nb_filter[1], (kernel_size, kernel_size),
                          padding='valid', data_format='channels_first'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))
    seq.add(Dropout(.25))

    # flatten, then two dense layers; the last one outputs the 50-dimensional embedding
    seq.add(Flatten())
    seq.add(Dense(128, activation='relu'))
    seq.add(Dropout(0.1))
    seq.add(Dense(50, activation='relu'))
    return seq
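To see how the tensor shape evolves through this base network, the layer arithmetic can be traced by hand. The sketch below assumes a channels-first layout for every layer, stride-1 'valid' convolutions, and 2 x 2 non-overlapping pooling; the helper names `conv_valid` and `max_pool` are made up for illustration and are not Keras functions.

```python
def conv_valid(shape, filters, kernel):
    """Output shape of a stride-1 'valid' convolution on (channels, height, width)."""
    _, h, w = shape
    return (filters, h - kernel + 1, w - kernel + 1)

def max_pool(shape, pool=2):
    """Output shape of non-overlapping pooling (any remainder is dropped)."""
    c, h, w = shape
    return (c, h // pool, w // pool)

shape = (1, 56, 46)                 # one greyscale channel, downsampled image
shape = conv_valid(shape, 6, 3)     # (6, 54, 44)
shape = max_pool(shape)             # (6, 27, 22)
shape = conv_valid(shape, 12, 3)    # (12, 25, 20)
shape = max_pool(shape)             # (12, 12, 10)
flat_features = shape[0] * shape[1] * shape[2]   # 1440 inputs to the first Dense layer
```

Under these assumptions the flatten layer hands 1,440 features to the 128-unit dense layer, which is then reduced to the 50-dimensional embedding.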

Next, we feed the image pair to the base network, which returns the embeddings, that is, the feature vectors:

input_dim = x_train.shape[2:]
img_a = Input(shape=input_dim)
img_b = Input(shape=input_dim)
base_network = build_base_network(input_dim)
feat_vecs_a = base_network(img_a)
feat_vecs_b = base_network(img_b)

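feat_vecs_a and feat_vecs_b are the feature vectors of our image pair. Next, we feed these feature vectors to the energy function to compute the distance between them. We use the Euclidean distance as our energy function: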

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([feat_vecs_a, feat_vecs_b])
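The same computation can be checked outside Keras with plain NumPy. The function name `euclidean_distance_np` and the `eps` floor are assumptions added for this sketch; a small floor on the squared distance is one common way to avoid the undefined gradient of the square root at exactly zero, and it is not part of the code above.

```python
import numpy as np

def euclidean_distance_np(x, y, eps=1e-7):
    """Row-wise Euclidean distance between two batches of embeddings.

    eps is an assumed floor on the squared distance (see the note above).
    """
    sum_sq = np.sum(np.square(x - y), axis=1, keepdims=True)
    return np.sqrt(np.maximum(sum_sq, eps))

a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[3.0, 4.0], [1.0, 1.0]])
d = euclidean_distance_np(a, b)   # shape (2, 1): one distance per pair
```

The first pair is a 3-4-5 triangle, so its distance is 5; the second pair is identical, so its distance is clamped to the square root of `eps` instead of being exactly zero. The `keepdims=True` argument is what yields the (batch, 1) output shape that `eucl_dist_output_shape` declares.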

Now we set the number of epochs to 13, use RMSprop as the optimizer, and define our model:

epochs = 13
rms = RMSprop()
# the model maps the two input images to the distance between their embeddings
model = Model(inputs=[img_a, img_b], outputs=distance)

Next, we define the loss function as the contrastive_loss function and compile the model:

def contrastive_loss(y_true, y_pred):
    margin = 1
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

model.compile(loss=contrastive_loss, optimizer=rms)
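A small worked example shows how this loss treats the two kinds of pairs. The NumPy function below mirrors the formula in `contrastive_loss`, with y_true = 1 for genuine pairs and 0 for impostor pairs; the name `contrastive_loss_np` and the three sample distances are made up for illustration.

```python
import numpy as np

def contrastive_loss_np(y_true, d, margin=1.0):
    """Per-pair contrastive loss; y_true is 1 for genuine and 0 for impostor pairs."""
    genuine_term = y_true * np.square(d)                               # pull genuine pairs together
    impostor_term = (1 - y_true) * np.square(np.maximum(margin - d, 0))  # push impostors past the margin
    return genuine_term + impostor_term

y = np.array([1.0, 0.0, 0.0])
d = np.array([0.2, 0.3, 1.5])
per_pair = contrastive_loss_np(y, d)   # approximately [0.04, 0.49, 0.0]
mean_loss = per_pair.mean()
```

The genuine pair at distance 0.2 is penalised by 0.2 squared, the impostor pair at distance 0.3 by (1 - 0.3) squared, and the impostor pair at distance 1.5 incurs no loss because it already lies beyond the margin.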

Now we train the model:

img_1 = x_train[:, 0]
img_2 = x_train[:, 1]
model.fit([img_1, img_2], y_train, validation_split=.25, batch_size=128, verbose=2, epochs=epochs)

Now we make predictions on the test data:

pred = model.predict([x_test[:, 0], x_test[:, 1]])

Next, we define a function to compute the accuracy:

def compute_accuracy(predictions, labels):
    # among the pairs predicted as genuine (distance < 0.5), the fraction whose label is genuine
    return labels[predictions.ravel() < 0.5].mean()

The accuracy of the model:

compute_accuracy(pred, y_test)

0.9779092702169625
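The `compute_accuracy` function averages the labels of the pairs whose predicted distance is below 0.5, so it reports the share of true genuine pairs among those predicted genuine rather than the fraction of all pairs classified correctly. A minimal sketch of both measures on toy values follows; the function names, the 0.5 threshold default, and the four sample pairs are assumptions for illustration, and the figure reported above is not recomputed here.

```python
import numpy as np

def pair_accuracy(distances, labels, threshold=0.5):
    """Fraction of all pairs whose thresholded prediction matches the label."""
    predicted = (distances.ravel() < threshold).astype(int)   # 1 = genuine
    return float(np.mean(predicted == labels.ravel().astype(int)))

def predicted_genuine_precision(distances, labels, threshold=0.5):
    """Share of true genuine pairs among the pairs predicted as genuine."""
    return float(labels[distances.ravel() < threshold].mean())

toy_d = np.array([[0.1], [0.2], [0.9], [0.3]])   # predicted distances
toy_y = np.array([[1.0], [1.0], [1.0], [0.0]])   # 1 = genuine, 0 = impostor
acc = pair_accuracy(toy_d, toy_y)                 # 2 of 4 pairs are classified correctly
prec = predicted_genuine_precision(toy_d, toy_y)  # 2 of 3 predicted-genuine pairs are genuine
```

On these toy values the two measures differ (0.5 versus about 0.667), which is why it can be worth reporting both when evaluating on `pred` and `y_test`.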
