How to modify scikit-learn's eigenface face recognition example

Posted 2023-03-12


【Title】: How to modify scikit-learn's eigenface face recognition example 【Posted】: 2016-08-28 19:11:57 【Question】:

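I am trying to adapt scikit-learn's eigenface face recognition script to use my own image dataset (note that this script runs perfectly on my Python 3 with sklearn 0.17).

The call to fetch_lfw_people() below is probably what needs to change, and I have been struggling to get the script to skip it and point to my own image folder instead.

Instead of pulling data from the folder it downloads, I want the script to take images from my own dataset located at '/User/pepe/images/'.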

# Download the data, if not already on disk and load it as numpy arrays

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape

# for machine learning we use the data directly (as relative pixel
# position info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]

# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

etc...

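Do you have any suggestions on how to solve this?

Looking at the code on GitHub, the central part is not actually fetch_lfw_people() itself but the lfw.py file, which contains additional functions.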
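For orientation, the excerpt above only reads four things from the loader: the image array, the flattened data, the integer labels and the label names. Any replacement for fetch_lfw_people() would therefore need to supply arrays of the same shapes. A minimal sketch with synthetic arrays (the sizes and person names here are made up for illustration, not taken from LFW):

```python
import numpy as np

# Synthetic stand-in for a loaded dataset: 6 grayscale images of 50 x 37 pixels
images = np.zeros((6, 50, 37), dtype=np.float32)
target = np.array([0, 0, 1, 1, 2, 2])  # one integer label per image
target_names = np.array(["person_a", "person_b", "person_c"])  # hypothetical

n_samples, h, w = images.shape
X = images.reshape(n_samples, h * w)  # plays the role of lfw_people.data
n_features = X.shape[1]
n_classes = target_names.shape[0]
print(n_samples, h, w, n_features, n_classes)
```

The flattened matrix has one row per image and h * w columns, which is why every image must have the same size.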

【Comments】:

Did you ever get this working with your own dataset? 【Answer 1】:

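You don't need to "modify" anything; the function provides a simple way to do this.

See:

https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/datasets/lfw.py#L229

The parameters data_home (give your path here!) and download_if_missing (unset it, i.e. give it the value False) are there for exactly this purpose!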
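A minimal sketch of that call, wrapped in a helper so nothing is loaded until you call it. The helper name load_local_lfw is made up for illustration; the keyword arguments are the ones used in the question plus the two named above. One caveat, as far as I can tell from lfw.py: the loader still looks for the LFW layout (an lfw_home sub-folder with the dataset's own files) under data_home, so pointing it at an arbitrary folder of images is likely to raise an error rather than load them.

```python
from sklearn.datasets import fetch_lfw_people


def load_local_lfw(data_home):
    """Load LFW-style data from data_home without attempting a download."""
    # With download_if_missing=False the loader raises an OSError (IOError)
    # if the files it expects are not already present under data_home.
    return fetch_lfw_people(
        data_home=data_home,
        download_if_missing=False,
        min_faces_per_person=70,
        resize=0.4,
    )


# Usage, with the path from the question:
# lfw_people = load_local_lfw('/User/pepe/images/')
```

If the error appears, the folder layout is the first thing to check; loading the images by hand into arrays is the alternative.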

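【Comments】:

【Answer 2】:

I was able to modify it into the code below, but I cannot compute the score. I can read an image and compare it against the sample images, but I don't know how to use the scorer function.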

from time import time
import os

import numpy
from PIL import Image
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# NOTE: written for Python 3 and scikit-learn >= 0.18. The sklearn 0.17 names
# (sklearn.cross_validation, sklearn.grid_search, RandomizedPCA) have since
# been removed from scikit-learn.

# Path to the root image directory containing one sub-directory per person
path = "<Path to Folder of Training Images>"
test_image_path = "<Path to test image>"

X = []  # flattened image feature vectors
Y = []  # integer label of each image

h = 750  # image height in pixels
w = 250  # image width in pixels
n_features = h * w  # length of each feature vector (187500)
target_names = []  # names of the persons, indexed by label


def load_image(filename):
    # Assumption: convert to grayscale and resize so that every feature
    # vector has exactly h * w entries (PIL's resize takes (width, height)).
    img = Image.open(filename).convert("L").resize((w, h))
    return numpy.array(img).flatten()


directories = sorted(d for d in os.listdir(path)
                     if os.path.isdir(os.path.join(path, d)))
for label, directory in enumerate(directories):
    for file in sorted(os.listdir(os.path.join(path, directory))):
        filename = os.path.join(path, directory, file)
        print(filename)
        X.append(load_image(filename))
        Y.append(label)
    target_names.append(directory)

X = numpy.array(X)
Y = numpy.array(Y)
n_samples = len(X)  # total number of images
n_classes = len(target_names)
print(Y)
print(target_names)

###############################################################################
# Split into a training set and a test set (random split, not stratified)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=42)

###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 10

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, len(X_train)))
t0 = time()
pca = PCA(n_components=n_components, svd_solver='randomized',
          whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))

###############################################################################
# Train a SVM classification model
print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)

###############################################################################
# Quantitative evaluation of the model quality on the test set
print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
# score() needs the true labels; it returns the mean accuracy on (X, y)
print(clf.score(X_test_pca, y_test))
print("done in %0.3fs" % (time() - t0))
labels = list(range(n_classes))
print(classification_report(y_test, y_pred, labels=labels,
                            target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=labels))

###############################################################################
# Prediction of user based on the model
test_vector = load_image(test_image_path)
test_pca = pca.transform([test_vector])
predicted = clf.predict(test_pca)
# print(clf.best_params_)
# print(clf.best_score_)
print(target_names[predicted[0]])
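On the scoring question: for a classifier, and for a GridSearchCV with default scoring wrapped around one, score(X, y) returns the mean accuracy on the labelled data passed in, so it always needs the true labels. A single unlabelled test image can be predicted but not scored. A minimal sketch on synthetic data (random numbers standing in for the PCA features, not face data; the names X_demo, y_demo and demo_clf are made up for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two synthetic classes, 20 samples each, 5 features per sample
X_demo = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(5, 1, (20, 5))])
y_demo = np.array([0] * 20 + [1] * 20)

demo_clf = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10]}, cv=3)
demo_clf.fit(X_demo, y_demo)

# score(X, y) predicts on X and compares the predictions with y
score = demo_clf.score(X_demo, y_demo)
manual = accuracy_score(y_demo, demo_clf.predict(X_demo))
print(score, manual)
```

The two printed numbers are the same quantity computed two ways. By contrast, best_score_ is the mean cross-validated score of the best parameter combination on the training data, so it will generally differ from the score on a held-out test set.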

【Comments】:
