[PYTHON] Face recognition using principal component analysis

Introduction

I tried writing a face recognition program that uses principal component analysis. The program is written in Python, and scikit-learn is used for the principal component analysis. The face images in the reference paper are of an old man and not very interesting, so I used face images of Kanna Hashimoto instead.

What is principal component analysis?

There are many sites and books with detailed explanations, so I will omit them here. Simply put, principal component analysis is a data analysis method that extracts the features of data by reducing a multidimensional feature space to a low-dimensional subspace. Note that if the target is an image of N × N pixels, it is a two-dimensional matrix whose elements are pixel values, but here it is represented as an $N^2$-dimensional feature vector. When applied to image matching such as face recognition, the correlation value (similarity) between the target and the principal components of the template is calculated, and the template with the highest correlation is output as the answer.
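As a minimal sketch of this flattening step (assuming OpenCV and NumPy; face.png is a hypothetical file name), an N × N grayscale image becomes an $N^2$-dimensional feature vector like this:

import cv2
import numpy as np

N = 64  # assumed image size after resizing

# read as grayscale, resize to N x N, and flatten into an N^2-dimensional vector
img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # "face.png" is a hypothetical file
img = cv2.resize(img, (N, N))
x = img.reshape(-1).astype(np.float64) / 255.0  # pixel values normalized to 0-1
print(x.shape)  # (4096,)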

Implementation of face recognition

Theory

Based on this paper.

First, prepare $n$ face images for training. It is assumed that each image is resized to N × N pixels.

X = \{\vec{x}_1,\vec{x}_2,\cdots,\vec{x}_i,\cdots,\vec{x}_n\}

Here, $\vec{x}_i$ represents an N × N image as an $N^2$-dimensional feature vector.

From these, the mean vector

\vec{\mu} = \frac{1}{n}\sum_{i=1}^{n}\vec{x}_i

and the covariance matrix

S = \sum_{i=1}^{n}(\vec{x}_i-\vec{\mu})(\vec{x}_i-\vec{\mu})^T

are computed. Then the eigenvalue problem

S\vec{v}=\lambda\vec{v}

is solved to obtain the eigenvalues $\lambda_j$ and eigenvectors $\vec{v}_j$. By arranging the eigenvectors in descending order of their corresponding eigenvalues, they become the first principal component, the second principal component, and so on.
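As a minimal sketch of these three steps with plain NumPy (toy random data stands in for real face images; the article itself uses scikit-learn's PCA, which obtains the same components via SVD without forming $S$ explicitly):

import numpy as np

# toy data: n flattened images as rows (random stand-in for real face vectors)
n, N = 5, 32
X = np.random.rand(n, N * N)

mu = X.mean(axis=0)   # mean vector
D = X - mu            # centered data
S = D.T @ D           # covariance (scatter) matrix, N^2 x N^2

# solve the eigenvalue problem for the symmetric matrix S
eigvals, eigvecs = np.linalg.eigh(S)

# sort in descending order of eigenvalue; row j of V is then the (j+1)-th principal component
order = np.argsort(eigvals)[::-1]
V = eigvecs[:, order].T
print(V.shape)  # (1024, 1024)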

Face recognition is performed by calculating the correlation value between the target image and the learned principal components. The correlation value is obtained by taking the inner product of the feature vector $X_{obs}$ of the target image with the projection matrix in which the eigenvectors (principal component vectors) are arranged side by side:

V = \{\vec{v}_1,\vec{v}_2,\cdots,\vec{v}_d\}

Here, with $d = 1$, the correlation value is obtained from the first principal component alone. That is, the correlation value $R$ can be calculated as

R = \vec{v}_1\cdot X_{obs}^T
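A minimal sketch of this matching step (assuming a matrix V whose rows are the sorted principal component vectors and a flattened target image; toy random arrays are used here so the snippet runs on its own):

import numpy as np

# assumed inputs: principal components V (one per row) and a flattened target image x_obs
V = np.random.rand(5, 32 * 32)         # toy stand-in for the learned components
x_obs = np.random.rand(32 * 32)        # toy stand-in for the target image's feature vector
x_obs = x_obs / np.linalg.norm(x_obs)  # normalize the feature vector to unit norm

# correlation value with the first principal component (d = 1)
R = np.dot(V[0], x_obs)
print("correlation value:", R)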

Implementation

First, prepare face images of Kanna Hashimoto for training. Normally a variety of images would be registered, and face recognition is quite demanding unless many images taken from various angles are available. For now, since this is just practice, I simply copied and registered the following five images.

kanna8.png

Next, prepare various images for testing: a handwritten character image ("3"), Kanna Hashimoto (the original (org) and another somewhat similar image (kanna_2)), Tetsuro Degawa (degawa), and the Mona Lisa (MN), for a total of 5 images.

pcatest.png

Code

import os
from glob import glob
import numpy as np
import sys
from sklearn.decomposition import PCA
import cv2

SIZE = 64

# GRAYSCALE
def Image_PCA_Analysis(d, X):

    #Principal component analysis
    pca = PCA(n_components=d)
    pca.fit(X)

    print("Main component")
    print(pca.components_)
    print("average")
    print(pca.mean_)
    print("Covariance matrix")
    print(pca.get_covariance())
    print("Eigenvalues of the covariance matrix")
    print(pca.explained_variance_)
    print("The eigenvectors of the covariance matrix")
    v = pca.components_
    print(v)
    #Principal component analysis results
    print("Dispersion explanation rate of main components")
    print(pca.explained_variance_ratio_)
    print("Cumulative contribution rate")
    c_contribute_ratio = pca.explained_variance_ratio_.sum()
    print(c_contribute_ratio)

    #Dimensionality reduction and restoration
    X_trans = pca.transform(X)
    X_inv = pca.inverse_transform(X_trans)
    print('X.shape =', X.shape)
    print('X_trans.shape =', X_trans.shape)
    print('X_inv.shape =', X_inv.shape)
    for i in range(X_inv.shape[0]):
        cv2.imshow("gray", X_inv[i].reshape(SIZE,SIZE))
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    return v,c_contribute_ratio

def img_read(path):

    x = []
    files = glob(path)
    for file in files:
        img = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
        img2 = cv2.resize(img, (SIZE, SIZE))  # resize to N x N
        x.append(img2)
    X = np.array(x)
    X = X.reshape(X.shape[0], SIZE*SIZE)  # n rows of N^2-dimensional feature vectors
    X = X / 255.0  # normalize pixel values to the range 0-1
    print(X.shape)

    return X

def main():

    # number of principal components to keep
    d = 5

    path = './kanna/*.png'
    X = img_read(path)

    # PCA
    v, c_contribute_ratio = Image_PCA_Analysis(d, X)

    #matching
    path = './kanna2/*.png'
    files = glob(path)
    for file in files:
        X2 = img_read(file)
        X2 = X2 / np.linalg.norm(X2)
        # correlation value (inner product of the first principal component vector and the feature vector)
        eta = np.dot(v[0],X2.T)
        print("Correlation value:", file, np.linalg.norm(eta * 255))

    return


if __name__ == "__main__":
    main()


Since the average brightness differs among the five target test images, the norm of each feature vector is normalized to 1.

Result

As expected, the correlation value for Kanna Hashimoto was high, and the correlation values for the Mona Lisa and Degawa were low. However, Degawa scored lower than the handwritten character "3", meaning the handwritten "3" was judged closer to Kanna Hashimoto's face than Degawa was.

Correlation value: ./kanna2\3.png 21.292788187030233
Correlation value: ./kanna2\degawa.png 14.11580341763399
Correlation value: ./kanna2\kanna_2.png 32.536060418259474
Correlation value: ./kanna2\kanna_org.png 39.014994579329326
Correlation value: ./kanna2\MN.png 26.90538714456287

This is natural because multiple copies of the same image are registered, but the data can be explained almost entirely by the first principal component alone, and the cumulative contribution ratio is 100%.

Explained variance ratio of principal components
[1.00000000e+00 3.15539405e-32 0.00000000e+00 0.00000000e+00
 0.00000000e+00]
Cumulative contribution ratio
1.0

As a qualitative understanding of the correlation value: as shown in the figure below, the training image data can be thought of as being projected onto the principal component axis (dimensional compression). When the inner product of the principal component vector and the feature vector of the target image is taken, the inner product is larger for images whose feature vectors point closer to the direction of the principal component vector.

PCA.png
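As a toy numerical illustration of this point (the numbers are purely illustrative, not taken from the article): with a unit-length principal component vector, a target vector pointing in a similar direction gives a larger inner product than one pointing in a different direction.

import numpy as np

# unit-length "principal component" direction (toy 2-D example)
v1 = np.array([1.0, 0.0])

# two unit-norm target vectors: one nearly aligned with v1, one nearly orthogonal to it
similar = np.array([0.9, 0.1])
similar /= np.linalg.norm(similar)
different = np.array([0.1, 0.9])
different /= np.linalg.norm(different)

print(np.dot(v1, similar))    # close to 1 -> high correlation value
print(np.dot(v1, different))  # close to 0 -> low correlation value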
