Last time was face recognition with Chainer (learning phase); this time it is the prediction phase.
Let's try face recognition using a USB-connected webcam.
**Software**
- Windows 10 Home
- Anaconda3 64-bit (Python 3.7)
- Spyder

**Library**
- Chainer 7.0.0
- opencv-python 4.1.2.30

**Hardware**
- CPU: Intel Core i9-9900K
- GPU: NVIDIA GeForce RTX 2080 Ti
- RAM: 16GB 3200MHz

(Any PC with a webcam can run this.)
**Books**
"OpenCV4 programming starting with Python" by Naohiro Kitayama ([Amazon page](https://www.amazon.co.jp/Python%E3%81%A7%E5%A7%8B%E3%82%81%E3%82%8BOpenCV-4%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0-%E5%8C%97%E5%B1%B1-%E7%9B%B4%E6%B4%8B/dp/4877834613))

**Site**
Chainer API Reference
For the time being, the code is available on GitHub: https://github.com/himazin331/Face-Recognition-Chainer- The repository contains the learning phase, the prediction phase, a data-processing program, and the Haar cascade.

A cascade file of Haar-like features is required to run this program. This time I use OpenCV's Haar cascade, which is included in the repository, so there is no need to prepare it separately.
** Please note that the code is dirty ... **
face_recog_CH.py

```python
from PIL import Image
import numpy as np
import cv2
import sys
import os
import argparse as arg

import chainer
import chainer.links as L
import chainer.functions as F
import chainer.serializers as S


# ============ Same network configuration as face_recog_train_CH.py ============
class CNN(chainer.Chain):

    def __init__(self, n_out):
        super(CNN, self).__init__(
            conv1=L.Convolution2D(1, 16, 5, 1, 0),
            conv2=L.Convolution2D(16, 32, 5, 1, 0),
            conv3=L.Convolution2D(32, 64, 5, 1, 0),
            link=L.Linear(None, 1024),
            link_class=L.Linear(None, n_out),
        )

    def __call__(self, x):
        h1 = F.max_pooling_2d(F.relu(self.conv1(x)), ksize=2)
        h2 = F.max_pooling_2d(F.relu(self.conv2(h1)), ksize=2)
        h3 = F.relu(self.conv3(h2))
        h4 = F.relu(self.link(h3))
        return self.link_class(h4)
# ==============================================================================


def main():
    # Command-line arguments
    parser = arg.ArgumentParser(description='Face Recognition Program (Chainer)')
    parser.add_argument('--param', '-p', type=str, default=None,
                        help='Trained parameter file (error if not specified)')
    parser.add_argument('--cascade', '-c', type=str,
                        default=(os.path.dirname(os.path.abspath(__file__)) + '/haar_cascade.xml').replace('/', os.sep),
                        help='Haar cascade (default: ./haar_cascade.xml)')
    parser.add_argument('--device', '-d', type=int, default=0,
                        help='Camera device ID (default: 0)')
    args = parser.parse_args()

    # Parameter file not specified -> exception
    if args.param is None:
        print("\nException: Trained Parameter-File not specified.\n")
        sys.exit()
    # Nonexistent parameter file specified -> exception
    if not os.path.exists(args.param):
        print("\nException: Trained Parameter-File {} is not found.\n".format(args.param))
        sys.exit()
    # Nonexistent Haar cascade specified -> exception
    if not os.path.exists(args.cascade):
        print("\nException: Haar-cascade {} is not found.\n".format(args.cascade))
        sys.exit()

    # Output the settings
    print("=== Setting information ===")
    print("# Trained Parameter-File: {}".format(os.path.abspath(args.param)))
    print("# Haar-cascade: {}".format(args.cascade))
    print("# Camera device: {}".format(args.device))
    print("===========================")

    # Create the camera instance
    cap = cv2.VideoCapture(args.device)
    # Set the FPS value
    cap.set(cv2.CAP_PROP_FPS, 60)

    # Set up the face detector
    detector = cv2.CascadeClassifier(args.cascade)

    # Load the trained parameters
    model = L.Classifier(CNN(2))
    S.load_npz(args.param, model)

    red = (0, 0, 255)
    green = (0, 255, 0)
    p = (10, 30)

    while True:
        # Get a frame
        ret, frame = cap.read()

        # Camera read failure -> exception
        if not ret:
            print("\nException: Camera read failure.\n")
            sys.exit()

        # Face detection
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray)

        # Face not detected -> continue
        if len(faces) == 0:
            cv2.putText(frame, "face is not found",
                        p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, red, thickness=2)
            cv2.imshow("frame", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            continue

        # When a face is detected
        for (x, y, w, h) in faces:

            # Draw the face region
            cv2.rectangle(frame, (x, y), (x + w, y + h), red, thickness=2)

            # Skip if the face is too small
            if h < 50 and w < 50:
                cv2.putText(frame, "detected face is too small",
                            p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, red, thickness=2)
                cv2.imshow("frame", frame)
                break

            # Show the detected face
            cv2.imshow("gray", cv2.resize(gray[y:y + h, x:x + w], (250, 250)))

            # Image processing
            face = gray[y:y + h, x:x + w]
            face = Image.fromarray(face)
            face = np.asarray(face.resize((32, 32)), dtype=np.float32)
            recog_img = face[np.newaxis, :, :]

            # Face recognition
            pred = model.predictor(chainer.Variable(np.array([recog_img])))
            c = F.softmax(pred).data.argmax()

            if c == 0:
                cv2.putText(frame, "Abe Sinzo",
                            p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, green, thickness=2)
            elif c == 1:
                cv2.putText(frame, "Aso Taro",
                            p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, green, thickness=2)

        cv2.imshow("frame", frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release resources
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
```
This time, I tried to identify Shinzo Abe and Taro Aso, using 100 training images for each.
command

```
python face_recog_CH.py -p <parameter file> -c <cascade> (-d <camera device ID>)
```
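For example, assuming the learning phase saved its parameters as face_recog.model (a placeholder name; use whatever file your run actually produced) and the cascade sits in the current directory, an invocation might look like:

```
python face_recog_CH.py -p face_recog.model -c haar_cascade.xml -d 0
```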
Since this is the prediction phase, the program identifies faces captured by a camera.

The CNN class defines exactly the same network model as in the learning phase (face_recog_train_CH.py). Write it out exactly as it is: it will not work if the structure differs even slightly, because different hyperparameters or layers change the number of parameters (such as weights), so the parameters optimized during training can no longer be applied (a sketch of what goes wrong follows the class definition below).
CNN class
```python
# ============ Same network configuration as face_recog_train_CH.py ============
class CNN(chainer.Chain):

    def __init__(self, n_out):
        super(CNN, self).__init__(
            conv1=L.Convolution2D(1, 16, 5, 1, 0),
            conv2=L.Convolution2D(16, 32, 5, 1, 0),
            conv3=L.Convolution2D(32, 64, 5, 1, 0),
            link=L.Linear(None, 1024),
            link_class=L.Linear(None, n_out),
        )

    def __call__(self, x):
        h1 = F.max_pooling_2d(F.relu(self.conv1(x)), ksize=2)
        h2 = F.max_pooling_2d(F.relu(self.conv2(h1)), ksize=2)
        h3 = F.relu(self.conv3(h2))
        h4 = F.relu(self.link(h3))
        return self.link_class(h4)
# ==============================================================================
```
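As a quick illustration of why the structure must match exactly, here is a minimal sketch (the file name face_recog.model is a placeholder for your trained parameter file): loading parameters into a network built with a different number of output classes typically fails with a KeyError or ValueError.

```python
# Minimal sketch: loading trained parameters into a mismatched network fails.
# "face_recog.model" is a placeholder for your actual trained parameter file.
wrong_model = L.Classifier(CNN(3))  # 3 output classes instead of 2
try:
    S.load_npz("face_recog.model", wrong_model)
except (KeyError, ValueError) as e:
    print("Parameters do not fit this network:", e)
```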
Next, the program creates the camera instance, loads the cascade, and imports the trained parameters.
```python
# Create the camera instance
cap = cv2.VideoCapture(args.device)
# Set the FPS value
cap.set(cv2.CAP_PROP_FPS, 60)

# Set up the face detector
detector = cv2.CascadeClassifier(args.cascade)
```
The parameters are applied to the network model with `chainer.serializers.load_npz()`. Note that in the learning phase the model was wrapped in `L.Classifier()` when the instance was created, so in the prediction phase the model must be wrapped in `L.Classifier()` as well (an alternative is sketched after the excerpt below).
```python
# Load the trained parameters
model = L.Classifier(CNN(2))
S.load_npz(args.param, model)
```
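As a side note, `chainer.serializers.load_npz()` also accepts a `path` argument, so if you ever want a bare CNN without the Classifier wrapper, you can load only the predictor sub-tree of the saved file; a sketch:

```python
# Sketch: load only the predictor's parameters into an unwrapped CNN.
cnn = CNN(2)
S.load_npz(args.param, cnn, path='predictor/')
```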
First, capture an image with the camera using `cap.read()`. A single call to `cap.read()` yields a single still image; by calling it repeatedly in a while (or for) loop and displaying the resulting images in succession, the output appears to move.

`cap.read()` returns two values: the first is a flag indicating whether the capture succeeded (`ret` in the code), and the second is the still image that was actually captured (`frame` in the code). Hereafter, each still image is referred to as a frame.
```python
while True:
    # Get a frame
    ret, frame = cap.read()

    # Camera read failure -> exception
    if not ret:
        print("\nException: Camera read failure.\n")
        sys.exit()

    # Face detection
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray)

    # Face not detected -> continue
    if len(faces) == 0:
        cv2.putText(frame, "face is not found",
                    p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, red, thickness=2)
        cv2.imshow("frame", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
        continue
```
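One caveat: `cv2.VideoCapture()` does not raise an error for an invalid device ID; it just returns a capture object whose reads fail. A small defensive sketch you could add before entering the loop:

```python
# Sketch: verify the camera actually opened before entering the loop.
if not cap.isOpened():
    print("\nException: Camera device {} could not be opened.\n".format(args.device))
    sys.exit()
```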
After acquiring a frame, it is converted to grayscale and face detection is performed with the Haar-like feature cascade. `detector.detectMultiScale()` returns the position information of each detection (coordinates, width, and height) when faces are found; when none are found, it returns an empty result. In that case, "face is not found" is drawn on the window and the loop continues.
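Incidentally, `detectMultiScale()` accepts tuning parameters; for example, a `minSize` of (50, 50) would pre-filter the tiny detections that the code below handles with its "too small" check. The values here are illustrative, not from the original program:

```python
# Sketch: tuned detection; minSize pre-filters faces smaller than 50x50 px.
faces = detector.detectMultiScale(gray, scaleFactor=1.1,
                                  minNeighbors=3, minSize=(50, 50))
```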
Now for the processing when a face is detected. The image processing uses the x and y coordinates, width, and height of each returned detection.
```python
# When a face is detected
for (x, y, w, h) in faces:

    # Draw the face region
    cv2.rectangle(frame, (x, y), (x + w, y + h), red, thickness=2)

    # Skip if the face is too small
    if h < 50 and w < 50:
        cv2.putText(frame, "detected face is too small",
                    p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, red, thickness=2)
        cv2.imshow("frame", frame)
        break

    # Show the detected face
    cv2.imshow("gray", cv2.resize(gray[y:y + h, x:x + w], (250, 250)))

    # Image processing
    face = gray[y:y + h, x:x + w]
    face = Image.fromarray(face)
    face = np.asarray(face.resize((32, 32)), dtype=np.float32)
    recog_img = face[np.newaxis, :, :]

    # Face recognition
    pred = model.predictor(chainer.Variable(np.array([recog_img])))
    c = F.softmax(pred).data.argmax()

    if c == 0:
        cv2.putText(frame, "Abe Sinzo",
                    p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, green, thickness=2)
    elif c == 1:
        cv2.putText(frame, "Aso Taro",
                    p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, green, thickness=2)

cv2.imshow("frame", frame)

if cv2.waitKey(1) & 0xFF == ord('q'):
    break
```
Concretely, the image processing does the following:

1. Cut out the face area from the frame
2. Convert the array to an image once, so that the face area can be resized
3. Resize it to 32px × 32px
4. Add a dimension to the array (adding the channel axis: [channels, height, width])
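The same four steps again, annotated with the array shape after each one (shapes assume a detected region of h × w pixels; the final batch axis is added at the predictor call):

```python
face = gray[y:y + h, x:x + w]                               # (h, w), uint8
face = Image.fromarray(face)                                # PIL image, so it can be resized
face = np.asarray(face.resize((32, 32)), dtype=np.float32)  # (32, 32), float32
recog_img = face[np.newaxis, :, :]                          # (1, 32, 32) = [channels, height, width]
batch = np.array([recog_img])                               # (1, 1, 32, 32) = [batch, channels, height, width]
```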
Once the data has been processed into a form that can be identified, it is finally time to recognize the face.

The prediction starts with `pred = model.predictor(chainer.Variable(np.array([recog_img])))`. `chainer.Variable()` wraps the data so that it can take part in the chain rule (Chainer's automatic differentiation).

Next, `c = F.softmax(pred).data.argmax()` passes the prediction result through the softmax function and then takes the argmax, which returns the index of the largest element.

Finally, the if statement outputs the class name corresponding to that index.
This time there are only two classes, Shinzo Abe and Taro Aso, but you could also train on faces that belong to neither; then, when such a face is input, the program can output something like "neither".
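An alternative to training a third class, sketched below, is to threshold the softmax confidence and report "neither" when the model is unsure. The 0.8 threshold is an arbitrary assumption; tune it on your own data:

```python
# Sketch: reject low-confidence predictions instead of adding a third class.
probs = F.softmax(pred).data[0]  # class probabilities for this face
c = int(probs.argmax())
labels = ["Abe Sinzo", "Aso Taro"]
name = labels[c] if probs[c] >= 0.8 else "neither"
cv2.putText(frame, name, p, cv2.FONT_HERSHEY_SIMPLEX, 1.0, green, thickness=2)
```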
These programs were originally developed for a high-school research project (graduation research), so the code is rough. As long as you have training data, you can easily change the number of classes, so feel free to adapt it however you like.