[PYTHON] Unsupervised smile detection using One Class SVM

Overview

As programming practice, I decided to build a smile detector. Deep learning gets most of the attention these days, but this implementation takes a fairly old-fashioned approach. It is a rough system built from my own ideas, so I would appreciate comments if you have any suggestions.

The flow of smile detection is as follows.

  1. Get an image from a webcam.
  2. Detect the face with the OpenCV cascade classifier and crop that region.
  3. Get the HOG features of the cropped image.
  4. Use One Class SVM to determine "smile (normal state)" or "not smile (abnormal state)".
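For step 3, the length of the HOG vector is fixed by the crop size and the HOG parameters. The sketch below reproduces that arithmetic in plain Python. It assumes the 60x60 crop used in main.py and what are understood to be scikit-image's default hog settings (9 orientations, 8x8 pixels per cell, 3x3 cells per block). hog_feature_length is a hypothetical helper written for this illustration, not a library function.

```python
def hog_feature_length(image_size, orientations=9, pixels_per_cell=8, cells_per_block=3):
    """Length of a HOG descriptor for a square image (hypothetical helper).

    The defaults are assumed to mirror scikit-image's hog() defaults.
    """
    # Number of whole cells that fit along one side of the image
    cells_per_side = image_size // pixels_per_cell
    # Blocks slide one cell at a time, so neighbouring blocks overlap
    blocks_per_side = cells_per_side - cells_per_block + 1
    if blocks_per_side < 1:
        raise ValueError("image is too small for a single block")
    # Each block contributes one histogram per cell it contains
    return blocks_per_side ** 2 * cells_per_block ** 2 * orientations


print(hog_feature_length(60))  # 2025
```

This is where the 2025 in the reshape calls of main.py comes from: 7 cells per side give 5 block positions per side, and 5 * 5 * 3 * 3 * 9 = 2025.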

To use the cascade classifier, you will need haarcascade_frontalface_alt.xml from here, so please clone it and place the file in ./cascades/.

About One Class SVM

One Class SVM is often used for problems where labeled training data is difficult to collect. Here I use it to treat smile detection as an anomaly detection problem: the smiling state is defined as "normal" and every other state as "abnormal".

An ordinary SVM (Support Vector Machine) finds its decision boundary by supervised learning. For smile detection, that would mean training on "neutral face" data in addition to "smile" data. One Class SVM, on the other hand, learns without labels from a single class and then judges whether a new sample is "normal" or "abnormal". Therefore, training data for "smile" alone is enough to distinguish "smile" from "everything else".
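The idea can be tried on toy data before touching any images. The following is a minimal sketch, assuming a synthetic 2-D cluster as a stand-in for "smile" features; the values of nu and gamma are arbitrary choices for the illustration, not the ones used in main.py.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Stand-in for "smile" features: one cluster of 2-D points (synthetic, illustration only)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Only "normal" samples are passed to fit(); no labels are given
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5)
clf.fit(normal_train)

queries = np.array([
    [0.0, 0.0],  # near the centre of the training cluster
    [8.0, 8.0],  # far from anything seen during training
])
labels = clf.predict(queries)  # +1 means "normal", -1 means "abnormal"
print(labels)
```

The point far from the training cluster is labeled -1, even though no "abnormal" example was ever shown to the model.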

Implementation

There are three phases.

  1. Collect training data ("smile (normal state)"). (data_collect())
  2. Train One Class SVM on the collected data. (train())
  3. Detect smiles using the trained One Class SVM. (main())
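The three phases are run one at a time. One way to select a phase without editing the script each time is a small command-line dispatcher. This is only a sketch under assumptions: the phase names "collect", "train", and "detect" and the helpers build_parser and dispatch are hypothetical and do not appear in main.py.

```python
import argparse


def build_parser():
    """Parser with one positional argument naming the phase (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Smile detection with One Class SVM")
    parser.add_argument("phase", choices=["collect", "train", "detect"])
    return parser


def dispatch(phase, handlers):
    """Call the function registered for the chosen phase and return its result."""
    return handlers[phase]()


# In main.py, the handlers would be the three functions from the list above:
#   handlers = {"collect": data_collect, "train": train, "detect": main}
#   dispatch(build_parser().parse_args().phase, handlers)
```

With this in place, the phases would be run as `python main.py collect`, then `python main.py train`, then `python main.py detect`.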

main.py


import numpy as np
import cv2
from skimage.feature import hog
from sklearn.svm import OneClassSVM
from sklearn.decomposition import PCA
import pickle

n = 3
n_dim = 4
alpha = - 1.0e+6
th = 20 #3
nu = 0.2  # 0.1  # Fraction of the training data treated as outliers
font = cv2.FONT_HERSHEY_COMPLEX
train_data = "./dataset/train/train.csv"
weights = "./dataset/weights/weights.sav"
weights_pca = "./dataset/weights/weights_pca.sav"

f_ = cv2.CascadeClassifier()  # "./cascades/haarcascade_fullbody.xml"
f_.load(cv2.samples.findFile("./cascades/haarcascade_frontalface_alt.xml"))

def preprocess(image):
    frame = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    frame = cv2.equalizeHist(frame)
    return frame

def data_collect():
    feature = []
    capture = cv2.VideoCapture(0)

    while (True):
        ret, frame = capture.read()
        frame = preprocess(frame)
        face = f_.detectMultiScale(frame)  # ,scaleFactor=1.2

        for rect in face:
            cv2.rectangle(frame, tuple(rect[0:2]), tuple(rect[0:2] + rect[2:4]), (255, 255, 0), thickness=2)
            face_frame = frame[rect[1]:rect[1] + rect[3], rect[0]:rect[0] + rect[2]]
            face_frame = cv2.resize(face_frame, (60, 60))
            # Only the feature vector is needed; recent scikit-image rejects the old "visualise" keyword
            hog_f_ = hog(face_frame, transform_sqrt=True)
            feature = np.append(feature,hog_f_)
            np.savetxt(train_data,feature.reshape(-1,2025), delimiter=",")
            cv2.putText(frame, "please smile for collecting data!", (10, 100), font,
                             1, (255, 255, 0), 1, cv2.LINE_AA)
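        cv2.imshow("face", frame)
        # Press "q" to stop collecting; without this the loop never ends
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    capture.release()
    cv2.destroyAllWindows()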

def train():
    x_train = np.loadtxt(train_data,delimiter=",")
    pca = PCA(n_components=n_dim)
    clf = OneClassSVM(nu=nu, gamma=40/n_dim)#1/n_dim
    z_train = pca.fit_transform(x_train)
    clf.fit(z_train)

    # Save the fitted PCA and SVM; "with" closes each file after writing
    with open(weights_pca, "wb") as f:
        pickle.dump(pca, f)
    with open(weights, "wb") as f:
        pickle.dump(clf, f)

def main():
    # Load the SVM and PCA saved by train()
    with open(weights, "rb") as f:
        clf = pickle.load(f)
    with open(weights_pca, "rb") as f:
        pca = pickle.load(f)
    capture = cv2.VideoCapture(0)

    while(True):
        ret,frame = capture.read()
        frame = preprocess(frame)
        face = f_.detectMultiScale(frame)

        for rect in face:
            cv2.rectangle(frame,tuple(rect[0:2]),tuple(rect[0:2]+rect[2:4]),(255,255,0),thickness=2)
            face_frame = frame[rect[1]:rect[1]+rect[3],rect[0]:rect[0]+rect[2]]
            face_frame = cv2.resize(face_frame,(60,60))
            # Only the feature vector is needed; recent scikit-image rejects the old "visualise" keyword
            feature = hog(face_frame, transform_sqrt=True)
            z_feature = pca.transform(feature.reshape(1,2025))
            score = clf.predict(z_feature.reshape(1,n_dim))
            if score[0]== 1:
                cv2.putText(frame, "smile!", (10, 100), font,
                             1, (255, 255, 0), 1, cv2.LINE_AA)
        cv2.imshow("face", frame)
        # Press "q" to quit
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    capture.release()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    data_collect()  # First, run only this function to collect smile data
    # train()  # Load the smile data (CSV) and train the SVM
    # main()  # Detect smiles with the trained model

(Please refer to here for the exact implementation.) The One Class SVM is the scikit-learn implementation. To run the code, create the following directory structure. I would appreciate it if you could point out any mistakes in the code.

OneClassSVM/
 ├ dataset/
 │  ├ train/
 │  └ weights/
 ├ cascades/
 └ main.py
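The directories above can also be created from Python instead of by hand. A minimal sketch using only the standard library follows; create_layout is a hypothetical helper, and the subdirectory names are taken from the paths used in main.py.

```python
from pathlib import Path


def create_layout(root):
    """Create dataset/train, dataset/weights and cascades under root (hypothetical helper)."""
    root = Path(root)
    for sub in ("dataset/train", "dataset/weights", "cascades"):
        # parents=True builds intermediate directories; exist_ok=True makes reruns harmless
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Example: create_layout("OneClassSVM"), then copy main.py and the cascade XML into place
```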

Implementation supplement

After the face is cropped with the cascade classifier, it is resized to 60x60. (The size was chosen arbitrarily.) Before training the SVM, PCA reduces the dimensionality of the HOG features. I did this because the HOG vector has more than 2000 dimensions and I was worried about the curse of dimensionality. The number of dimensions after reduction is set with n_dim; this time it is 4.

Tuning the hyperparameters of the One Class SVM also took some time. OneClassSVM takes nu and gamma as constructor arguments. As I understand it, nu is the fraction of the training data treated as abnormal, and gamma is conventionally 1 / (number of feature dimensions). However, the training data contains no abnormal samples in the first place, so after puzzling over what to do, I ended up searching for good values by alternating between train() and main(). In the end I settled on nu = 0.3 and gamma = 50 / (number of feature dimensions). You may need to adjust nu and gamma for your own use.
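One way to get a feel for nu before running the webcam loop is to check how many of the training samples the fitted model itself rejects. The sketch below assumes synthetic random vectors in place of real HOG features, and rejected_fraction is a hypothetical helper; it shows the effect of nu and is not a substitute for testing with the camera.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-in for HOG vectors of smiling faces (not real data)
x_train = rng.normal(size=(300, 50))

n_dim = 4
# Reduce to n_dim dimensions first, as train() does
z_train = PCA(n_components=n_dim).fit_transform(x_train)


def rejected_fraction(nu, gamma):
    """Fraction of the training samples that the fitted model labels as abnormal (-1)."""
    clf = OneClassSVM(nu=nu, gamma=gamma).fit(z_train)
    return float(np.mean(clf.predict(z_train) == -1))


for nu in (0.05, 0.2, 0.5):
    print(nu, rejected_fraction(nu, gamma=1.0 / n_dim))
```

A larger nu makes the model reject more of its own training data, which in the smile detector means that more genuine smiles would be labeled "not smile".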

Experiment

I experimented by running main(). Accuracy for a frontal face is reasonable (about 80% by feel), but because I included faint smiles in the training data, the boundary between a neutral face and a smile may have become difficult to learn.

References

Face detection by OpenCV Haar-Cascade
Abnormal value detection using One Class SVM