Head orientation estimation using Python and OpenCV + dlib

What is head orientation estimation?

Head orientation estimation is called Head Pose Estimation in English. It is an algorithm that estimates the direction the face is pointing and the tilt of the head from an input image and facial feature data. Recently it has been widely used in VTuber development.

Head orientation estimation method

Several methods for estimating head orientation have already been introduced on Qiita. They are summarized very well in the Qiita article "Investigating face orientation estimation".

For head pose estimation using Python and OpenCV + dlib, the article most often referred to is probably "Head Pose Estimation using OpenCV and Dlib".

The face orientation algorithms are described in great detail in the "How do pose estimation algorithms work?" section of that page.

An example of a program

For the time being, I will write out the program introduced in that article. You can download the dat file for facial landmark detection from here: [dlib.net] 68-point trained data for face recognition [DL]

Module loading

HeadPoseEstimation.py


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np

This imports OpenCV for image processing, dlib for face detection and landmark prediction, and imutils as a helper for resizing frames and converting the landmarks to NumPy arrays.

Camera and face detector settings

HeadPoseEstimation.py


DEVICE_ID = 0 #Camera ID to use. 0 is the standard webcam
capture = cv2.VideoCapture(DEVICE_ID)
#Path to the dlib trained data. Copy and paste the path of the trained dat file
predictor_path = ".../shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

See the dlib documentation for details on the dlib functions.
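dlib.shape_predictor raises an error when the path does not point to the trained data file, so it can help to check the path before loading. The following is a minimal sketch using only the standard library. The function name require_file is hypothetical and is not part of dlib or the original article.

```python
import os


def require_file(path):
    """Return the absolute path if it is an existing file, else raise FileNotFoundError.

    Hypothetical helper (not part of dlib): it only checks the file system.
    """
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        raise FileNotFoundError("Trained data file not found: " + full_path)
    return full_path
```

With this helper, the predictor could be created as dlib.shape_predictor(require_file(predictor_path)), so a wrong path is reported with the full location that was tried.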

Contents of head direction estimation

The program grabs one frame at a time from the camera and processes it.

HeadPoseEstimation.py


while(True): #Get images continuously from the camera
    ret, frame = capture.read() #Capture from the camera and put one frame of image data in frame
    if not ret: #Stop if no frame could be read from the camera
        break
    
    frame = imutils.resize(frame, width=1000) #Adjust the display size of the frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect face from gray
    image_points = None
     
    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        
        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)

        image_points = np.array([
                tuple(shape[30]),#Nose tip
                tuple(shape[21]),
                tuple(shape[22]),
                tuple(shape[39]),
                tuple(shape[42]),
                tuple(shape[31]),
                tuple(shape[35]),
                tuple(shape[48]),
                tuple(shape[54]),
                tuple(shape[57]),
                tuple(shape[8]),
                ],dtype='double')
    
    if len(rects) > 0:
        model_points = np.array([
                (0.0,0.0,0.0), # 30
                (-30.0,-125.0,-30.0), # 21
                (30.0,-125.0,-30.0), # 22
                (-60.0,-70.0,-60.0), # 39
                (60.0,-70.0,-60.0), # 42
                (-40.0,40.0,-50.0), # 31
                (40.0,40.0,-50.0), # 35
                (-70.0,130.0,-100.0), # 48
                (70.0,130.0,-100.0), # 54
                (0.0,158.0,-10.0), # 57
                (0.0,250.0,-50.0) # 8
                ])

        size = frame.shape

        focal_length = size[1]
        center = (size[1] // 2, size[0] // 2) #Image center coordinates (used as the principal point)

        camera_matrix = np.array([
            [focal_length, 0, center[0]],
            [0, focal_length, center[1]],
            [0, 0, 1]
        ], dtype='double')

        dist_coeffs = np.zeros((4, 1))

        (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                                      dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        #Convert the rotation vector to a rotation matrix (the Jacobian is not used)
        (rotation_matrix, jacobian) = cv2.Rodrigues(rotation_vector)
        mat = np.hstack((rotation_matrix, translation_vector))

        #Extract yaw, pitch, and roll (Euler angles in degrees, returned as a 3x1 array)
        (_, _, _, _, _, _, eulerAngles) = cv2.decomposeProjectionMatrix(mat)
        yaw = eulerAngles[1][0]
        pitch = eulerAngles[0][0]
        roll = eulerAngles[2][0]
        
        print("yaw",int(yaw),"pitch",int(pitch),"roll",int(roll)) #Print the head pose data

        cv2.putText(frame, 'yaw : ' + str(int(yaw)), (20, 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'pitch : ' + str(int(pitch)), (20, 25), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'roll : ' + str(int(roll)), (20, 40), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)

        (nose_end_point2D, _) = cv2.projectPoints(np.array([(0.0, 0.0, 500.0)]), rotation_vector,
                                                         translation_vector, camera_matrix, dist_coeffs)
        #Plot of points used in the calculation/Display of face direction vector
        for p in image_points:
            cv2.drawMarker(frame, (int(p[0]), int(p[1])),  (0.0, 1.409845, 255),markerType=cv2.MARKER_CROSS, thickness=1)

        p1 = (int(image_points[0][0]), int(image_points[0][1]))
        p2 = (int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]))

        cv2.arrowedLine(frame, p1, p2, (255, 0, 0), 2)
    
    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break


capture.release() #Exit video capture
cv2.destroyAllWindows() #close window

If it works properly, the result looks like this. (Screenshot: 2019-11-24 23.32.58.png)
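The camera matrix in the program approximates the focal length by the image width and the principal point by the image center. As a rough illustration of what that matrix means, the sketch below projects a 3D point given in camera coordinates onto the image with a pinhole model and no lens distortion. The function name project_to_pixel and the 1000 x 750 frame size are assumptions for illustration only; the real program uses cv2.projectPoints, which also applies the rotation and translation.

```python
def project_to_pixel(point_3d, focal_length, center):
    """Project a 3D point given in camera coordinates onto the image plane.

    Pinhole model without lens distortion:
        u = f * X / Z + cx,  v = f * Y / Z + cy
    """
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("The point must be in front of the camera (Z > 0)")
    u = focal_length * x / z + center[0]
    v = focal_length * y / z + center[1]
    return (u, v)


# Same approximation as in the program, for an assumed 1000 x 750 frame
width, height = 1000, 750
focal_length = width
center = (width // 2, height // 2)

on_axis = project_to_pixel((0.0, 0.0, 1000.0), focal_length, center)
off_axis = project_to_pixel((100.0, 0.0, 1000.0), focal_length, center)
print(on_axis)   # (500.0, 375.0)
print(off_axis)  # (600.0, 375.0)
```

A point on the optical axis lands on the image center, and moving the point sideways shifts the pixel in proportion to the focal length divided by the depth.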

Parameter explanations and notes

yaw, roll, pitch

The head pose parameters yaw, roll, and pitch are defined as shown here, the same as for an airplane. (Figure: IMG_0240.jpg)
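The program reads pitch, yaw, and roll from the first, second, and third Euler angles. Assuming these correspond to rotations about the x, y, and z axes respectively, the following sketch shows how each rotation moves a unit vector pointing along the z axis (standing in for the nose direction). The function name rotate and the sign conventions are assumptions for illustration and may differ from OpenCV's.

```python
import math


def rotate(vector, axis, degrees):
    """Rotate a 3D vector about the x, y, or z axis by the given angle."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    x, y, z = vector
    if axis == "x":  # pitch in this sketch
        return (x, c * y - s * z, s * y + c * z)
    if axis == "y":  # yaw in this sketch
        return (c * x + s * z, y, -s * x + c * z)
    if axis == "z":  # roll in this sketch
        return (c * x - s * y, s * x + c * y, z)
    raise ValueError("axis must be 'x', 'y', or 'z'")


nose = (0.0, 0.0, 1.0)  # unit vector pointing along the z axis

yawed = rotate(nose, "y", 90)    # swings sideways, toward the x axis
pitched = rotate(nose, "x", 90)  # swings vertically, along the y axis
rolled = rotate(nose, "z", 90)   # unchanged: roll spins around the nose direction
```

Yaw and pitch change where the nose points, while roll only tilts the head around that direction, which is why the drawn arrow barely moves when the head is tilted sideways.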

Facial features to use

See the figure from "Facial landmarks with dlib, OpenCV, and Python" for the positions of the image_points defined here. (Figure: facial_landmarks_68markup-1024x825.jpg) The points used this time, numbered from 1 as in the figure (the code uses 0-based indices, so each index there is one less), are:
・ Inner ends of the eyebrows (22, 23)
・ Inner corners of the eyes (40, 43)
・ Nose tip (31)
・ Both sides of the nose (32, 36)
・ Both corners of the mouth (49, 55)
・ Below the lips (58)
・ Chin (9)
That makes 11 points in total. The algorithm can estimate the head direction with as few as 5 points, but when I tried it with only a few points, the direction of the vector at the tip of the nose flipped around, so I increased the number of points. (Is it because the trained data is based on Westerners ...)

Using points toward the outside of the face improves accuracy, but when you turn to the side, points such as the eyebrows go out of view and the landmarks are misdetected, so try to use points near the center of the face as much as possible. (Figure: IMG_18D234CF6CC9-1.jpeg)
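The landmark numbers in the figure start at 1, while the shape array in the program is indexed from 0. A small sketch of the conversion is shown below. The dictionary name FIGURE_NUMBERS and the function to_code_indices are made up for this illustration.

```python
# 1-based landmark numbers as listed in the text (figure numbering)
FIGURE_NUMBERS = {
    "nose tip": [31],
    "inner eyebrows": [22, 23],
    "inner eye corners": [40, 43],
    "sides of the nose": [32, 36],
    "mouth corners": [49, 55],
    "below the lips": [58],
    "chin": [9],
}


def to_code_indices(figure_numbers):
    """Convert 1-based figure numbers to 0-based indices into the shape array."""
    return [n - 1 for n in figure_numbers]


code_indices = {name: to_code_indices(nums) for name, nums in FIGURE_NUMBERS.items()}
all_indices = [i for nums in code_indices.values() for i in nums]
print(all_indices)
# [30, 21, 22, 39, 42, 31, 35, 48, 54, 57, 8]
```

The resulting order matches the order of image_points and model_points in the program, which matters because solvePnP pairs the two arrays row by row.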

The remaining problem is model_points, that is, what the position coordinates of the parts of your own face should be. I defined them by brute force using the following program. It displays the (x, y) coordinates of each facial landmark in the image, with the tip of the nose as the origin, so face the camera as squarely as possible, straighten your posture, and read the values off with determination. The z coordinates are guesses: compare the distance from the tip of your nose to the point between your eyes and use it to judge the height of your nose. Hang in there!

HPEcal.py


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np


#Get a VideoCapture object
DEVICE_ID = 0 #ID 0 is the standard webcam
capture = cv2.VideoCapture(DEVICE_ID)
#Path to the dlib trained data
predictor_path = "/shape_predictor_68_face_landmarks.dat"

print("[INFO] loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

while(True): #Get images continuously from the camera
    ret, frame = capture.read() #Capture from the camera and put one frame of image data in frame
    if not ret: #Stop if no frame could be read from the camera
        break
    
    frame = imutils.resize(frame, width=2000) #Adjust the display size of the frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect face from gray
    image_points = None
     
    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        #print(shape[30])#Nose coordinates
        cal = shape-shape[30]
        print("######[X,Y]#######",
              "\n point18=",cal[17],
              "\n point22=",cal[21],
              "\n point37=",cal[36],
              "\n point40=",cal[39],
              "\n point28=",cal[27],
              "\n point31=",cal[30],
              "\n point32=",cal[31],
              "\n point49=",cal[48],
              "\n point58=",cal[57],
              "\n point9=",cal[8])
        
        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)
            cv2.putText(frame,str((x, y)-shape[30]),(x,y), cv2.FONT_HERSHEY_PLAIN, 1.0, (0, 0, 255), 2)

    
    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break

capture.release() #Exit video capture
cv2.destroyAllWindows() #close window
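The (x, y) offsets printed by HPEcal.py still have to be combined with guessed z values to form model_points. One way to do that is sketched below. The function name build_model_points is hypothetical, and the sample numbers simply reuse the first three rows of model_points from the program rather than new measurements.

```python
def build_model_points(offsets_xy, guessed_z):
    """Combine measured (x, y) offsets with guessed z values into 3D model points."""
    if len(offsets_xy) != len(guessed_z):
        raise ValueError("Need exactly one z value per (x, y) offset")
    return [(float(x), float(y), float(z)) for (x, y), z in zip(offsets_xy, guessed_z)]


# (x, y) offsets from the nose tip, reusing values from model_points in the program
offsets_xy = [(0, 0), (-30, -125), (30, -125)]
# z values are guesses, as described in the article
guessed_z = [0, -30, -30]

model_points = build_model_points(offsets_xy, guessed_z)
print(model_points)
# [(0.0, 0.0, 0.0), (-30.0, -125.0, -30.0), (30.0, -125.0, -30.0)]
```

The list can be passed to np.array before calling solvePnP, as long as its rows stay in the same order as image_points.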

Finally

Finally, here is the whole program to finish up. Thank you for your hard work.

program

HeadPoseEstimation.py


import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np


#Get a VideoCapture object
DEVICE_ID = 0 #ID 0 is the standard webcam
capture = cv2.VideoCapture(DEVICE_ID)
#Path to the dlib trained data
predictor_path = ".../shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

while(True): #Get images continuously from the camera
    ret, frame = capture.read() #Capture from the camera and put one frame of image data in frame
    if not ret: #Stop if no frame could be read from the camera
        break
    
    frame = imutils.resize(frame, width=1000) #Adjust the display size of the frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect face from gray
    image_points = None
     
    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)

        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)

        image_points = np.array([
                tuple(shape[30]),#Nose tip
                tuple(shape[21]),
                tuple(shape[22]),
                tuple(shape[39]),
                tuple(shape[42]),
                tuple(shape[31]),
                tuple(shape[35]),
                tuple(shape[48]),
                tuple(shape[54]),
                tuple(shape[57]),
                tuple(shape[8]),
                ],dtype='double')
    
    if len(rects) > 0:
        model_points = np.array([
                (0.0,0.0,0.0), # 30
                (-30.0,-125.0,-30.0), # 21
                (30.0,-125.0,-30.0), # 22
                (-60.0,-70.0,-60.0), # 39
                (60.0,-70.0,-60.0), # 42
                (-40.0,40.0,-50.0), # 31
                (40.0,40.0,-50.0), # 35
                (-70.0,130.0,-100.0), # 48
                (70.0,130.0,-100.0), # 54
                (0.0,158.0,-10.0), # 57
                (0.0,250.0,-50.0) # 8
                ])

        size = frame.shape

        focal_length = size[1]
        center = (size[1] // 2, size[0] // 2) #Image center coordinates (used as the principal point)

        camera_matrix = np.array([
            [focal_length, 0, center[0]],
            [0, focal_length, center[1]],
            [0, 0, 1]
        ], dtype='double')

        dist_coeffs = np.zeros((4, 1))

        (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                                      dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        #Convert the rotation vector to a rotation matrix (the Jacobian is not used)
        (rotation_matrix, jacobian) = cv2.Rodrigues(rotation_vector)
        mat = np.hstack((rotation_matrix, translation_vector))

        #Extract yaw, pitch, and roll (Euler angles in degrees, returned as a 3x1 array)
        (_, _, _, _, _, _, eulerAngles) = cv2.decomposeProjectionMatrix(mat)
        yaw = eulerAngles[1][0]
        pitch = eulerAngles[0][0]
        roll = eulerAngles[2][0]
        
        print("yaw",int(yaw),"pitch",int(pitch),"roll",int(roll)) #Print the head pose data

        cv2.putText(frame, 'yaw : ' + str(int(yaw)), (20, 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'pitch : ' + str(int(pitch)), (20, 25), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'roll : ' + str(int(roll)), (20, 40), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)

        (nose_end_point2D, _) = cv2.projectPoints(np.array([(0.0, 0.0, 500.0)]), rotation_vector,
                                                         translation_vector, camera_matrix, dist_coeffs)
        #Plot of points used in the calculation/Display of face direction vector
        for p in image_points:
            cv2.drawMarker(frame, (int(p[0]), int(p[1])),  (0.0, 1.409845, 255),markerType=cv2.MARKER_CROSS, thickness=1)

        p1 = (int(image_points[0][0]), int(image_points[0][1]))
        p2 = (int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]))

        cv2.arrowedLine(frame, p1, p2, (255, 0, 0), 2)
    
    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break


capture.release() #Exit video capture
cv2.destroyAllWindows() #close window

2020/4/2 postscript

OpenCV fails to start with a Qt-related error!

I recently got this error:

qt.qpa.plugin: Could not find the Qt platform plugin "cocoa" in ""
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

It seems that this error appears after installing a newer OpenCV with pip. It works again when the version is lowered.

pip3 install opencv-python==4.1.2.30
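To confirm which version pip actually installed before and after the downgrade, the installed package metadata can be queried. This is a small sketch using the standard library's importlib.metadata (Python 3.8 or later). The function name installed_version is an assumption for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print("opencv-python:", installed_version("opencv-python"))
```

If the printed version is still the newer one after running the pip command, the downgrade was probably applied to a different Python environment.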

Reference summary

Qiita

Investigating face orientation estimation

External

Head Pose Estimation using OpenCV and Dlib
dlib documentation
Facial landmarks with dlib, OpenCV, and Python
