[PYTHON] I tried shooting Kamehameha using OpenPose

Introduction

I wanted to shoot Kamehameha, so I made it.

Movie_DOK.gif

You can also hide the estimated line.

Movie_NOK.gif

We use OpenPose to get the coordinates of the joint, calculate the angle from it, and judge whether the pose is like that.

About OpenPose

Setup method

I used tf pose estimation, which is the Tensorflow version of OpenPose. My environment was Windows10, GTX1070, etc., so I referred to the following page for setup. ** tf pose estimation on Windows **

I got stuck, so a note ・ It is safer to avoid long and deep paths. (It seems that swig may not work properly.)

Execution method

You can use your own webcam to operate it with the following command.

python run_webcam.py 

Exit with Esc.

Get joint coordinates

The following numbers are assigned to each joint.

For example, if you want to get the right shoulder (2): Since it is possible to acquire multiple people, we are turning humans with for.

humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)
height,width = image.shape[0],image.shape[1]
for human in humans:
    body_part = human.body_parts[2]
    x = int(body_part.x * width + 0.5)
    y = int(body_part.y * height + 0.5)
    debug_info = 'x:' + str(x) + ' y:' + str(y)

Actual code

The actual code. Save it as kamehameha.py and save it in the same folder as run_webcam.py.

import argparse
import logging
import time
from pprint import pprint
import cv2
import numpy as np
import sys
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh
import math

logger = logging.getLogger('TfPoseEstimator-WebCam')
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

fps_time = 0

def find_point(pose, p):
    for point in pose:
        try:
            body_part = point.body_parts[p]
            return (int(body_part.x * width + 0.5), int(body_part.y * height + 0.5))
        except:
            return (0,0)
    return (0,0)
def euclidian( point1, point2):
    return math.sqrt((point1[0]-point2[0])**2 + (point1[1]-point2[1])**2 )
def angle_calc(p0, p1, p2 ):
    '''
        p1 is center point from where we measured angle between p0 and
    '''
    try:
        a = (p1[0]-p0[0])**2 + (p1[1]-p0[1])**2
        b = (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
        c = (p2[0]-p0[0])**2 + (p2[1]-p0[1])**2
        angle = math.acos( (a+b-c) / math.sqrt(4*a*b) ) * 180/math.pi
    except:
        return 0
    return int(angle)

def accumulate_pose( a, b, c, d):
    '''
        a and b are angle between neck, left shoulder and left wrist
        c and d are angle between neck, right shoulder and right wrist
    '''
    if a in range(70,105) and b in range(100,140) and c in range(70,110) and d in range(120,180):
        return True
    return False

def release_pose( a, b, c, d):
    '''
        a and b are angle between neck, left shoulder and left wrist
        c and d are angle between neck, right shoulder and right wrist
    '''
    if a in range(140,200) and b in range(80,150) and c in range(0,40) and d in range(160,200):
        return True
    return False

def draw_str(dst, xxx_todo_changeme, s, color, scale):
    
    (x, y) = xxx_todo_changeme
    if (color[0]+color[1]+color[2]==255*3):
        cv2.putText(dst, s, (x+1, y+1), cv2.FONT_HERSHEY_PLAIN, scale, (0, 0, 0), thickness = 4, lineType=10)
    else:
        cv2.putText(dst, s, (x+1, y+1), cv2.FONT_HERSHEY_PLAIN, scale, color, thickness = 4, lineType=10)
    #cv2.line    
    cv2.putText(dst, s, (x, y), cv2.FONT_HERSHEY_PLAIN, scale, (255, 255, 255), lineType=11)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='tf-pose-estimation realtime webcam')
    parser.add_argument('--camera', type=str, default=0)
    parser.add_argument('--resize', type=str, default='432x368',
                        help='if provided, resize images before they are processed. default=432x368, Recommends : 432x368 or 656x368 or 1312x736 ')
    parser.add_argument('--resize-out-ratio', type=float, default=4.0,
                        help='if provided, resize heatmaps before they are post-processed. default=1.0')

    parser.add_argument('--model', type=str, default='cmu', help='cmu / mobilenet_thin')
    parser.add_argument('--show-process', type=bool, default=False,
                        help='for debug purpose, if enabled, speed for inference is dropped.')
    args = parser.parse_args()
    
    print("mode 0: Normal Mode \nmode 1: Debug Mode")
    mode = int(input("Enter a mode : "))
    
    logger.debug('initialization %s : %s' % (args.model, get_graph_path(args.model)))
    w, h = model_wh(args.resize)
    if w > 0 and h > 0:
        e = TfPoseEstimator(get_graph_path(args.model), target_size=(w, h))
    else:
        e = TfPoseEstimator(get_graph_path(args.model), target_size=(432, 368))

    logger.debug('cam read+')
    cam = cv2.VideoCapture(args.camera)
    ret_val, image = cam.read()
    logger.info('cam image=%dx%d' % (image.shape[1], image.shape[0]))
    count = 0
    i = 0
    frm = 0
    y1 = [0,0]
    global height,width
    orange_color = (0,140,255)
    while True:
        ret_val, image = cam.read()
        i =1
        humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)
        pose = humans
        if mode == 1:
            image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
        height,width = image.shape[0],image.shape[1]

        debug_info = ''

        if len(pose) > 0:

            # angle calcucations
            angle_l1 = angle_calc(find_point(pose, 6), find_point(pose, 5), find_point(pose, 1))
            angle_l2 = angle_calc(find_point(pose, 7), find_point(pose, 6), find_point(pose, 5))
            angle_r1 = angle_calc(find_point(pose, 3), find_point(pose, 2), find_point(pose, 1))
            angle_r2 = angle_calc(find_point(pose, 4), find_point(pose, 3), find_point(pose, 2))

            debug_info = str(angle_l1) + ',' + str(angle_l2) + ' : ' + str(angle_r1) + ',' + str(angle_r2)

            if accumulate_pose(angle_l1, angle_l2, angle_r1, angle_r2):
                logger.debug("*** accumulate Pose ***")

                # (1) create a copy of the original:
                overlay = image.copy()
                # (2) draw shapes:
                cv2.circle(overlay, (find_point(pose, 4)[0] - 40, find_point(pose, 4)[1] + 40), 40, (255, 241, 0), -1)
                # (3) blend with the original:
                opacity = 0.4
                cv2.addWeighted(overlay, opacity, image, 1 - opacity, 0, image)

            elif release_pose(angle_l1, angle_l2, angle_r1, angle_r2):
                logger.debug("*** release Pose ***")

                # (1) create a copy of the original:
                overlay = image.copy()
                # (2) draw shapes:
                cv2.circle(overlay, (find_point(pose, 7)[0] + 80 ,find_point(pose, 7)[1] - 40), 80, (255, 241, 0), -1)
                cv2.rectangle(overlay,
                              (find_point(pose, 7)[0] + 80,  find_point(pose, 7)[1] - 70),
                              (image.shape[1], find_point(pose, 7)[1] - 10),
                              (255, 241, 0), -1)
                # (3) blend with the original:
                opacity = 0.4
                cv2.addWeighted(overlay, opacity, image, 1 - opacity, 0, image)

        image= cv2.flip(image, 1)

        if mode == 1:
            draw_str(image, (20, 50), debug_info, orange_color, 2)
            cv2.putText(image,
                        "FPS: %f" % (1.0 / (time.time() - fps_time)),
                        (10, 10),  cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                        (0, 255, 0), 2)

        #image =   cv2.resize(image, (720,720))

        if(frm==0):
            out = cv2.VideoWriter('outpy.avi',cv2.VideoWriter_fourcc('M','J','P','G'), 30, (image.shape[1],image.shape[0]))
            print("Initializing")
            frm+=1
        cv2.imshow('tf-pose-estimation result', image)
        if i != 0:
            out.write(image)
        fps_time = time.time()
        if cv2.waitKey(1) == 27:
            break

    cv2.destroyAllWindows()


How to execute / end

Execute it with the following command.

python kamehameha.py 

A menu will appear. If you specify "Debug Mode", estimated line and angle information is displayed.

After the initial processing, Kamehameha will be able to shoot. Exit with Esc.

Notes

――Posture detection is a rough one. It may not be detected correctly depending on conditions such as the position of the camera and how you are facing the camera. --You can only release from the right to the left of the screen. ――It is a part of the specifications of OpenPose, but it seems that detection is better if there is a contrast between the background and the person. (Check using Debug Mode together.)

Posture detection conditions

At the top left of the "Debug Mode" screen, the angles of the right shoulder, right elbow, left shoulder, and left elbow are lined up.

If each angle meets the conditions, it is judged that you are in that posture.

"Kamehameha" "Waves"
Right shoulder 70 ~ 105 140 ~ 200
Right elbow 100 ~ 140 80 ~ 150
Left shoulder 70 ~ 110 0 ~ 40
Left elbow 120 ~ 180 160 ~ 200

If it doesn't respond well, please play with the judgment values of the accumulate_pose function and release_pose function in your code.

At the end

I wanted to do something with OpenPose, so I'm glad I did it this time. It seems that you can create a personal trainer role such as whether you are doing muscle training in the correct posture, or a gimmick that makes children happy.

Reference site

** tf pose estimation on Windows **

Recommended Posts

I tried shooting Kamehameha using OpenPose
[Python] I tried using OpenPose
I tried using parameterized
I tried using mimesis
I tried using anytree
I tried using aiomysql
I tried using Summpy
I tried using coturn
I tried using Pipenv
I tried using matplotlib
I tried using "Anvil".
I tried using Hubot
I tried using ESPCN
I tried using openpyxl
I tried using Ipython
I tried using PyCaret
I tried using cron
I tried using ngrok
I tried using face_recognition
I tried using Jupyter
I tried using Heapq
I tried using doctest
I tried using folium
I tried using jinja2
I tried using folium
I tried using time-window
I tried using easydict (memo).
I tried face recognition using Face ++
I tried using Random Forest
I tried using BigQuery ML
I tried using Amazon Glacier
I tried using git inspector
I tried using magenta / TensorFlow
I tried using AWS Chalice
I tried using Slack emojinator
I tried using Rotrics Dex Arm # 2
I tried using Rotrics Dex Arm
I tried using GrabCut of OpenCV
I tried using Thonny (Python / IDE)
I tried reinforcement learning using PyBrain
I tried deep learning using Theano
Somehow I tried using jupyter notebook
[Kaggle] I tried undersampling using imbalanced-learn
I tried using the checkio API
[Python] I tried using YOLO v3
I tried asynchronous processing using asyncio
I tried using Amazon SQS with django-celery
I tried using Azure Speech to Text.
I tried using Twitter api and Line api
I tried playing a ○ ✕ game using TensorFlow
I tried using YOUTUBE Data API V3
I tried using Selenium with Headless chrome
I tried drawing a line using turtle
[Kaggle] I tried ensemble learning using LightGBM
I tried using PyEZ and JSNAPy. Part 2: I tried using PyEZ
I tried using Bayesian Optimization in Python
I tried to classify text using TensorFlow
I tried using Selective search as R-CNN
I tried using UnityCloudBuild API from Python
I tried using Headless Chrome from Selenium
I tried using pipenv, so a memo