[PYTHON] I tried animating a 3D model with motion-capture-like tracking using just a laptop and a webcam


After reading the original article, I figured that with a reasonably lightweight pose estimation model, motion capture should be possible with just a laptop and a webcam (the built-in camera), so I tried it. The flow is almost the same as in the original article.

Requirements


Python side

1. Preparation of the pose estimation model

Clone the following repository, which estimates the pose of a person in a 2D image: https://github.com/ildoonet/tf-pose-estimation/tree/master

2. Restoration of 3D information

In order to do something like motion capture, 3D information is needed.

The model from step 1 can only produce two-dimensional information. To obtain 3D information, use the lifting code in the devel branch of the same repository. (It appears to have been in master originally, but it is no longer there.)

Clone the following branch: https://github.com/ildoonet/tf-pose-estimation/tree/devel Then move its src/lifting folder into your master checkout.
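Before lifting, the server code later in this article scales the 2D keypoints to a fixed 640x480 pixel grid. The following is a minimal sketch of that scaling step, assuming the keypoints arrive as (x, y) pairs in the 0 to 1 range. The function name `to_pixel_coords` is hypothetical and is not part of either repository.

```python
def to_pixel_coords(normalized_points, width=640, height=480):
    """Scale (x, y) pairs in the 0..1 range to integer pixel coordinates.

    Adding 0.5 before truncating rounds to the nearest pixel
    for non-negative inputs.
    """
    return [(int(x * width + 0.5), int(y * height + 0.5))
            for x, y in normalized_points]


if __name__ == "__main__":
    # The centre of the image maps to the centre of the 640x480 grid.
    print(to_pixel_coords([(0.5, 0.5), (0.25, 0.75)]))
```

The fixed grid size means the result does not depend on the actual camera resolution, which is presumably why the server uses constants here.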

3. Preparation of WebSocket server

In this system, the Python side handles processing such as pose estimation, and the Unity side only displays the 3D model. Python and Unity communicate over WebSocket.

Run the following command:

    pip install git+https://github.com/Pithikos/python-websocket-server
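The server code below replies only when a message arrives from the client, and it falls back to the previously sent pose (`old_data`) when estimation fails. A minimal sketch of that request/response pattern, without any WebSocket or camera dependency, might look like this. The class `PoseResponder` and its `estimate` argument are hypothetical stand-ins for the real pipeline.

```python
import json


class PoseResponder:
    """Reply to each request with the newest pose, or the last good one."""

    def __init__(self, estimate):
        # `estimate` is a hypothetical callable standing in for the
        # camera capture + pose estimation + 3D lifting pipeline.
        self._estimate = estimate
        self._last_good = {"body_parts": []}

    def handle_request(self):
        """Return the JSON string to send back for one client request."""
        try:
            self._last_good = self._estimate()
        except Exception:
            # Estimation failed (for example, nobody is in frame):
            # keep the previous pose so the client still gets a reply.
            pass
        return json.dumps(self._last_good)
```

Because the client drives the exchange, the server never queues up stale frames: each reply reflects the camera image captured at the moment of the request.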


import logging

import cv2
import json
import numpy as np
import common

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh
from websocket_server import WebsocketServer
from lifting.prob_model import Prob3dPose

PORT = 5000
HOST = ''

# Logger setup: print INFO-level messages to the console
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(' %(module)s -  %(asctime)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

def create_json(pose3d):
    global old_data

    data = {'body_parts': []}

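    # Joint indices (17 joints):
    # 0 :Hip
    # 1 :RHip
    # 2 :RKnee
    # 3 :RFoot
    # 4 :LHip
    # 5 :LKnee
    # 6 :LFoot
    # 7 :Spine
    # 8 :Thorax
    # 9 :Neck/Nose
    # 10:Head
    # 11:LShoulder
    # 12:LElbow
    # 13:LWrist
    # 14:RShoulder
    # 15:RElbow
    # 16:RWrist

    # The second and third axes are swapped before sending (Unity is y-up).
    # float() ensures NumPy scalars serialize cleanly with json.dumps.
    for i in range(17):
        data['body_parts'].append({'id': i, 'x': float(pose3d[0][0][i]), 'y': float(pose3d[0][2][i]), 'z': float(pose3d[0][1][i])})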

    old_data = data
    return data

def new_client(client, server):
    logger.info('New client {}:{} has connected.'.format(client['address'][0], client['address'][1]))

def client_left(client, server):
    logger.info('Client {}:{} has left.'.format(client['address'][0], client['address'][1]))

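def message_received(client, server, message):
    # Each incoming message is treated as a request for the latest pose.
    _, image = cam.read()

    humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=4.0)

    pose_2d_mpiis = []
    visibilities = []

    standard_w = 640
    standard_h = 480

    try:
        # Only the first detected person is used; humans[0] raises IndexError
        # when nobody is in frame, which is handled below.
        pose_2d_mpii, visibility = common.MPIIPart.from_coco(humans[0])
        pose_2d_mpiis.append([(int(x * standard_w + 0.5), int(y * standard_h + 0.5)) for x, y in pose_2d_mpii])
        # Assumed: the visibility flags must be collected as well, since
        # transform_joints() receives them alongside the 2D joints.
        visibilities.append(visibility)
        pose_2d_mpiis = np.array(pose_2d_mpiis)
        visibilities = np.array(visibilities)
        transformed_pose2d, weights = poseLifting.transform_joints(pose_2d_mpiis, visibilities)
        pose_3d = poseLifting.compute_3d(transformed_pose2d, weights)
        server.send_message(client, json.dumps(create_json(pose_3d)))

    except Exception:
        # Estimation failed: resend the last pose that was computed successfully.
        server.send_message(client, json.dumps(old_data))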

if __name__ == '__main__':
    # main
    w, h = model_wh("432x368")
    e = TfPoseEstimator(get_graph_path("mobilenet_thin"), target_size=(432, 368), trt_bool=False)
    poseLifting = Prob3dPose('lifting/models/prob_model_params.mat')

    cam = cv2.VideoCapture(0)

    old_data = {}

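    server = WebsocketServer(port=PORT, host=HOST)
    # Assumed: register the callbacks defined above and start serving.
    # run_forever() blocks until the process is interrupted.
    server.set_fn_new_client(new_client)
    server.set_fn_client_left(client_left)
    server.set_fn_message_received(message_received)
    server.run_forever()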

Now the Python side is ready
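For reference, the JSON sent to Unity contains one entry per joint. The sketch below rebuilds the same structure from a plain nested list so that the format can be inspected without a camera. It assumes `pose3d` has the layout [person][axis][joint] implied by the indexing in `create_json`; the function name `pose_to_payload` is hypothetical.

```python
import json

NUM_JOINTS = 17


def pose_to_payload(pose3d):
    """Convert pose3d[person][axis][joint] into the body_parts payload.

    As in create_json, the second and third axes are swapped:
    the payload's y comes from axis 2 and its z from axis 1.
    """
    person = pose3d[0]  # only the first person is used
    return {
        "body_parts": [
            {
                "id": i,
                "x": float(person[0][i]),
                "y": float(person[2][i]),
                "z": float(person[1][i]),
            }
            for i in range(NUM_JOINTS)
        ]
    }


if __name__ == "__main__":
    # Dummy pose: axis 0 holds 0..16, axis 1 holds 100..116, axis 2 holds 200..216.
    dummy = [[[float(a * 100 + j) for j in range(NUM_JOINTS)] for a in range(3)]]
    print(json.dumps(pose_to_payload(dummy)["body_parts"][1]))
```

The field names `body_parts`, `id`, `x`, `y`, and `z` must match the C# classes on the Unity side exactly, because `JsonUtility.FromJson` maps JSON keys to fields by name.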

Unity side

1. Preparation of 3D model

First, prepare the 3D model you want to animate. I used the "Unity-Chan!" model from the Asset Store.

2. Installation of required libraries

Clone SAFullBodyIK and move it into the Assets folder. Also clone and build https://github.com/sta/websocket-sharp so that Unity can receive WebSocket messages. The following article gives an easy-to-follow explanation of the build: https://qiita.com/oishihiroaki/items/bb2977c72052f5dd5bd9

3. Unity side code

I borrowed the code from the reference source of the original article.


using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using UnityEngine;
using WebSocketSharp;
using WebSocketSharp.Net;

public class IKSetting : MonoBehaviour {
    private BodyParts bodyParts;
    private string receivedJson;
    private WebSocket ws;

    [SerializeField, Range(10, 120)]
    float FrameRate;
    public List<Transform> BoneList = new List<Transform>();
    GameObject FullbodyIK;
    Vector3[] points = new Vector3[17];
    Vector3[] NormalizeBone = new Vector3[12];
    float[] BoneDistance = new float[12];
    float Timer;
    int[,] joints = new int[,] { { 0, 1 }, { 1, 2 }, { 2, 3 }, { 0, 4 }, { 4, 5 }, { 5, 6 }, { 0, 7 }, { 7, 8 }, { 8, 9 }, { 9, 10 }, { 8, 11 }, { 11, 12 }, { 12, 13 }, { 8, 14 }, { 14, 15 }, { 15, 16 } };
    int[,] BoneJoint = new int[,] { { 0, 2 }, { 2, 3 }, { 0, 5 }, { 5, 6 }, { 0, 9 }, { 9, 10 }, { 9, 11 }, { 11, 12 }, { 12, 13 }, { 9, 14 }, { 14, 15 }, { 15, 16 } };
    int[,] NormalizeJoint = new int[,] { { 0, 1 }, { 1, 2 }, { 0, 3 }, { 3, 4 }, { 0, 5 }, { 5, 6 }, { 5, 7 }, { 7, 8 }, { 8, 9 }, { 5, 10 }, { 10, 11 }, { 11, 12 } };
    int NowFrame = 0;

    float[] x = new float[17];
    float[] y = new float[17];
    float[] z = new float[17];

    bool isReceived = false;

    // Use this for initialization
    void Start () {

        ws = new WebSocket("ws://localhost:5000/");
        ws.OnOpen += (sender, e) => {
            Debug.Log("WebSocket Open");
        };
        ws.OnMessage += (sender, e) => {
            // Keep the latest JSON; it is parsed later in PointUpdate().
            receivedJson = e.Data;
            Debug.Log("Data: " + e.Data);
            isReceived = true;
        };
        ws.OnError += (sender, e) => {
            Debug.Log("WebSocket Error Message: " + e.Message);
        };
        ws.OnClose += (sender, e) => {
            Debug.Log("WebSocket Close");
        };

        // Assumed: open the connection to the Python server.
        ws.Connect();
    }


    // Update is called once per frame
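    void Update () {

        Timer += Time.deltaTime;

        if (Timer > (1 / FrameRate)) {
            Timer = 0;
            // Assumed: request the next pose (the Python server replies only
            // when it receives a message), then apply the latest one.
            if (ws.ReadyState == WebSocketState.Open) ws.Send("");
            PointUpdate();
        }
        if (!FullbodyIK) {
            IKFind();   // Assumed: look up the IK targets until they are found
        } else {
            IKSet();    // Assumed: then drive the model every frame
        }
    }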

    void OnDestroy() {
        // Assumed: close the connection before dropping the reference.
        if (ws != null) ws.Close();
        ws = null;
    }

    void PointUpdate() {
        if (NowFrame < 600) {
            if (isReceived) {
                // Parse the latest message and copy the joint coordinates.
                bodyParts = JsonUtility.FromJson<BodyParts>(receivedJson);
                for (int i = 0; i < 17; i++) {
                    x[i] = bodyParts.body_parts[i].x;
                    y[i] = bodyParts.body_parts[i].y;
                    z[i] = bodyParts.body_parts[i].z;
                }

                isReceived = false;
            }

            // Flip z when building the joint positions.
            for (int i = 0; i < 17; i++) {
                points[i] = new Vector3(x[i], y[i], -z[i]);
            }

            // Keep only the direction of each estimated bone.
            for (int i = 0; i < 12; i++) {
                NormalizeBone[i] = (points[BoneJoint[i, 1]] - points[BoneJoint[i, 0]]).normalized;
            }
        }
    }

    void IKFind() {
        FullbodyIK = GameObject.Find("FullBodyIK");
        if (FullbodyIK) {
            for (int i = 0; i < Enum.GetNames(typeof(OpenPoseRef)).Length; i++) {
                Transform obj = GameObject.Find(Enum.GetName(typeof(OpenPoseRef), i)).transform;
                if (obj) {
                    // Assumed: collect each IK target so it can be indexed via NormalizeJoint.
                    BoneList.Add(obj);
                }
            }
            // Record the model's own bone lengths.
            for (int i = 0; i < Enum.GetNames(typeof(NormalizeBoneRef)).Length; i++) {
                BoneDistance[i] = Vector3.Distance(BoneList[NormalizeJoint[i, 0]].position, BoneList[NormalizeJoint[i, 1]].position);
            }
        }
    }

    void IKSet() {
        // Move the hip only when the estimate is within a plausible range.
        if (Math.Abs(points[0].x) < 1000 && Math.Abs(points[0].y) < 1000 && Math.Abs(points[0].z) < 1000) {
            BoneList[0].position = points[0] * 0.001f + Vector3.up * 0.8f;
        }
        // Place each child at parent + bone length * estimated direction, smoothed with Lerp.
        for (int i = 0; i < 12; i++) {
            BoneList[NormalizeJoint[i, 1]].position = Vector3.Lerp(
                BoneList[NormalizeJoint[i, 1]].position,
                BoneList[NormalizeJoint[i, 0]].position + BoneDistance[i] * NormalizeBone[i],
                0.05f
            );
            DrawLine(BoneList[NormalizeJoint[i, 0]].position, BoneList[NormalizeJoint[i, 1]].position, Color.red);
        }
        // Draw the raw estimated skeleton next to the model for comparison.
        for (int i = 0; i < joints.Length / 2; i++) {
            DrawLine(points[joints[i, 0]] * 0.001f + new Vector3(-1, 0.8f, 0), points[joints[i, 1]] * 0.001f + new Vector3(-1, 0.8f, 0), Color.blue);
        }
    }

    void DrawLine(Vector3 s, Vector3 e, Color c) {
        Debug.DrawLine(s, e, c);
    }
}

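enum OpenPoseRef {
    // Assumed first entry: BoneList[0] is used as the hip,
    // and NormalizeJoint indexes 13 bones (0 to 12).
    Hips,
    LeftKnee, LeftFoot,
    RightKnee, RightFoot,
    Neck, Head,
    RightArm, RightElbow, RightWrist,
    LeftArm, LeftElbow, LeftWrist
}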

enum NormalizeBoneRef {
    Hip2LeftKnee, LeftKnee2LeftFoot,
    Hip2RightKnee, RightKnee2RightFoot,
    Hip2Neck, Neck2Head,
    Neck2RightArm, RightArm2RightElbow, RightElbow2RightWrist,
    Neck2LeftArm, LeftArm2LeftElbow, LeftElbow2LeftWrist
}

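// JsonUtility only fills custom classes that are marked [Serializable].
[Serializable]
public class BodyParts {
    public Position[] body_parts;
}

[Serializable]
public class Position {
    public int id;
    public float x;
    public float y;
    public float z;
}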
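
The core of `IKSet` is a retargeting step: each estimated bone contributes only its direction, and the child joint is placed at the parent's position plus the model's own bone length along that direction, then smoothed with a linear interpolation. The following is a minimal Python sketch of that arithmetic, using plain tuples in place of Unity's Vector3. All function names here are hypothetical and chosen for illustration only.

```python
import math


def sub(a, b):
    """Component-wise a - b."""
    return tuple(x - y for x, y in zip(a, b))


def normalized(v):
    """Return v scaled to unit length, or a zero vector if v has no length."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return tuple(0.0 for _ in v)
    return tuple(x / length for x in v)


def lerp(a, b, t):
    """Linear interpolation from a to b; t=0 gives a, t=1 gives b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def retarget_child(parent_pos, child_pos, est_parent, est_child, bone_length, t=0.05):
    """Move child_pos toward parent_pos + bone_length * estimated direction."""
    direction = normalized(sub(est_child, est_parent))
    target = tuple(p + bone_length * d for p, d in zip(parent_pos, direction))
    return lerp(child_pos, target, t)


if __name__ == "__main__":
    # The estimated bone points straight along -z; the model's bone is 0.5 long.
    print(retarget_child((0.0, 1.0, 0.0), (0.0, 0.5, 0.0),
                         (0.0, 0.0, 0.0), (0.0, 0.0, -300.0), 0.5, t=1.0))
```

Using only the direction of each estimated bone keeps the model's proportions intact even when the estimated skeleton has different limb lengths, and the small interpolation factor (0.05 in the Unity code) trades responsiveness for less jitter.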

That's all for preparation.


After executing server.py, press the Play button on the Unity side.


It ran faster than I expected, although there was some lag. Lighter pose estimation models are available if you look for them, so there appears to be room for improvement.

