[PYTHON] Optical Flow: capturing the dynamics of images with OpenCV

At a live concert you can sometimes get caught up in the crowd and lifted into the air, and you might think, "I'd like to analyze just how wild the motion was at that moment!" ...though I doubt such a case actually exists.

image Source: ASIAN KUNG-FU GENERATION "To Your Town" ([Solfa Remake Memorial](http://ro69.jp/news/detail/13427))

Optical Flow expresses the motion between images: it describes how each point moves from one image to the next. By calculating it, you can analyze the movement of feature points on an image, as in the figure above.

In this article, I would like to introduce the theoretical background behind Optical Flow and an implementation using Python/OpenCV.

Positioning of Optical Flow

There are various purposes for analyzing motion between images, and various methods for achieving them. Here, I will first explain where Optical Flow is positioned among them.

In truth, the boundaries of this classification are fuzzy (flow-estimation methods get used for tracking, for example), and one could also divide things by the technique employed, so it is all a little chaotic, but roughly speaking it breaks down this way.

And, as the name suggests, the Optical Flow introduced this time is a form of flow estimation.

Theory

Now, let me explain the theory behind how Optical Flow is calculated.

It is easy to picture with a flip book in mind: in the end, a video is just a continuous sequence of images.

image

Therefore, if you can analyze the movement between two images, it seems you can chain those results together and clarify the motion of the entire video.

So, given two images, how can you estimate the movement between them?

The fact that there is movement means that a point on the image has moved somewhere else. Conversely, if anything other than position (color, etc.) changes, there is no longer any way to track the point. We therefore make the assumption that the color of a point (usually converted to grayscale during processing, so in practice its "brightness") stays the same between the two images.

image

Formulas appear from here on, but rest assured: we will work through them one step at a time.

Let $I(x, y, t)$ be the brightness of the point $(x, y)$ on the image $I$ at a certain time $t$. If time advances by $\Delta t$ and the point's coordinates move by $\Delta x, \Delta y$ during that interval, the brightness at the destination is $I(x + \Delta x, y + \Delta y, t + \Delta t)$. Since we made the assumption that the brightness should not have changed, the following equation holds.

I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)

From this equation, we ultimately want to derive $\frac{\Delta x}{\Delta t}, \frac{\Delta y}{\Delta t}$: the change in the coordinates per unit time. That is the goal of the calculation performed from here on.

First, use the Taylor expansion to approximate the right-hand side. Taylor expansion may sound like some extremely difficult technique, but it simply rewrites a function into a value close to the true one. Please refer to here for a brief explanation.
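As a quick numerical illustration of "a value close to the true one" (my own example using $\sin$, not from the original article):

```python
import numpy as np

x, dx = 1.0, 0.1
true_value = np.sin(x + dx)
# First-order Taylor approximation: f(x + dx) ≈ f(x) + f'(x) * dx
approx = np.sin(x) + np.cos(x) * dx
print(true_value, approx)  # 0.8912... vs 0.8955...: close for a small dx
```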

I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t + ...

The expansion continues on and on after this, but the terms keep getting smaller and smaller, so we take the plunge and cut it off, keeping only the first-order terms.

I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t

What happens then is as follows.

I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t

For this to hold, $\frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t$ must equal 0; otherwise the equation breaks down, so we set it to 0. Dividing this expression by $\Delta t$, that is, by the elapsed time, gives:

\frac{\partial I}{\partial x} \frac{\Delta x}{\Delta t} + \frac{\partial I}{\partial y} \frac{\Delta y}{\Delta t} + \frac{\partial I}{\partial t} = 0

For simplicity, adopting the notation $\frac{\partial I}{\partial x} = I_x$ (and likewise for $y$ and $t$), this becomes:

I_x \frac{\Delta x}{\Delta t} + I_y \frac{\Delta y}{\Delta t} + I_t = 0

Finally, $\frac{\Delta x}{\Delta t}, \frac{\Delta y}{\Delta t}$ have appeared. Collecting the per-unit-time movement in $x$ and $y$ into a vector ${\bf v}$, we arrive at the equation we ultimately want to solve.

I_x {\bf v_x} + I_y {\bf v_y} + I_t = 0 \\\\ \nabla I^T {\bf v} = - I_t

However, to jump straight to the conclusion, this equation cannot be solved as it stands. After working so hard to follow the formulas this far, that may be a letdown, but it is unavoidable: there are two unknowns, ${\bf v_x}$ and ${\bf v_y}$ (two dimensions), and only one equation. This problem we face when seeking Optical Flow is called the Aperture Problem (explaining where the name comes from would take a while, and in the end it amounts to nothing more than "we need additional constraints", so I will omit it here).

Now, to resolve this situation, we need to add at least one more constraint equation. This is the heart of the matter: how to impose this "constraint" is the main theme of the various motion-detection methods, and all sorts of ideas have been proposed.

Here, I would like to introduce two of them: the Lucas–Kanade method, a representative sparse method, and the Horn–Schunck method, a dense method.

Lucas–Kanade method

The Lucas–Kanade method imposes its constraint by assuming spatial coherence. In short, it assumes that neighboring points should move in the same way.

image

Taking the neighboring points to be $q_1 ... q_n$ and assuming they all move in the same way, we can expand our single constraint equation into many, as follows:

I_x(q_1) {\bf v_x} + I_y(q_1) {\bf v_y} = - I_t(q_1) \\\\ I_x(q_2) {\bf v_x} + I_y(q_2) {\bf v_y} = - I_t(q_2) \\\\ ... \\\\ I_x(q_n) {\bf v_x} + I_y(q_n) {\bf v_y} = - I_t(q_n) \\\\

Here, each term is summarized in a matrix as follows.

A = \left[ \begin{array}{cc} I_x(q_1) & I_y(q_1) \\\\ I_x(q_2) & I_y(q_2) \\\\ \vdots & \vdots \\\\ I_x(q_n) & I_y(q_n) \end{array} \right] \\ b = \left[ \begin{array}{c} - I_t(q_1) \\\\ - I_t(q_2) \\\\ \vdots \\\\ - I_t(q_n) \end{array} \right]

Then you can simply write:

A{\bf v} = b

All that remains is to solve this system of equations. It can be solved using the least squares method (in short, using differentiation to find the point that minimizes the squared error $||b - A{\bf v}||^2$), and the solution is finally expressed as follows.

{\bf v} = (A^TA)^{-1} A^Tb
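To connect the formula to code, here is a minimal NumPy sketch of this least-squares solve for a single window (the function name and the window size `w` are my own; OpenCV's implementation, used later, adds image pyramids and iterative refinement on top of this):

```python
import numpy as np

def lucas_kanade_point(prev_img, next_img, x, y, w=7):
    """Estimate the flow v at (x, y) from a (2w+1)x(2w+1) window (sketch)."""
    I1 = prev_img.astype(np.float64)
    I2 = next_img.astype(np.float64)
    Iy, Ix = np.gradient(I1)   # spatial gradients I_x, I_y
    It = I2 - I1               # temporal gradient I_t
    win = (slice(y - w, y + w + 1), slice(x - w, x + w + 1))
    # Stack each point q_i of the window into A and b
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    # v = (A^T A)^{-1} A^T b, computed via least squares
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (v_x, v_y)
```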

To solve this equation, the premise is that $A^TA$ has an inverse (is invertible), as the formula shows. Points found by typical feature-point detectors, such as Harris corner detection or Shi–Tomasi detection, satisfy this: those detectors select exactly the points where $A^TA$ has two large eigenvalues, which is the invertible, trackable case.
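Incidentally, OpenCV exposes the Shi–Tomasi score directly, so you can check this connection yourself (a small sketch; `frame.png` is a hypothetical file name):

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame image
# Shi–Tomasi score: the smaller eigenvalue of A^T A over each 7x7 window.
# Where it is large, A^T A is safely invertible, i.e. the point is trackable.
min_eig = cv2.cornerMinEigenVal(gray, blockSize=7)
# goodFeaturesToTrack keeps points whose score clears qualityLevel * max(score)
pts = cv2.goodFeaturesToTrack(gray, maxCorners=100, qualityLevel=0.3, minDistance=7)
```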

Horn–Schunck method

The Horn–Schunck method instead constrains the "smoothness" of the motion. The idea is that the ${\bf v}$ that gets there with the least variation is the better ${\bf v}$.

image

Let the cost (energy) of the motion be $E({\bf v})$, defined as follows. Since this is a cost, the smaller it is, the better.

E({\bf v}) = E_{data}({\bf v}) + \alpha E_{smooth}({\bf v})

Here, $E_{data}({\bf v})$ corresponds to the equation we already have. As described above, $I_x {\bf v_x} + I_y {\bf v_y} + I_t$ should be 0, so the closer it is to 0, the better the ${\bf v}$. The double integral below just sums this over the entire image.

E_{data}({\bf v}) = \int\int (I_x {\bf v_x} + I_y {\bf v_y} + I_t) ^2 dxdy

And $E_{smooth}({\bf v})$ is the "smoothness" constraint.

E_{smooth}({\bf v}) = \int\int || \nabla {\bf v_x} || ^2 + || \nabla {\bf v_y} || ^2 dxdy

This says that the smaller the change in ${\bf v}$, the better: in other words, a flow that "reaches a point of unchanged brightness with as little variation as possible" is preferred. $\alpha$ is a coefficient controlling how much weight is placed on this smoothness constraint.

Solving this as a minimization problem yields the desired ${\bf v}$. I will omit the subsequent derivation, but this is the idea behind the Horn–Schunck method.
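For reference, the classic fixed-point iteration that falls out of that minimization looks roughly like this. This is a bare-bones sketch under the definitions above, not OpenCV's API (`alpha` and `n_iter` are knobs of this sketch):

```python
import cv2
import numpy as np

def horn_schunck(prev_img, next_img, alpha=1.0, n_iter=100):
    """Bare-bones Horn–Schunck: minimize E_data + alpha * E_smooth
    by the classic fixed-point update (a sketch, no pyramids etc.)."""
    I1 = prev_img.astype(np.float64)
    I2 = next_img.astype(np.float64)
    Iy, Ix = np.gradient(I1)   # spatial gradients I_x, I_y
    It = I2 - I1               # temporal gradient I_t
    u = np.zeros_like(I1)      # v_x per pixel
    v = np.zeros_like(I1)      # v_y per pixel
    # Kernel averaging the 8 neighbors of each pixel
    avg = np.array([[1/12, 1/6, 1/12],
                    [1/6,  0.0, 1/6 ],
                    [1/12, 1/6, 1/12]])
    for _ in range(n_iter):
        u_avg = cv2.filter2D(u, -1, avg)
        v_avg = cv2.filter2D(v, -1, avg)
        # How badly the brightness-constancy equation is violated
        common = (Ix * u_avg + Iy * v_avg + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = u_avg - Ix * common
        v = v_avg - Iy * common
    return u, v  # dense flow field (v_x, v_y)
```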

Practice

Now, let's actually implement Optical Flow detection using OpenCV. We will use the Lucas–Kanade and Gunnar Farneback methods, for which implementations are provided. The Gunnar Farneback method is a commonly used dense method (like the Lucas–Kanade method, its constraint uses neighboring points ...).

The implemented sample is available below. Note that, due to copyright concerns, from here on we use Domo-kun footage from NHK's CREATIVE LIBRARY as the material.

icoxfog417/cv_tutorial/opticalflow

I am using Python 3.5, numpy 1.10.4, and OpenCV 3.1.0 (setup is easier with Anaconda/Miniconda). Only Matplotlib is pinned to the older 1.4.3: this is due to a bug in nbagg, used for animation playback, in 1.5.1, the latest version as of April 2016 (notebook backend figures close spontaneously #6075, matplotlib/matplotlib/issues/6075).

The implementation basically follows the OpenCV tutorial. First, the Lucas–Kanade method.

domo_running_snap.PNG

He's running...! You can see that the trajectory can be tracked. The Lucas–Kanade method detects feature points and estimates the optical flow of those points, so the two important processing steps are the feature-point detection and the flow calculation itself, as in the sketch below.
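A minimal sketch along the lines of the OpenCV tutorial (the file name `domo_running.mp4` is hypothetical; the actual sample in the repository above differs in details such as drawing and notebook animation):

```python
import cv2

cap = cv2.VideoCapture("domo_running.mp4")  # hypothetical file name

ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

# 1. Detect feature points on the first frame (Shi–Tomasi)
prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                   qualityLevel=0.3, minDistance=7)

while True:
    ret, frame = cap.read()
    if not ret or prev_pts is None or len(prev_pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 2. Estimate the flow of those points (pyramidal Lucas–Kanade)
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, prev_pts, None, winSize=(15, 15), maxLevel=2)
    good_new = next_pts[status == 1]   # keep only successfully tracked points
    good_old = prev_pts[status == 1]
    for new, old in zip(good_new, good_old):
        cv2.line(frame, (int(new[0]), int(new[1])),
                 (int(old[0]), int(old[1])), (0, 255, 0), 2)
    cv2.imshow("Lucas-Kanade flow", frame)
    if cv2.waitKey(30) & 0xFF == 27:   # Esc to quit
        break
    prev_gray = gray
    prev_pts = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()
```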

Note that feature points are, of course, not always detected (if none are found, None is returned). Also, objects that enter the video partway through were not detected: keep in mind that what you track must be visible from the beginning (I experimented quite a bit, but I am not sure why).

If you compute a dense Optical Flow with the Gunnar Farneback method instead, you can capture the overall motion, as shown below.

image
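The corresponding code (again a sketch with the hypothetical `domo_running.mp4`) converts the per-pixel flow vectors to HSV, encoding direction as hue and speed as brightness:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("domo_running.mp4")  # hypothetical file name
ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
hsv = np.zeros_like(prev_frame)
hsv[..., 1] = 255  # full saturation; hue/value will encode the flow

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow: one (v_x, v_y) vector per pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang * 180 / np.pi / 2                               # direction -> hue
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)   # speed -> value
    cv2.imshow("dense flow", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break
    prev_gray = gray

cap.release()
cv2.destroyAllWindows()
```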

As an aside, the VideoCapture used to load the video can also be given image files. For example, if you specify `img_%02d.jpg`, sequentially numbered files such as `img_01.jpg`, `img_02.jpg`, ... are read in automatically. You can easily make flip books this way, so it is fun to give it a try.
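A minimal sketch, assuming such numbered images sit in the current directory:

```python
import cv2

# Reads img_01.jpg, img_02.jpg, ... in order, as if they were video frames
cap = cv2.VideoCapture("img_%02d.jpg")
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # process each frame exactly as in the video examples above
cap.release()
```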

That is all for the explanation. Try capturing some memorable movements yourself!
