You might occasionally get pinched by a crayfish and lifted into the air, but I doubt anyone has ever thought, "I'd like to analyze just how violent that motion was!" I don't think such a case exists.
Source: ASIAN KUNG-FU GENERATION "To Your Town" ([Solfa Remake Memorial](http://ro69.jp/news/detail/13427))
Optical Flow expresses the movement between images: it describes how each point moves from one image to the next. By calculating it, we can analyze the movement of feature points on the image, as in the figure above.
In this article, I would like to introduce the theoretical background behind calculating Optical Flow and its implementation in Python / OpenCV.
Analyzing motion between images can serve various purposes, and there are various methods for doing it. Here, let me first explain where Optical Flow fits among them.
In practice the boundaries between the categories are blurry (flow estimation methods are used for tracking, for instance), and methods are sometimes classified by the technique they use, so the picture is a little chaotic, but the field divides roughly along these lines.
As the name suggests, the Optical Flow introduced here belongs to flow estimation.
Next, let me explain the theory behind how Optical Flow is calculated.
A flip book makes this easy to picture: a video is ultimately a sequence of still images. So if we can analyze the movement between two images, we should be able to chain those analyses together and recover the motion of the entire video.
So, given two images, how can we estimate the movement between them?
"Movement" means that a point on the image has shifted to another position. Conversely, if anything other than the position changes (the color, for example), there is no longer any way to track the point. So we make the following assumption: the color of a point (images are usually converted to grayscale during processing, so in practice its "brightness") stays the same between the two images.
Formulas appear from here on, but rest assured that we will work through them one step at a time.
Let $I(x, y, t)$ be the brightness of the point $(x, y)$ on the image $I$ at time $t$. If time advances by $\Delta t$ and the point moves by $\Delta x, \Delta y$ during that interval, the brightness at the destination is $I(x + \Delta x, y + \Delta y, t + \Delta t)$. Since we assumed the brightness does not change, the following equation holds.

$$I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)$$
From this equation, our goal is to eventually derive $\frac{\Delta x}{\Delta t}$ and $\frac{\Delta y}{\Delta t}$, the change in the coordinates per unit time. That is what the calculation from here on works toward.
First, we approximate the right-hand side with a Taylor expansion. Taylor expansion may sound like some extremely difficult technique, but it simply rewrites an expression as a value close to the true value. Please refer to here for a brief explanation.
The expansion continues on and on, but the later terms become smaller and smaller, so we take the plunge and truncate it, keeping only the first-order terms.
What happens then is as follows.
For the equation to keep holding, we need $\frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t = 0$, so we set this expression to 0. Dividing the formula by $\Delta t$, the amount of elapsed time, gives:
Writing $\frac{\partial I}{\partial x} = I_x$ and so on for simplicity, this becomes the following.
Finally, $\frac{\Delta x}{\Delta t}$ and $\frac{\Delta y}{\Delta t}$ have appeared. Collecting the per-unit-time movement in $x$ and $y$ into a vector ${\bf v}$, the equation we ultimately want to solve is the following.

$$I_x {\bf v_x} + I_y {\bf v_y} + I_t = 0$$
However, to state the conclusion first, this equation cannot be solved as it is. After all that effort following the formulas, we are stuck! But it cannot be helped: there are two unknowns, ${\bf v_x}$ and ${\bf v_y}$ (two dimensions), but only one equation. This problem we face when computing Optical Flow is called the Aperture Problem (explaining why it has that name takes a while, and the upshot is simply that we need additional constraints, so I will omit it here).
To break out of this situation, we need at least one more constraint equation. This is the heart of the matter: how to impose this "constraint" is the main theme of the various motion-detection methods, and many ideas have been proposed.
Here, I would like to introduce two of them: the Lucas–Kanade method, a representative sparse method, and the Horn–Schunck method, a dense method.
The Lucas–Kanade method imposes a constraint of spatial consistency. In short, it assumes that neighboring points move in the same way.
Assuming the neighboring points are $q_1 \ldots q_n$ and that they all move in the same way, we can multiply the constraint equation as follows:

$$I_x(q_1) {\bf v_x} + I_y(q_1) {\bf v_y} = -I_t(q_1)$$
$$\vdots$$
$$I_x(q_n) {\bf v_x} + I_y(q_n) {\bf v_y} = -I_t(q_n)$$
Here, we collect the terms into matrices as follows.

$$A = \begin{pmatrix} I_x(q_1) & I_y(q_1) \\ \vdots & \vdots \\ I_x(q_n) & I_y(q_n) \end{pmatrix}, \quad {\bf v} = \begin{pmatrix} {\bf v_x} \\ {\bf v_y} \end{pmatrix}, \quad b = \begin{pmatrix} -I_t(q_1) \\ \vdots \\ -I_t(q_n) \end{pmatrix}$$
Then the whole system can be written simply as:

$$A {\bf v} = b$$
All that remains is to solve this system of equations, which can be done with the least squares method (in short, ${\bf v} = (A^T A)^{-1} A^T b$).
As the solution shows, this requires $A^T A$ to have an inverse (to be invertible). Conveniently, the typical feature point detectors, Harris corner detection and Shi–Tomasi detection, pick out exactly the points where this condition holds.
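To make the computation concrete, here is a minimal numpy sketch that solves the system for one feature point. The gradient values below are made-up numbers for illustration; in practice they come from image derivatives around the point.

```python
import numpy as np

# Hypothetical gradient values at n neighborhood points q_1 ... q_n.
# In practice these are image derivatives sampled around a feature point.
Ix = np.array([0.5, 0.3, 0.4, 0.6])        # dI/dx at each q_i
Iy = np.array([0.2, 0.1, 0.3, 0.2])        # dI/dy at each q_i
It = np.array([-0.4, -0.2, -0.35, -0.45])  # dI/dt at each q_i

# Stack the constraint equations into A v = b
A = np.stack([Ix, Iy], axis=1)  # shape (n, 2)
b = -It                         # shape (n,)

# Least squares solution, equivalent to v = (A^T A)^{-1} A^T b
v, *_ = np.linalg.lstsq(A, b, rcond=None)
print(v)  # estimated (v_x, v_y)
```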
The Horn–Schunck method instead constrains the "smoothness" of the motion: among candidate flows, the ${\bf v}$ that moves the shortest distance is considered the better one.
Let $E({\bf v})$ be the cost (energy) of the motion, defined as follows. Since it is a cost, smaller is better.

$$E({\bf v}) = E_{data}({\bf v}) + \alpha E_{smooth}({\bf v})$$
Here, $E_{data}({\bf v})$ represents the equation we derived earlier: as noted above, $I_x {\bf v_x} + I_y {\bf v_y} + I_t$ should be 0, so the closer it is to 0, the better ${\bf v}$ is. The integral sums this over the entire image.

$$E_{data}({\bf v}) = \iint \left( I_x {\bf v_x} + I_y {\bf v_y} + I_t \right)^2 dx\, dy$$
And $E_{smooth}({\bf v})$ is the "smoothness" constraint.

$$E_{smooth}({\bf v}) = \iint \left( \| \nabla {\bf v_x} \|^2 + \| \nabla {\bf v_y} \|^2 \right) dx\, dy$$
This says that the smaller the change in ${\bf v}$, the better: a flow that "reaches a point of equal brightness with a small amount of movement" is preferred. $\alpha$ is a coefficient controlling how much weight is placed on this smoothness constraint.
Solving this as a minimization problem yields the desired ${\bf v}$. I will omit the subsequent derivation, but that is the idea behind the Horn–Schunck method.
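For the curious, the minimization leads to a simple iterative update. Below is a minimal numpy/scipy sketch of one iteration following the standard Horn–Schunck update formula; the gradient images `Ix`, `Iy`, `It` and the use of a simple box filter for local averaging are my assumptions, not something from this article.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck_step(u, v, Ix, Iy, It, alpha=1.0):
    """One iteration of the standard Horn-Schunck update.

    u, v: current flow estimates (2D arrays, same shape as the image)
    Ix, Iy, It: spatial and temporal gradient images
    alpha: weight of the smoothness constraint
    """
    # Local averages of the current flow (the smoothness term pulls
    # each vector toward its neighbors' average)
    u_avg = uniform_filter(u, size=3)
    v_avg = uniform_filter(v, size=3)
    # Shared correction factor derived from the data term
    t = (Ix * u_avg + Iy * v_avg + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
    return u_avg - Ix * t, v_avg - Iy * t
```

Calling `horn_schunck_step` repeatedly, starting from `u = v = 0`, converges toward the flow that minimizes the energy.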
Now, let's actually implement Optical Flow detection using OpenCV. We will use the Lucas–Kanade method and the Gunnar Farneback method, for which implementations are provided. The Gunnar Farneback method is a commonly used dense method (like the Lucas–Kanade method, its constraint uses neighboring points...).
The implemented sample is available below. From here on, for copyright reasons, I use Domo-kun footage from NHK's CREATIVE LIBRARY as the material.
icoxfog417/cv_tutorial/opticalflow
I'm using Python 3.5, numpy 1.10.4, and OpenCV 3.1.0 (setting these up is easier with Anaconda / Miniconda). The one exception is Matplotlib, where I use the older 1.4.3. This is because 1.5.1, the latest version as of April 2016, has a bug in nbagg, which is used here for animation playback ([notebook backend figures close spontaneously #6075](https://github.com/matplotlib/matplotlib/issues/6075)).
As for the implementation, the basics follow the OpenCV tutorial. First, the Lucas–Kanade method.
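Below is a minimal sketch along the lines of the OpenCV tutorial code, not the repository's exact implementation; the file name `domo.mp4` and the parameter values are placeholders I chose.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("domo.mp4")  # hypothetical file name

# 1. Detect feature points in the first frame (Shi-Tomasi corners)
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, maxCorners=100,
                             qualityLevel=0.3, minDistance=7)
if p0 is None:  # no feature points found
    raise SystemExit("no feature points detected")

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 2. Estimate where each feature point moved (Lucas-Kanade)
    p1, status, err = cv2.calcOpticalFlowPyrLK(
        old_gray, gray, p0, None, winSize=(15, 15), maxLevel=2)

    # Keep only the points that were successfully tracked
    good_new = p1[status == 1]
    good_old = p0[status == 1]
    for new, old in zip(good_new, good_old):
        a, b = new.ravel()
        c, d = old.ravel()
        cv2.line(frame, (int(a), int(b)), (int(c), int(d)), (0, 255, 0), 2)

    cv2.imshow("flow", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # ESC to quit
        break

    # The current frame and points become the "previous" ones
    old_gray = gray
    p0 = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()
```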
And it runs. You can see that the trajectory is tracked. The Lucas–Kanade method detects feature points and estimates the Optical Flow from the points around them, so the two important processing steps are the following.
Note that, naturally, feature points may fail to be detected (if none are found, `None` is returned). Also, in my tests, anything that entered the video partway through was not detected. Keep in mind that what you want to track must be visible from the beginning (I tried various things, but I'm not sure why this is).
Computing a dense Optical Flow with the Gunnar Farneback method captures the motion of the whole scene, more like the following.
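Here too is a minimal sketch in the style of the OpenCV tutorial, visualizing the dense flow as an HSV image (hue = direction, value = magnitude). The file name is again a placeholder, and the `calcOpticalFlowFarneback` parameter values are the tutorial-style defaults rather than tuned ones.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("domo.mp4")  # hypothetical file name
ret, frame1 = cap.read()
prev_gray = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

# HSV canvas for visualization; saturation fixed at maximum
hsv = np.zeros_like(frame1)
hsv[..., 1] = 255

while True:
    ret, frame2 = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Dense flow: one (dx, dy) vector per pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv[..., 0] = ang * 180 / np.pi / 2                       # direction -> hue
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # magnitude -> value
    cv2.imshow("dense flow", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    if cv2.waitKey(30) & 0xFF == 27:  # ESC to quit
        break
    prev_gray = gray

cap.release()
cv2.destroyAllWindows()
```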
As an aside, the `VideoCapture` used to load the video can also take image files. For example, if you specify `img_%02d.jpg`, sequentially numbered files such as `img_01.jpg`, `img_02.jpg`, ... are read automatically, as in the sketch below. You can easily make flip books this way, so it's fun to give it a try.
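A minimal sketch of this; the file names are placeholders.

```python
import cv2

# Read a numbered image sequence as if it were a video.
# "img_%02d.jpg" matches img_01.jpg, img_02.jpg, ... (hypothetical files)
cap = cv2.VideoCapture("img_%02d.jpg")
while True:
    ret, frame = cap.read()
    if not ret:  # stops when the numbered files run out
        break
    cv2.imshow("flipbook", frame)
    cv2.waitKey(100)  # roughly 10 fps playback
cap.release()
cv2.destroyAllWindows()
```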
That is all for the explanation. Go capture some memorable movements!