[PYTHON] Implementation of Scale-space for SIFT

Goal of this article

You will be able to understand and implement the scale-space that underlies SIFT (Scale-Invariant Feature Transform), a method for describing image features that is robust to scale changes and rotation.

Related knowledge

Environment to use

Software | Version
Python | 3.4 or 3.5
Pillow | 3.1.0
Numpy | 1.10
Scipy | 0.16.1
Matplotlib | 1.5.0

Note

This article was written as material for the 41st meeting of Morning Project Samurai.

Request

This is a draft version. If you find any mistakes, we would appreciate it if you could contact us.

Scale-space

Definition

Motivation

Consider detecting a particular object in a given image; for example, detecting a person in a scene (frame) from a video of an urban road. For this task there is no need for an image at a scale that shows the detailed pattern of the clothes the person is wearing. It is better to have an image at a scale where you can make out the shape of a person but not what they are wearing. Information that is too detailed is, if anything, noise.

As the example suggests, there is an image scale suited to detecting the object of interest. In reality, however, a given image is not always at that scale. We therefore construct a scale-space by generating multiple images of different scales from the given image, and then find the image of the appropriate scale within that scale-space to detect the object.

Scale and scaling up

In scale-space, scaling up means removing detailed information from the original image using a technique called Gaussian convolution. The degree of scale-up is determined by the Gaussian convolution parameter $\sigma$. The larger the scale $\sigma$, the more information is removed from the original image.

For example, in the figure below, the original image on the far left clearly shows the details of the face and even the fur of the hat. In the image on the far right, at scale $\sigma = 3.2$, you can vaguely make out that there is a woman and where her eyes are, but nothing more.

2D Gaussian convolution examples

How to construct

  1. Prepare the original image (grayscale).
  2. Set the scale $\sigma = k^{n}\sigma_{0}$ and set the initial value of $n$ to $0$.
  3. Scale up the original image using Gaussian convolution.
  4. Repeat step 3 while increasing $n$ as $1, 2, 3, \ldots$ until $k^{n} = 2.0$. $k$ is determined by how many images you want to generate before $k^{n}$ reaches $2.0$.
  5. Downsample the image with scale $\sigma = 2.0\sigma_{0}$ to half its size vertically and horizontally.
  6. Using the downsampled image as the new original image, set $n = 1$ and return to step 3.
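As a preview, the whole loop above might look like the following minimal sketch, using scipy's gaussian_filter for the scale-up in step 3 (the function name and default values here are our own, not from the final program):

import numpy as np
from scipy.ndimage.filters import gaussian_filter


def build_scale_space(image, sigma0=1.6, s=3, n_octaves=3):
    # Sketch of steps 1-6 above; names and defaults are hypothetical.
    k = 2.0 ** (1.0 / s)
    octaves = []
    n_start = 0  # step 2: n starts at 0 for the first octave
    for _ in range(n_octaves):
        octave = [gaussian_filter(image, (k ** n) * sigma0)
                  for n in range(n_start, s + 1)]  # steps 3 and 4
        octaves.append(octave)
        image = octave[-1][::2, ::2]  # step 5: halve the sigma = 2*sigma0 image
        n_start = 1  # step 6
    return octaves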

The detailed construction method and its program will be described after we have covered Gaussian convolution, the Fourier transform, and the sampling theorem.

Gaussian convolution

1D Gaussian convolution

The one-dimensional Gaussian convolution is expressed by the following equation.

F(x, \sigma)=\int_{-\infty}^{\infty} f(u) \frac{1}{\sqrt{2 \pi \sigma^{2}}}e^{-\frac{(u-x)^2}{2\sigma^{2}}} du

$f$ is the original signal. $\frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-\frac{(u-x)^{2}}{2\sigma^{2}}}$ is called the Gaussian kernel. $\sigma$ is a parameter that determines the shape of the Gaussian kernel: the kernel is bell-shaped with its maximum at the center $x$, and its height and spread are determined by $\sigma$.

Gaussian kernel

Let's try drawing the Gaussian kernel using numpy, scipy, and matplotlib.

import numpy as np
from scipy.signal import gaussian
from matplotlib import pyplot as plt


if __name__ == '__main__':
    npoints = 101
    sigmas = np.array([6.0, 12.0])
    for i, sigma in enumerate(sigmas):
        y = gaussian(npoints, sigma) / (np.sqrt(2.0 * np.pi) * sigma)
        plt.subplot(len(sigmas), 1, i + 1)
        plt.title('sigma = %s' % sigma)
        plt.ylim(ymax=0.08)
        plt.plot(np.arange(-(npoints // 2), npoints // 2 + 1), y)
    plt.tight_layout()
    plt.show()

If you run the above program, you will get the result below.

Gaussian kernel examples

[Properties of the Gaussian kernel: to be added later]

Interpretation and application example

The Gaussian convolution formula can be interpreted as follows: the original signal $f$ is multiplied by a weight centered at $x$ with spread $\sigma$ (the Gaussian kernel), and the resulting weighted average is $F(x, \sigma)$. Thus $F$ can be read as a new signal whose value $F(x, \sigma)$ carries information about the whole of $f$, while most strongly reflecting the information near $u = x$ of the original signal $f(u)$, with a locality determined by $\sigma$. This means that $F$ is the original signal $f$ smoothed (blurred, denoised) with intensity $\sigma$. In the context of scale-space, we can say that $F$ is the signal obtained by scaling $f$ up to scale $\sigma$, that is, with its detailed information removed.

As we saw in the previous section, the Gaussian kernel becomes flatter as $\sigma$ increases. This means that the larger $\sigma$ is, the more diluted the information from the neighborhood of $u = x$ becomes within $F(x, \sigma)$. As $\sigma$ grows larger and larger, $F$ eventually takes almost the same value at every $x$. In the context of scale-space, this can be interpreted as: the larger the scale $\sigma$, the more macroscopic the view of the signal (detailed information is lost), until eventually every part of the signal looks the same.

Let's visualize the story so far with a program.

import numpy as np
from scipy.ndimage.filters import gaussian_filter1d
from matplotlib import pyplot as plt


if __name__ == '__main__':
    x = np.arange(0, 100)
    f = np.sin(0.5 * x) + np.random.normal(0, 6.0, x.shape)

    sigmas = [1.6, 3.2, 6.4]

    plt.subplot(len(sigmas) + 1, 1, 1)
    plt.ylim(ymin=np.min(f), ymax=np.max(f))
    plt.title('Original')
    plt.plot(f)

    for i, sigma in enumerate(sigmas):
        plt.subplot(len(sigmas) + 1, 1, 2 + i)
        plt.title('Sigma = %s' % sigma)
        plt.plot(gaussian_filter1d(f, sigma))
    plt.tight_layout()
    plt.show()

The following figure is obtained by executing the above program.

Gaussian convolution examples

As $\sigma$ grows (as the scale increases), you can see that the detailed information contained in the signal $f$ is progressively lost from $F$.

2D Gaussian convolution

The image is a two-dimensional signal. Therefore, in order to scale up an image using Gaussian convolution, Gaussian convolution for a two-dimensional signal is required. Hereafter, when we simply write Gaussian convolution, we will refer to the two-dimensional Gaussian convolution.

The Gaussian convolution for a two-dimensional signal is expressed by the following equation.

F(x, y, \sigma) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u, v) \frac{1}{2\pi\sigma^{2}} e^{-\frac{(u - x)^{2} + (v-y)^{2}}{2\sigma^{2}}} du\, dv

$\frac{1}{2\pi\sigma^{2}} e^{-\frac{(u-x)^{2} + (v-y)^{2}}{2\sigma^{2}}}$ is the Gaussian kernel. This Gaussian kernel is a three-dimensional bell shape, taking its maximum value at $(x, y)$, with $\sigma$ representing its spread.

Gaussian kernel

Let's draw this Gaussian kernel programmatically.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def gaussian_kernel_2d(x, y, sigma):
    # Normalized 2D Gaussian kernel centered at the origin.
    norm = 2.0 * np.pi * np.power(sigma, 2)
    return np.exp(-(np.power(x, 2) + np.power(y, 2)) / (2.0 * np.power(sigma, 2))) / norm


if __name__ == '__main__':
    xrange = np.arange(-10.0, 10.5, 0.5)
    yrange = np.arange(-10.0, 10.5, 0.5)

    sigmas = [1.6, 2.4]

    fig = plt.figure()
    X, Y = np.meshgrid(xrange, yrange)

    for i, sigma in enumerate(sigmas):
        ax = fig.add_subplot(len(sigmas), 1, 1 + i,  projection='3d')
        plt.title('Sigma = %s' % sigma)
        ax.set_zlim(zmax=0.08)  # the normalized kernel peaks near 0.06 for sigma = 1.6
        ax.plot_wireframe(X, Y, gaussian_kernel_2d(X, Y, sigma))
    plt.show()

The following figure can be obtained by executing the above program. It is possible to change the viewpoint with the mouse.

2D Gaussian kernel examples

Discretized Gaussian convolution

Gaussian filter

Computers can only handle discrete values, over a finite range. The Gaussian kernel, however, is a continuous function whose domain extends to $\pm\infty$. Therefore, to handle it on a computer, the Gaussian kernel must be approximated by a finite array of values, i.e. discretized.

Consider a grid with $s$ rows and $t$ columns; hereafter this grid is called an $s \times t$ filter. In a program it is represented as a two-dimensional array:

g = np.zeros(shape=(s, t))

If the elements of this array are set so that they represent the Gaussian kernel sampled around the center of the grid, we obtain a Gaussian filter $g$, a discretized Gaussian kernel. Specifically, the element in row $k$ and column $l$ is set to $g(k, l) = \frac{1}{\alpha} e^{-\frac{(k - [\frac{s}{2}])^{2} + (l - [\frac{t}{2}])^{2}}{2\sigma^{2}}}$, where $[\cdot]$ denotes rounding down and $\alpha$ is a normalization constant chosen so that all the elements of $g$ sum to $1$.
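As an illustration, here is a minimal sketch of constructing such a filter (the function name is our own; offsets are measured from the filter center):

import numpy as np


def gaussian_filter_2d(s, t, sigma):
    # Row and column offsets from the filter center [s/2], [t/2].
    k = np.arange(s).reshape(-1, 1) - s // 2
    l = np.arange(t).reshape(1, -1) - t // 2
    g = np.exp(-(k ** 2 + l ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()  # normalize so that the elements sum to 1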

Gaussian convolution with Gaussian filter

The following equation is the Gaussian convolution using a Gaussian filter, called the discretized Gaussian convolution. Hereafter, when we simply refer to Gaussian convolution, we will mean this discretized Gaussian convolution.

F(i, j) = \sum_{k=0}^{s-1}\sum_{l=0}^{t-1} f(i+k-[\frac{s}{2}], j+l-[\frac{t}{2}])\, g(k, l)

A program that follows the above formula literally looks like this (computing a single output pixel $F(i, j)$):

F[i, j] = 0.0
for k in range(s):
    for l in range(t):
        # accumulate the weighted sum over the s x t window
        F[i, j] += f[i + k - s // 2, j + l - t // 2] * g[k, l]

In this case, $s \times t$ operations are required to obtain one pixel of the output image $F$, and $m \times n \times s \times t$ operations to obtain all of its pixels. Convolving a $320 \times 240$ image with a $5 \times 5$ Gaussian filter requires a total of $1{,}920{,}000$ operations. Whether this number of operations matters depends on the machine specs and the application, but it is significant on my MacBook Air (13-inch, Early 2015).
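For reference, a complete, runnable version of this naive convolution might look as follows; it is a sketch that zero-pads the image borders, a choice the formula above leaves unspecified:

import numpy as np


def convolve2d_naive(f, g):
    # Direct O(m * n * s * t) convolution with zero padding at the borders.
    m, n = f.shape
    s, t = g.shape
    padded = np.zeros((m + s - 1, n + t - 1))
    padded[s // 2:s // 2 + m, t // 2:t // 2 + n] = f
    F = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            F[i, j] = np.sum(padded[i:i + s, j:j + t] * g)
    return F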

Speeding up

Gaussian convolution is separable: the output image $F$ of a Gaussian convolution with an $s \times t$ Gaussian filter can also be obtained as the image $F_{1}$ produced by the following two steps.

  1. Convolve each row of the original image $f$ with a $1 \times t$ 1D Gaussian filter to obtain the image $F_{0}$.
  2. Convolve each column of $F_{0}$ with an $s \times 1$ 1D Gaussian filter to obtain the image $F_{1}$.

This is proved as follows.

\begin{align}
F(i, j) &= \sum_{k=0}^{s-1}\sum_{l=0}^{t-1} f(i+k-[\tfrac{s}{2}],\, j+l-[\tfrac{t}{2}])\, g(k, l)\\
&= \sum_{k=0}^{s-1}\sum_{l=0}^{t-1} f(i+k-[\tfrac{s}{2}],\, j+l-[\tfrac{t}{2}])\, \frac{1}{\alpha} e^{-\frac{(k-[\frac{s}{2}])^{2}+(l-[\frac{t}{2}])^{2}}{2\sigma^{2}}}\\
&= \sum_{k=0}^{s-1}\sum_{l=0}^{t-1} f(i+k-[\tfrac{s}{2}],\, j+l-[\tfrac{t}{2}])\, \frac{1}{\alpha} e^{-\frac{(l-[\frac{t}{2}])^{2}}{2\sigma^{2}}}\, e^{-\frac{(k-[\frac{s}{2}])^{2}}{2\sigma^{2}}}\\
&= \frac{\alpha_{0}\alpha_{1}}{\alpha} \sum_{k=0}^{s-1} \frac{1}{\alpha_{1}} e^{-\frac{(k-[\frac{s}{2}])^{2}}{2\sigma^{2}}} \sum_{l=0}^{t-1} f(i+k-[\tfrac{s}{2}],\, j+l-[\tfrac{t}{2}])\, \frac{1}{\alpha_{0}} e^{-\frac{(l-[\frac{t}{2}])^{2}}{2\sigma^{2}}}\\
&= \frac{\alpha_{0}\alpha_{1}}{\alpha} \sum_{k=0}^{s-1} g_{1}(k) \sum_{l=0}^{t-1} f(i+k-[\tfrac{s}{2}],\, j+l-[\tfrac{t}{2}])\, g_{0}(l)\\
&= \frac{\alpha_{0}\alpha_{1}}{\alpha} \sum_{k=0}^{s-1} g_{1}(k)\, F_{0}(i+k-[\tfrac{s}{2}],\, j)\\
&= \frac{\alpha_{0}\alpha_{1}}{\alpha} F_{1}(i, j)
\end{align}

Here $g_{0}(l) = \frac{1}{\alpha_{0}} e^{-\frac{(l-[\frac{t}{2}])^{2}}{2\sigma^{2}}}$ and $g_{1}(k) = \frac{1}{\alpha_{1}} e^{-\frac{(k-[\frac{s}{2}])^{2}}{2\sigma^{2}}}$ are the normalized 1D Gaussian filters. Note that $\alpha = \alpha_{0}\alpha_{1}$, because the 2D kernel is the product of the two 1D kernels; the prefactor $\frac{\alpha_{0}\alpha_{1}}{\alpha}$ is therefore $1$ and $F = F_{1}$.

With this, the number of operations per output pixel drops from $s \times t$ to $s + t$, so obtaining all the pixels takes $m \times n \times (s + t)$ operations. For the convolution of a $320 \times 240$ image with a $5 \times 5$ Gaussian filter, this is $768{,}000$ operations: the speed-up reduces the number of operations by $60\%$.
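A minimal sketch of the two-pass scheme using scipy's gaussian_filter1d (the function name here is our own; scipy's gaussian_filter itself works this way internally, applying one 1D pass per axis):

import numpy as np
from scipy.ndimage.filters import gaussian_filter1d


def gaussian_convolve_separable(f, sigma):
    # Step 1: convolve each row with a 1 x t 1D Gaussian filter.
    F0 = gaussian_filter1d(f, sigma, axis=1)
    # Step 2: convolve each column of F0 with an s x 1 1D Gaussian filter.
    F1 = gaussian_filter1d(F0, sigma, axis=0)
    return F1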

Apply to image

Let's actually apply Gaussian convolution to an image. This time we use Pillow, numpy, and scipy. The program below generates an image with scale $\sigma = 1.6$ and an image with scale $\sigma = 3.2$.

from PIL import Image
import numpy as np
from scipy.ndimage.filters import gaussian_filter
from matplotlib import pyplot as plt

if __name__ == '__main__':
    orig_image = Image.open('lena.jpg').convert('L')
    orig_image = np.array(orig_image, dtype=np.uint8)

    sigmas = [1.6, 3.2]

    plt.subplot(1, len(sigmas) + 1, 1)
    plt.title('Orig')
    plt.imshow(orig_image, cmap='Greys_r')
    for i, sigma in enumerate(sigmas):
        plt.subplot(1, len(sigmas) + 1, 2 + i)
        plt.title('Sigma=%s' % sigma)
        plt.imshow(gaussian_filter(orig_image, sigma), cmap='Greys_r')
    plt.tight_layout()
    plt.savefig('gausian_convolution_2d_examples.png')
    plt.show()

The following figure is obtained by executing the above program.

2D Gaussian convolution examples

OpenCV is a well-known library for working with images. If you have it installed, you can use it instead. You can also try implementing Gaussian convolution yourself.
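For example, if OpenCV is installed, a single call achieves the same effect (passing ksize=(0, 0) lets OpenCV derive the filter size from sigma):

from PIL import Image
import cv2  # assumes the OpenCV Python bindings are installed
import numpy as np

orig_image = np.array(Image.open('lena.jpg').convert('L'), dtype=np.uint8)
blurred = cv2.GaussianBlur(orig_image, (0, 0), 1.6)  # scale sigma = 1.6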

With the tools covered so far, you can generate an image of any scale $\sigma$ from the original image.

Building Scale-space 1st octave

Finally, we will build the scale-space. The "1st octave" in the chapter heading will be explained later.

Construction flow

  1. Determine the base scale $\sigma_{0}$.
  2. Generate an image with scale $k^{n}\sigma_{0}$ for each $n = 0, \ldots, s$, where $s$ is the number of divisions of the scale-space 1st octave and $k = 2^{1/s}$.

Program to build Scale-space 1st octave

from PIL import Image
import numpy as np
from scipy.ndimage.filters import gaussian_filter
from matplotlib import pyplot as plt


if __name__ == '__main__':
    orig_image = Image.open('lena.jpg').convert('L')
    orig_image = np.array(orig_image, dtype=np.uint8)

    sigma = 1.6
    s = 3
    k = np.power(2, 1/s)
    scale_space = []

    for n in range(s + 1):
        scale_space.append(gaussian_filter(orig_image, np.power(k, n) * sigma))

    for n, img in enumerate(scale_space):
        plt.subplot(1, len(scale_space), 1 + n)
        plt.title('Sigma=%s' % np.round(np.power(k, n) * sigma, 2))
        plt.imshow(img, cmap='Greys_r')
    plt.tight_layout()
    plt.show()

When the above program is executed, four images with scales from $1.6$ to $3.2$ are stored in the variable scale_space, and the figure below is displayed.

Scale-space 1st octave examples

The space consisting of the images with scales from $\sigma_{0}$ up to $2\sigma_{0}$ is called the 1st octave.

Fourier Transform and Sampling Theorem

To build the 2nd and later octaves, we need to touch on the Fourier transform and the sampling theorem. This time, however, we will not go deep into them; we only cover them briefly.

Fourier transform

The Fourier transform maps an image from the spatial (pixel) domain to the frequency domain.

One-dimensional Fourier transform formula

F(j\omega) = \int_{-\infty}^{\infty} f(x) e^{-j\omega x} dx

Application example of 1D Fourier transform

Let's perform the Fourier transform via the FFT (Fast Fourier Transform) using numpy, scipy, and matplotlib. In the program below, the sampling rate is 20 and the angular frequency of the original signal is $2\pi$.

import numpy as np
from scipy.fftpack import fft
from matplotlib import pyplot as plt


if __name__ == '__main__':
    sampling_rate = 20
    sampling_interval = 1.0/sampling_rate
    x = np.arange(0, 1, sampling_interval)

    omega0 = 1.0
    omega = 2.0 * np.pi * omega0
    f = np.sin(omega * x)

    F = fft(f)

    half = len(f) // 2
    plt.plot(np.arange(-half, half), np.abs(np.concatenate((F[half:], F[:half]))), 'bo-')
    plt.show()

When the above program is executed, the following figure is obtained.

1D Fourier transform example

From this figure, you can see that the original signal $f$ contains a component with frequency $1$. As an exercise, it is a good idea to try Fourier transforming a superposition of several periodic functions.
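For instance, the following sketch superposes two sinusoids (the frequencies 2 and 5 are our choice) and shows both peaks in the amplitude spectrum:

import numpy as np
from scipy.fftpack import fft
from matplotlib import pyplot as plt

sampling_rate = 20
x = np.arange(0, 1, 1.0 / sampling_rate)
# Superposition of two periodic components with frequencies 2 and 5.
f = np.sin(2.0 * np.pi * 2.0 * x) + 0.5 * np.sin(2.0 * np.pi * 5.0 * x)
F = fft(f)
half = len(f) // 2
plt.plot(np.arange(-half, half), np.abs(np.concatenate((F[half:], F[:half]))), 'bo-')
plt.show()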

[To be described in more detail later]

One-dimensional signal sampling theorem

The sampling theorem states: when the maximum frequency contained in the original signal $f$ is $W$, the original signal $f$ is completely recoverable from the sampled signal $f_{d}$ if the sampling period $T$ satisfies $T < \frac{1}{2W}$ (equivalently, if the sampling rate exceeds $2W$).

[Proof to be added at a later date]

Experiment with one-dimensional signal sampling theorem

Let the sampling period be $1/20$. Then, according to the sampling theorem, for a signal whose maximum frequency is below $10$, all of its frequency information should be extracted successfully by the Fourier transform.

Let's check with the program below.

import numpy as np
from scipy.fftpack import fft
from matplotlib import pyplot as plt

if __name__ == '__main__':
    sampling_rate = 20
    sampling_interval = 1.0/sampling_rate
    x = np.arange(0, 1, sampling_interval)

    omega0s = [9, 10, 11, 30]

    for i, omega0 in enumerate(omega0s):
        omega = 2.0 * np.pi * omega0
        f = np.sin(omega * x)

        F = fft(f)

        plt.subplot(len(omega0s), 1, 1 + i)
        half = len(f) // 2
        plt.plot(np.arange(-half, half), np.abs(np.concatenate((F[half:], F[:half]))), 'bo-')
        plt.title('Max frequency = %s' % omega0)
        plt.xlabel('Frequency')
        plt.ylabel('Amplitude spectrum')
    plt.tight_layout()
    plt.show()

When executed, the following figure is obtained. From this result, you can see that the frequency information is extracted successfully for the signal with maximum frequency $9$ (details will be added later).

Sampling theorem examples

2D Fourier transform

[To be added later]

Building the 2nd and later octaves

Downsampling

Overview

[To be added later]

Proof of downsampling validity

[To be added later]

Program to build the 2nd octave

[To be added later]
