A paper "Region Covariance: A fast descriptor for detection and classification [^ 1]" that addresses issues in the field of computer vision such as object detection and texture classification using the covariance matrix as a feature. Summary and simple implementation of object detection in Python.
Introduction
Feature selection is very important in detection and classification problems. RGB values, luminance values, and their gradients are often used in computer vision, but these features are not robust to lighting conditions and can become high-dimensional depending on the image size. Image histograms are another option, but if the number of bins per feature is $b$ and the number of features is $d$, the histogram has $b^d$ dimensions, so the dimensionality grows exponentially with the number of features. For example, with RGB values as features, $b = 4$ and $d = 3$ already give $4^3 = 64$ bins.
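As a rough illustration (a minimal sketch, not from the paper), the joint RGB histogram with $b = 4$ bins per channel:

```python
import numpy as np

# Minimal sketch: joint RGB histogram with b = 4 bins per channel (b ** d = 64 bins).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in RGB image

b, d = 4, 3
hist, _ = np.histogramdd(img.reshape(-1, d), bins=(b,) * d, range=[(0, 256)] * d)
print(hist.shape, hist.size)  # (4, 4, 4) 64
```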
Therefore, this paper proposes using the covariance matrix as a region descriptor for images. For $d$ features, the covariance matrix is $d \times d$, which has the advantage of being smaller than using the features as they are or using a histogram. The paper also proposes a nearest-neighbor search algorithm based on covariance matrices.
Covariance as a Region Descriptor
Let $I$ be a $W \times H$ luminance image or a $W \times H \times 3$ RGB image. Define the $W \times H \times d$-dimensional feature image $F$ obtained from this image as
F(x, y) = \phi(I, x, y)
where the function $\phi$ is a mapping based on features such as luminance, RGB values, and filter responses such as gradients. Let $z_1, \dots, z_n$ be the $d$-dimensional feature vectors inside a rectangular region $R \subset F$. The covariance matrix $C_R$ representing the region $R$ is computed as
C_R = \frac{1}{n-1}\sum_{k=1}^n(z_k - \mu)(z_k - \mu)^\top,\quad \mu=\frac1n\sum_{k=1}^n z_k
As mentioned above, the covariance matrix is $d \times d$, and since it is symmetric it has only $\frac12 d(d+1)$ distinct values. This is lower-dimensional than using the features as they are ($n \times d$ values) or using a histogram ($b^d$ values).
Moreover, since $C_R$ carries no information about the order or the number of feature points, it is invariant to rotation and scale, depending on the features that make up the matrix. (For example, rotation invariance is lost when gradients along the $x$ and $y$ directions are used.)
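As a concrete illustration (a minimal sketch with made-up shapes, not code from the paper), the descriptor can be computed directly from the feature vectors of a region:

```python
import numpy as np

# Minimal sketch: covariance descriptor of a rectangular region.
# `features` is a stand-in (H, W, d) feature image F.
rng = np.random.default_rng(0)
features = rng.random((64, 48, 5))

# Feature vectors z_1, ..., z_n inside the region R given by [y1:y2, x1:x2].
z = features[10:30, 5:25].reshape(-1, features.shape[-1])  # shape (n, d)

# C_R with the same 1/(n-1) normalization as the formula above.
C_R = np.cov(z, rowvar=False)
print(C_R.shape)  # (5, 5): d x d regardless of the region size
```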
Distance Calculation on Covariance Matrices
The covariance matrix does not lie in a Euclidean space; it belongs to the set of symmetric positive definite matrices
\mathcal{P}(d) := \left\{X\in\mathbb{R}^{d\times d}\mid X=X^\top,X\succ O\right\}
This can be confirmed by noting that $-C_R$ cannot be a covariance matrix even though $C_R$ is, so the set is not closed under negation. Therefore, when performing nearest-neighbor search with covariance matrices, instead of the Euclidean distance we use a distance on $\mathcal{P}(d)$ (related to Lie groups and Riemannian manifolds):
\rho(C_1, C_2) = \sqrt{\sum_{i=1}^d \ln^2\lambda_i(C_1, C_2)} \tag{1}
Here, $\lambda_i(C_1, C_2)$ are the generalized eigenvalues of $C_1$ and $C_2$, which, together with the generalized eigenvectors $x_i \neq 0$, satisfy
\lambda_i C_1 x_i - C_2 x_i = 0, \quad i=1, \dots, d
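As a quick sketch (the `calc_distance` function in the full implementation below does the same thing), this distance can be computed with `scipy.linalg.eigh`, which solves exactly this generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh

def rho(c1: np.ndarray, c2: np.ndarray) -> float:
    # Generalized eigenvalues lambda_i of the pair (C1, C2).
    w = eigh(c1, c2, eigvals_only=True)
    # Eq. (1): square root of the sum of squared log eigenvalues.
    return np.sqrt(np.sum(np.log(w) ** 2))

# Sanity checks on random SPD matrices: rho(C, C) = 0 and rho is symmetric.
rng = np.random.default_rng(0)
a, b = rng.random((4, 4)), rng.random((4, 4))
c1, c2 = a @ a.T + np.eye(4), b @ b.T + np.eye(4)
print(rho(c1, c1))                           # ~0.0
print(np.isclose(rho(c1, c2), rho(c2, c1)))  # True
```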
Integral Images for Fast Covariance Computation
An integral image is used to compute the covariance matrices quickly. An integral image is defined as
\text{Integral Image }(x', y') = \sum_{x<x' ,y<y'}I(x, y)
that is, the sum of the pixel values above and to the left of the pixel of interest.
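The same definition in a few lines of NumPy, checked against `cv2.integral`, which is used later (note that `cv2.integral` pads the result with a leading row and column of zeros):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((4, 5))

# Integral image as cumulative sums along both axes.
ii = img.cumsum(axis=0).cumsum(axis=1)

# cv2.integral returns an (h+1, w+1) array whose first row and column are zeros.
print(np.allclose(cv2.integral(img)[1:, 1:], ii))  # True
```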
The $(i, j)$ component of the covariance matrix is
\begin{align}
C_R(i, j) &= \frac{1}{n-1}\sum_{k=1}^n (z_k(i) - \mu(i))(z_k(j) - \mu(j))\\
&= \frac{1}{n-1}\left[\sum_{k=1}^n z_k(i)z_k(j) - \mu(i)\sum_{k=1}^n z_k(j) - \mu(j)\sum_{k=1}^n z_k(i) + n\mu(i)\mu(j)\right]\\
&= \frac{1}{n-1}\left[\sum_{k=1}^n z_k(i)z_k(j) - \frac1n\sum_{k=1}^n z_k(i)\sum_{k=1}^n z_k(j)\right]\\
\end{align}
so only the first- and second-order sums of the $z_k$ are needed. Therefore, define the $W \times H \times d$-dimensional integral image $P$ and the $W \times H \times d \times d$-dimensional integral image $Q$ as
\begin{align}
P(x',y',i) &= \sum_{x<x', y<y'}F(x, y, i), \quad i=1, \dots, d \tag{2}\\
Q(x',y',i, j) &= \sum_{x<x', y<y'}F(x, y, i)F(x, y, j), \quad i, j=1, \dots, d \tag{3}
\end{align}
Also, write the values of the integral images at a single point $(x, y)$ as
\begin{align}
p_{x, y} &= \left[P(x, y, 1), \dots, P(x, y, d)\right]^\top\\
Q_{x, y} &=
\begin{pmatrix}
Q(x, y, 1, 1) & \cdots & Q(x, y, 1, d)\\
\vdots & \ddots & \vdots\\
Q(x, y, d, 1) & \cdots & Q(x, y, d, d)
\end{pmatrix}
\end{align}
Let $R(x', y'; x'', y'')$ denote the rectangular region with upper-left corner $(x', y')$ and lower-right corner $(x'', y'')$. Then, with $n = (x'' - x') \cdot (y'' - y')$, the covariance matrix of this region can be computed as
\begin{align}
C_{R(x', y'; x'', y'')} = \frac{1}{n-1}\Big[&Q_{x'', y''} + Q_{x', y'} - Q_{x'', y'} - Q_{x', y''}\\
&-\frac1n\left(p_{x'', y''} + p_{x', y'} - p_{x'', y'} - p_{x', y''}\right)\left(p_{x'', y''} + p_{x', y'} - p_{x'', y'} - p_{x', y''}\right)^\top\Big] \tag{4}
\end{align}
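As a sanity check (a sketch with hypothetical array names, mirroring Eqs. (2)-(4)), the result can be compared against a direct `np.cov` computation:

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, d = 32, 40, 4
F = rng.random((h, w, d))  # stand-in feature image

# Integral images P (Eq. 2) and Q (Eq. 3), padded so index 0 is the empty sum.
P = np.zeros((h + 1, w + 1, d))
P[1:, 1:] = F.cumsum(axis=0).cumsum(axis=1)
Q = np.zeros((h + 1, w + 1, d, d))
Q[1:, 1:] = np.einsum('hwi,hwj->hwij', F, F).cumsum(axis=0).cumsum(axis=1)

# Region R(x', y'; x'', y'') and Eq. (4).
x1, y1, x2, y2 = 5, 8, 25, 30
n = (x2 - x1) * (y2 - y1)
q = Q[y2, x2] + Q[y1, x1] - Q[y1, x2] - Q[y2, x1]
p = P[y2, x2] + P[y1, x1] - P[y1, x2] - P[y2, x1]
C = (q - np.outer(p, p) / n) / (n - 1)

# Direct computation over the same region for comparison.
C_direct = np.cov(F[y1:y2, x1:x2].reshape(-1, d), rowvar=False)
print(np.allclose(C, C_direct))  # True
```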
Object Detection
In this paper, object detection is performed using nine features: the pixel coordinates $x, y$; the RGB values $R(x, y), G(x, y), B(x, y)$; and the absolute values of the first- and second-order gradients of the intensity image $I(x, y)$. The first- and second-order gradients are computed with the filters $[-1, 0, 1]$ and $[-1, 2, -1]$.
F(x, y) = \left[x, y, R(x, y), G(x, y), B(x, y), \left|\frac{\partial I(x, y)}{\partial x}\right|, \left|\frac{\partial I(x, y)}{\partial y}\right|, \left|\frac{\partial^2 I(x, y)}{\partial x^2}\right|, \left|\frac{\partial^2 I(x, y)}{\partial y^2}\right|\right]^\top \tag{5}
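A minimal sketch of this feature map (the `_extract_features` method below wraps the same idea; here the intensity is approximated by a grayscale conversion):

```python
import cv2
import numpy as np

def feature_image(img: np.ndarray) -> np.ndarray:
    """Build the (h, w, 9) feature image of Eq. (5) from a uint8 BGR image."""
    h, w = img.shape[:2]
    # Intensity approximated by grayscale here.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
    d1 = np.array([[-1.0, 0.0, 1.0]])   # first-order gradient filter
    d2 = np.array([[-1.0, 2.0, -1.0]])  # second-order gradient filter
    x = np.tile(np.arange(w, dtype=np.float64), (h, 1))
    y = np.tile(np.arange(h, dtype=np.float64).reshape(-1, 1), (1, w))
    b, g, r = (img[:, :, i].astype(np.float64) for i in range(3))
    grads = [
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d1)),    # |dI/dx|
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d1.T)),  # |dI/dy|
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d2)),    # |d2I/dx2|
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d2.T)),  # |d2I/dy2|
    ]
    return np.stack([x, y, r, g, b] + grads, axis=-1)
```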
In the paper, the object is represented by five covariance matrices, and these five matrices are computed again for the 1000 windows whose covariance is closest to the object's. Here, for simplicity, the numerical experiment uses only one covariance matrix.
The region outlined in pink in the left image was used as the object image, and object detection was performed on the two images on the right, which have different resolutions.
The source code is shown below; the executed notebook is available on GitHub.

- `calc_distance` computes the distance of Eq. (1). The generalized eigenvalues are obtained with the Hermitian eigensolver `scipy.linalg.eigh`.
- `_extract_features` extracts the features of Eq. (5). Gradients are computed with `cv2.filter2D`.
- `_calc_integral_images` computes the integral images of Eqs. (2) and (3) using `cv2.integral`.
- `_calc_covariance` computes Eq. (4).

```python
from typing import List, Tuple
from itertools import product
import cv2
import numpy as np
from scipy.linalg import eigh
from tqdm import tqdm
def calc_distance(x: np.ndarray, y: np.ndarray) -> float:
w = eigh(x, y, eigvals_only=True)
    return np.linalg.norm(np.log(w))  # Eq. (1): L2 norm of the log generalized eigenvalues
class RegionCovarianceDetector:
"""
Object Detector Using Region Covariance
Parameters
----------
coord : bool, optional (default=True)
Whether use coordinates as features or not.
color : bool, optional (default=True)
Whether use color channels as features or not.
intensity : bool, optional (default=False)
Whether use intensity as feature or not.
kernels : a list of np.ndarray, optional (default=None)
Filters applied to intensity image. If None, no filters are used.
ratio : float, optional (default=1.15)
Scaling factor between two consecutive scales of the search window size and step size.
step : int, optional (default=3)
The minimum step size.
n_windows : int, optional (default=9)
The number of scales of the windows.
eps : float, optional (default=1e-16)
Small number to keep covariance matrices in SPD.
Attributes
----------
object_shape_ : (int, int)
The object's shape.
object_covariance_ : np.ndarray, shape (n_features, n_features)
Covariance matrix of the object.
"""
def __init__(
self,
coord: bool = True,
color: bool = True,
intensity: bool = False,
kernels: List[np.ndarray] = None,
ratio: float = 1.15,
step: int = 3,
n_windows: int = 9,
eps: float = 1e-16
):
self.coord = coord
self.color = color
self.intensity = intensity
self.kernels = kernels
self.ratio = ratio
self.step = step
self.n_windows = n_windows
self.eps = eps
self.object_shape_ = None
self.object_covariance_ = None
def _extract_features(self, img: np.ndarray) -> List[np.ndarray]:
"""
Extract image features.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
features : a list of np.ndarray
Features such as intensity, its gradient and so on.
"""
h, w, c = img.shape[:3]
intensity = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 2] / 255.
features = list()
# use coordinates
if self.coord:
features.append(np.tile(np.arange(w, dtype=float), (h, 1)))
features.append(np.tile(np.arange(h, dtype=float).reshape(-1, 1), (1, w)))
# use color channels
if self.color:
for i in range(c):
features.append(img[:, :, i].astype(float) / 255.)
# use intensity
if self.intensity:
features.append(intensity)
# use filtered images
if self.kernels is not None:
for kernel in self.kernels:
features.append(np.abs(cv2.filter2D(intensity, cv2.CV_64F, kernel)))
return features
def _calc_integral_images(self, img: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""
Calculate integral images.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
P : np.ndarray, shape (h+1, w+1, n_features)
First order integral images of features.
Q : np.ndarray, shape (h+1, w+1, n_features, n_features)
Second order integral images of features.
"""
h, w = img.shape[:2]
features = self._extract_features(img)
length = len(features)
# first order integral images
P = cv2.integral(np.array(features).transpose((1, 2, 0)))
# second order integral images
Q = cv2.integral(
np.array(list(map(lambda x: x[0] * x[1], product(features, features)))).transpose((1, 2, 0))
)
Q = Q.reshape(h + 1, w + 1, length, length)
return P, Q
def _calc_covariance(self, P: np.ndarray, Q: np.ndarray, pt1: Tuple[int, int], pt2: Tuple[int, int]) -> np.ndarray:
"""
Calculate covariance matrix from integral images.
Parameters
----------
P : np.ndarray, shape (h+1, w+1, n_features)
First order integral images of features.
Q : np.ndarray, shape (h+1, w+1, n_features, n_features)
Second order integral images of features.
pt1 : (int, int)
Left top coordinate.
pt2 : (int, int)
Right bottom coordinate.
Returns
-------
covariance : np.ndarray, shape (n_features, n_features)
Covariance matrix.
"""
x1, y1 = pt1
x2, y2 = pt2
q = Q[y2, x2] + Q[y1, x1] - Q[y1, x2] - Q[y2, x1]
p = P[y2, x2] + P[y1, x1] - P[y1, x2] - P[y2, x1]
n = (y2 - y1) * (x2 - x1)
covariance = (q - np.outer(p, p) / n) / (n - 1) + self.eps * np.identity(P.shape[2])
return covariance
def fit(self, img: np.ndarray):
"""
Calculate the object covariance matrix.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
: Fitted detector.
"""
h, w = img.shape[:2]
P, Q = self._calc_integral_images(img)
# normalize about coordinates
if self.coord:
for i, size in enumerate((w, h)):
P[:, :, i] /= size
Q[:, :, i] /= size
Q[:, :, :, i] /= size
# calculate covariance matrix
self.object_covariance_ = self._calc_covariance(P, Q, (0, 0), (w, h))
self.object_shape_ = (h, w)
return self
def predict(self, img: np.ndarray) -> Tuple[Tuple[int, int], Tuple[int, int], float]:
"""
Compute object's position in the target image.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
pt1 : (int, int)
Left top coordinate.
pt2 : (int, int)
Right bottom coordinate.
score : float
Dissimilarity of object and target covariance matrices.
"""
tar_h, tar_w = img.shape[:2]
obj_h, obj_w = self.object_shape_
P, Q = self._calc_integral_images(img)
# search window's shape and step size
end = (self.n_windows + 1) // 2
start = end - self.n_windows
shapes = [(int(obj_h * self.ratio ** i), int(obj_w * self.ratio ** i)) for i in range(start, end)]
steps = [int(self.step * self.ratio ** i) for i in range(self.n_windows)]
distances = list()
for shape, step in tqdm(zip(shapes, steps)):
p_h, p_w = shape
p_P, p_Q = P.copy(), Q.copy()
# normalize about coordinates
if self.coord:
for i, size in enumerate((p_w, p_h)):
p_P[:, :, i] /= size
p_Q[:, :, i] /= size
p_Q[:, :, :, i] /= size
distance = list()
y1, y2 = 0, p_h
while y2 <= tar_h:
dist = list()
x1, x2 = 0, p_w
while x2 <= tar_w:
# calculate covariance matrix
p_cov = self._calc_covariance(p_P, p_Q, (x1, y1), (x2, y2))
# jump horizontally
x1 += step
x2 += step
# calculate dissimilarity of two covariance matrices
dist.append(calc_distance(self.object_covariance_, p_cov))
# jump vertically
y1 += step
y2 += step
distance.append(dist)
distances.append(np.array(distance))
# choose the most similar window
min_indices = list(map(np.argmin, distances))
min_index = int(np.argmin([dist.flatten()[i] for i, dist in zip(min_indices, distances)]))
min_step = steps[min_index]
min_shape = shapes[min_index]
min_indice = min_indices[min_index]
b_h, b_w = distances[min_index].shape
pt1 = ((min_indice % b_w) * min_step, (min_indice // b_w) * min_step)
pt2 = (pt1[0] + min_shape[1], pt1[1] + min_shape[0])
score = distances[min_index].flatten()[min_indice]
        return pt1, pt2, score
```
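A hypothetical usage example (the file names are placeholders): fit the detector on the cropped object image, then run `predict` on a target image.

```python
import cv2
import numpy as np

# Gradient filters for Eq. (5); they are applied to the intensity image via cv2.filter2D.
kernels = [
    np.array([[-1.0, 0.0, 1.0]]),     # |dI/dx|
    np.array([[-1.0, 0.0, 1.0]]).T,   # |dI/dy|
    np.array([[-1.0, 2.0, -1.0]]),    # |d2I/dx2|
    np.array([[-1.0, 2.0, -1.0]]).T,  # |d2I/dy2|
]

object_img = cv2.imread('object.png')  # placeholder path: cropped object region
target_img = cv2.imread('target.png')  # placeholder path: image to search

detector = RegionCovarianceDetector(kernels=kernels).fit(object_img)
pt1, pt2, score = detector.predict(target_img)
print(pt1, pt2, score)

# Draw the best-matching window on the target image.
cv2.rectangle(target_img, pt1, pt2, color=(0, 0, 255), thickness=2)
cv2.imwrite('result.png', target_img)
```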
The computation times for the two target images were 6.22 s and 19.7 s, respectively.
You can make it a little faster by changing the values of `n_windows` and `step`.