A paper "Region Covariance: A fast descriptor for detection and classification [^ 1]" that addresses issues in the field of computer vision such as object detection and texture classification using the covariance matrix as a feature. Summary and simple implementation of object detection in Python.
Introduction
Feature selection is very important in detection and classification problems. RGB values, luminance values, and their gradients are often used in computer vision, but these features are not robust to lighting conditions and can become high-dimensional depending on the image size. Image histograms are another option, but if the number of bins per feature is $b$ and the number of features is $d$, the histogram has $b^d$ dimensions, so the dimensionality grows exponentially with the number of features. For example, with RGB values as features, $b = 4$ and $d = 3$ already give $4^3 = 64$ bins.
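As a rough illustration (a minimal sketch, not from the paper), the joint RGB histogram with $b = 4$ bins per channel:

```python
import numpy as np

# Minimal sketch: joint RGB histogram with b = 4 bins per channel (b ** d = 64 bins).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in RGB image

b, d = 4, 3
hist, _ = np.histogramdd(img.reshape(-1, d), bins=(b,) * d, range=[(0, 256)] * d)
print(hist.shape, hist.size)  # (4, 4, 4) 64
```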
Therefore, this paper proposes using the covariance matrix as a region descriptor for images. For $d$ features, the covariance matrix is $d \times d$, which has the advantage of being smaller than using the features as they are or using a histogram. The paper also proposes a nearest-neighbor search algorithm based on covariance matrices.
Covariance as a Region Descriptor
Let $I$ be a $W \times H$ luminance image or a $W \times H \times 3$ RGB image. Define the $W \times H \times d$-dimensional feature image $F$ obtained from this image as
F(x, y) = \phi(I, x, y)
where the function $\phi$ is a mapping based on features such as luminance, RGB values, and filter responses such as gradients. Let $z_1, \dots, z_n$ be the $d$-dimensional feature vectors inside a rectangular region $R \subset F$. The covariance matrix $C_R$ representing the region $R$ is computed as
C_R = \frac{1}{n-1}\sum_{k=1}^n(z_k - \mu)(z_k - \mu)^\top,\quad \mu=\frac1n\sum_{k=1}^n z_k
As mentioned above, the covariance matrix is $d \times d$, and since it is symmetric it has only $\frac12 d(d+1)$ distinct values. This is lower-dimensional than using the features as they are ($n \times d$ values) or using a histogram ($b^d$ values).
Moreover, since $C_R$ carries no information about the order or the number of feature points, it is invariant to rotation and scale, depending on the features that make up the matrix. (For example, rotation invariance is lost when gradients along the $x$ and $y$ directions are used.)
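As a concrete illustration (a minimal sketch with made-up shapes, not code from the paper), the descriptor can be computed directly from the feature vectors of a region:

```python
import numpy as np

# Minimal sketch: covariance descriptor of a rectangular region.
# `features` is a stand-in (H, W, d) feature image F.
rng = np.random.default_rng(0)
features = rng.random((64, 48, 5))

# Feature vectors z_1, ..., z_n inside the region R given by [y1:y2, x1:x2].
z = features[10:30, 5:25].reshape(-1, features.shape[-1])  # shape (n, d)

# C_R with the same 1/(n-1) normalization as the formula above.
C_R = np.cov(z, rowvar=False)
print(C_R.shape)  # (5, 5): d x d regardless of the region size
```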
Distance Calculation on Covariance Matrices
The covariance matrix does not lie in a Euclidean space; it belongs to the set of symmetric positive definite matrices
\mathcal{P}(d) := \left\{X\in\mathbb{R}^{d\times d}\mid X=X^\top,X\succ O\right\}
This can be confirmed by noting that $-C_R$ cannot be a covariance matrix even though $C_R$ is, so the set is not closed under negation. Therefore, when performing nearest-neighbor search with covariance matrices, instead of the Euclidean distance we use a distance on $\mathcal{P}(d)$ (related to Lie groups and Riemannian manifolds):
\rho(C_1, C_2) = \sqrt{\sum_{i=1}^d \ln^2\lambda_i(C_1, C_2)} \tag{1}
Here, $\lambda_i(C_1, C_2)$ are the generalized eigenvalues of $C_1$ and $C_2$, which, together with the generalized eigenvectors $x_i \neq 0$, satisfy
\lambda_i C_1 x_i - C_2 x_i = 0, \quad i=1, \dots, d
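As a quick sketch (the `calc_distance` function in the full implementation below does the same thing), this distance can be computed with `scipy.linalg.eigh`, which solves exactly this generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh

def rho(c1: np.ndarray, c2: np.ndarray) -> float:
    # Generalized eigenvalues lambda_i of the pair (C1, C2).
    w = eigh(c1, c2, eigvals_only=True)
    # Eq. (1): square root of the sum of squared log eigenvalues.
    return np.sqrt(np.sum(np.log(w) ** 2))

# Sanity checks on random SPD matrices: rho(C, C) = 0 and rho is symmetric.
rng = np.random.default_rng(0)
a, b = rng.random((4, 4)), rng.random((4, 4))
c1, c2 = a @ a.T + np.eye(4), b @ b.T + np.eye(4)
print(rho(c1, c1))                           # ~0.0
print(np.isclose(rho(c1, c2), rho(c2, c1)))  # True
```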
Integral Images for Fast Covariance Computation
An integral image is used to compute the covariance matrices quickly. An integral image is defined as
\text{Integral Image }(x', y') = \sum_{x<x' ,y<y'}I(x, y)
that is, the sum of the pixel values above and to the left of the pixel of interest.
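The same definition in a few lines of NumPy, checked against `cv2.integral`, which is used later (note that `cv2.integral` pads the result with a leading row and column of zeros):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((4, 5))

# Integral image as cumulative sums along both axes.
ii = img.cumsum(axis=0).cumsum(axis=1)

# cv2.integral returns an (h+1, w+1) array whose first row and column are zeros.
print(np.allclose(cv2.integral(img)[1:, 1:], ii))  # True
```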
The $(i, j)$ component of the covariance matrix is
\begin{align}
C_R(i, j) &= \frac{1}{n-1}\sum_{k=1}^n (z_k(i) - \mu(i))(z_k(j) - \mu(j))\\
&= \frac{1}{n-1}\left[\sum_{k=1}^n z_k(i)z_k(j) - \mu(i)\sum_{k=1}^n z_k(j) - \mu(j)\sum_{k=1}^n z_k(i) + n\mu(i)\mu(j)\right]\\
&= \frac{1}{n-1}\left[\sum_{k=1}^n z_k(i)z_k(j) - \frac1n\sum_{k=1}^n z_k(i)\sum_{k=1}^n z_k(j)\right]\\
\end{align}
so only the first- and second-order sums of the $z_k$ are needed. Therefore, define the $W \times H \times d$-dimensional integral image $P$ and the $W \times H \times d \times d$-dimensional integral image $Q$ as
\begin{align}
P(x',y',i) &= \sum_{x<x', y<y'}F(x, y, i), \quad i=1, \dots, d \tag{2}\\
Q(x',y',i, j) &= \sum_{x<x', y<y'}F(x, y, i)F(x, y, j), \quad i, j=1, \dots, d \tag{3}
\end{align}
Also, write the values of the integral images at a single point $(x, y)$ as
\begin{align}
p_{x, y} &= \left[P(x, y, 1), \dots, P(x, y, d)\right]^\top\\
Q_{x, y} &=
\begin{pmatrix}
Q(x, y, 1, 1) & \cdots & Q(x, y, 1, d)\\
\vdots & \ddots & \vdots\\
Q(x, y, d, 1) & \cdots & Q(x, y, d, d)
\end{pmatrix}
\end{align}
Let $R(x', y'; x'', y'')$ denote the rectangular region with upper-left corner $(x', y')$ and lower-right corner $(x'', y'')$. Then, with $n = (x'' - x') \cdot (y'' - y')$, the covariance matrix of this region can be computed as
\begin{align}
C_{R(x', y'; x'', y'')} = \frac{1}{n-1}\Big[&Q_{x'', y''} + Q_{x', y'} - Q_{x'', y'} - Q_{x', y''}\\
&-\frac1n\left(p_{x'', y''} + p_{x', y'} - p_{x'', y'} - p_{x', y''}\right)\left(p_{x'', y''} + p_{x', y'} - p_{x'', y'} - p_{x', y''}\right)^\top\Big] \tag{4}
\end{align}
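As a sanity check (a sketch with hypothetical array names, mirroring Eqs. (2)-(4)), the result can be compared against a direct `np.cov` computation:

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, d = 32, 40, 4
F = rng.random((h, w, d))  # stand-in feature image

# Integral images P (Eq. 2) and Q (Eq. 3), padded so index 0 is the empty sum.
P = np.zeros((h + 1, w + 1, d))
P[1:, 1:] = F.cumsum(axis=0).cumsum(axis=1)
Q = np.zeros((h + 1, w + 1, d, d))
Q[1:, 1:] = np.einsum('hwi,hwj->hwij', F, F).cumsum(axis=0).cumsum(axis=1)

# Region R(x', y'; x'', y'') and Eq. (4).
x1, y1, x2, y2 = 5, 8, 25, 30
n = (x2 - x1) * (y2 - y1)
q = Q[y2, x2] + Q[y1, x1] - Q[y1, x2] - Q[y2, x1]
p = P[y2, x2] + P[y1, x1] - P[y1, x2] - P[y2, x1]
C = (q - np.outer(p, p) / n) / (n - 1)

# Direct computation over the same region for comparison.
C_direct = np.cov(F[y1:y2, x1:x2].reshape(-1, d), rowvar=False)
print(np.allclose(C, C_direct))  # True
```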
Object Detection
In this paper, object detection is performed using nine features: the pixel coordinates $x, y$; the RGB values $R(x, y), G(x, y), B(x, y)$; and the absolute values of the first- and second-order gradients of the intensity image $I(x, y)$. The first- and second-order gradients are computed with the filters $[-1, 0, 1]$ and $[-1, 2, -1]$.
F(x, y) = \left[x, y, R(x, y), G(x, y), B(x, y), \left|\frac{\partial I(x, y)}{\partial x}\right|, \left|\frac{\partial I(x, y)}{\partial y}\right|, \left|\frac{\partial^2 I(x, y)}{\partial x^2}\right|, \left|\frac{\partial^2 I(x, y)}{\partial y^2}\right|\right]^\top \tag{5}
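A minimal sketch of this feature map (the `_extract_features` method below wraps the same idea; here the intensity is approximated by a grayscale conversion):

```python
import cv2
import numpy as np

def feature_image(img: np.ndarray) -> np.ndarray:
    """Build the (h, w, 9) feature image of Eq. (5) from a uint8 BGR image."""
    h, w = img.shape[:2]
    # Intensity approximated by grayscale here.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
    d1 = np.array([[-1.0, 0.0, 1.0]])   # first-order gradient filter
    d2 = np.array([[-1.0, 2.0, -1.0]])  # second-order gradient filter
    x = np.tile(np.arange(w, dtype=np.float64), (h, 1))
    y = np.tile(np.arange(h, dtype=np.float64).reshape(-1, 1), (1, w))
    b, g, r = (img[:, :, i].astype(np.float64) for i in range(3))
    grads = [
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d1)),    # |dI/dx|
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d1.T)),  # |dI/dy|
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d2)),    # |d2I/dx2|
        np.abs(cv2.filter2D(gray, cv2.CV_64F, d2.T)),  # |d2I/dy2|
    ]
    return np.stack([x, y, r, g, b] + grads, axis=-1)
```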
In the paper, the object is represented by five covariance matrices, and these five matrices are computed again for the 1000 windows whose covariance is closest to the object's. Here, for simplicity, the numerical experiment uses only one covariance matrix.
The region outlined in pink in the left image was used as the object image, and object detection was performed on the two images on the right, which have different resolutions.
The source code is shown below; the executed notebook is available on GitHub.

- `calc_distance` computes the distance of Eq. (1). The generalized eigenvalues are obtained with the Hermitian eigensolver `scipy.linalg.eigh`.
- `_extract_features` extracts the features of Eq. (5). Gradients are computed with `cv2.filter2D`.
- `_calc_integral_images` computes the integral images of Eqs. (2) and (3) using `cv2.integral`.
- `_calc_covariance` computes Eq. (4).

```python
from typing import List, Tuple
from itertools import product
import cv2
import numpy as np
from scipy.linalg import eigh
from tqdm import tqdm
def calc_distance(x: np.ndarray, y: np.ndarray) -> float:
w = eigh(x, y, eigvals_only=True)
    return np.linalg.norm(np.log(w))  # Eq. (1): L2 norm of the log generalized eigenvalues
class RegionCovarianceDetector:
"""
Object Detector Using Region Covariance
Parameters
----------
coord : bool, optional (default=True)
Whether use coordinates as features or not.
color : bool, optional (default=True)
Whether use color channels as features or not.
intensity : bool, optional (default=False)
Whether use intensity as feature or not.
kernels : a list of np.ndarray, optional (default=None)
Filters applied to intensity image. If None, no filters are used.
ratio : float, optional (default=1.15)
Scaling factor between two consecutive scales of the search window size and step size.
step : int, optional (default=3)
The minimum step size.
n_windows : int, optional (default=9)
The number of scales of the windows.
eps : float, optional (default=1e-16)
Small number to keep covariance matrices in SPD.
Attributes
----------
object_shape_ : (int, int)
The object's shape.
object_covariance_ : np.ndarray, shape (n_features, n_features)
Covariance matrix of the object.
"""
def __init__(
self,
coord: bool = True,
color: bool = True,
intensity: bool = False,
kernels: List[np.ndarray] = None,
ratio: float = 1.15,
step: int = 3,
n_windows: int = 9,
eps: float = 1e-16
):
self.coord = coord
self.color = color
self.intensity = intensity
self.kernels = kernels
self.ratio = ratio
self.step = step
self.n_windows = n_windows
self.eps = eps
self.object_shape_ = None
self.object_covariance_ = None
def _extract_features(self, img: np.ndarray) -> List[np.ndarray]:
"""
Extract image features.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
features : a list of np.ndarray
Features such as intensity, its gradient and so on.
"""
h, w, c = img.shape[:3]
intensity = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 2] / 255.
features = list()
# use coordinates
if self.coord:
features.append(np.tile(np.arange(w, dtype=float), (h, 1)))
features.append(np.tile(np.arange(h, dtype=float).reshape(-1, 1), (1, w)))
# use color channels
if self.color:
for i in range(c):
features.append(img[:, :, i].astype(float) / 255.)
# use intensity
if self.intensity:
features.append(intensity)
# use filtered images
if self.kernels is not None:
for kernel in self.kernels:
features.append(np.abs(cv2.filter2D(intensity, cv2.CV_64F, kernel)))
return features
def _calc_integral_images(self, img: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
"""
Calculate integral images.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
P : np.ndarray, shape (h+1, w+1, n_features)
First order integral images of features.
Q : np.ndarray, shape (h+1, w+1, n_features, n_features)
Second order integral images of features.
"""
h, w = img.shape[:2]
features = self._extract_features(img)
length = len(features)
# first order integral images
P = cv2.integral(np.array(features).transpose((1, 2, 0)))
# second order integral images
Q = cv2.integral(
np.array(list(map(lambda x: x[0] * x[1], product(features, features)))).transpose((1, 2, 0))
)
Q = Q.reshape(h + 1, w + 1, length, length)
return P, Q
def _calc_covariance(self, P: np.ndarray, Q: np.ndarray, pt1: Tuple[int, int], pt2: Tuple[int, int]) -> np.ndarray:
"""
Calculate covariance matrix from integral images.
Parameters
----------
P : np.ndarray, shape (h+1, w+1, n_features)
First order integral images of features.
Q : np.ndarray, shape (h+1, w+1, n_features, n_features)
Second order integral images of features.
pt1 : (int, int)
Left top coordinate.
pt2 : (int, int)
Right bottom coordinate.
Returns
-------
covariance : np.ndarray, shape (n_features, n_features)
Covariance matrix.
"""
x1, y1 = pt1
x2, y2 = pt2
q = Q[y2, x2] + Q[y1, x1] - Q[y1, x2] - Q[y2, x1]
p = P[y2, x2] + P[y1, x1] - P[y1, x2] - P[y2, x1]
n = (y2 - y1) * (x2 - x1)
covariance = (q - np.outer(p, p) / n) / (n - 1) + self.eps * np.identity(P.shape[2])
return covariance
def fit(self, img: np.ndarray):
"""
Calculate the object covariance matrix.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
: Fitted detector.
"""
h, w = img.shape[:2]
P, Q = self._calc_integral_images(img)
# normalize about coordinates
if self.coord:
for i, size in enumerate((w, h)):
P[:, :, i] /= size
Q[:, :, i] /= size
Q[:, :, :, i] /= size
# calculate covariance matrix
self.object_covariance_ = self._calc_covariance(P, Q, (0, 0), (w, h))
self.object_shape_ = (h, w)
return self
def predict(self, img: np.ndarray) -> Tuple[Tuple[int, int], Tuple[int, int], float]:
"""
Compute object's position in the target image.
Parameters
----------
img : np.ndarray, shape (h, w, c)
uint8 RGB image
Returns
-------
pt1 : (int, int)
Left top coordinate.
pt2 : (int, int)
Right bottom coordinate.
score : float
Dissimilarity of object and target covariance matrices.
"""
tar_h, tar_w = img.shape[:2]
obj_h, obj_w = self.object_shape_
P, Q = self._calc_integral_images(img)
# search window's shape and step size
end = (self.n_windows + 1) // 2
start = end - self.n_windows
shapes = [(int(obj_h * self.ratio ** i), int(obj_w * self.ratio ** i)) for i in range(start, end)]
steps = [int(self.step * self.ratio ** i) for i in range(self.n_windows)]
distances = list()
for shape, step in tqdm(zip(shapes, steps)):
p_h, p_w = shape
p_P, p_Q = P.copy(), Q.copy()
# normalize about coordinates
if self.coord:
for i, size in enumerate((p_w, p_h)):
p_P[:, :, i] /= size
p_Q[:, :, i] /= size
p_Q[:, :, :, i] /= size
distance = list()
y1, y2 = 0, p_h
while y2 <= tar_h:
dist = list()
x1, x2 = 0, p_w
while x2 <= tar_w:
# calculate covariance matrix
p_cov = self._calc_covariance(p_P, p_Q, (x1, y1), (x2, y2))
# jump horizontally
x1 += step
x2 += step
# calculate dissimilarity of two covariance matrices
dist.append(calc_distance(self.object_covariance_, p_cov))
# jump vertically
y1 += step
y2 += step
distance.append(dist)
distances.append(np.array(distance))
# choose the most similar window
min_indices = list(map(np.argmin, distances))
min_index = int(np.argmin([dist.flatten()[i] for i, dist in zip(min_indices, distances)]))
min_step = steps[min_index]
min_shape = shapes[min_index]
min_indice = min_indices[min_index]
b_h, b_w = distances[min_index].shape
pt1 = ((min_indice % b_w) * min_step, (min_indice // b_w) * min_step)
pt2 = (pt1[0] + min_shape[1], pt1[1] + min_shape[0])
score = distances[min_index].flatten()[min_indice]
        return pt1, pt2, score
```
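A hypothetical usage example (the file names are placeholders): fit the detector on the cropped object image, then run `predict` on a target image.

```python
import cv2
import numpy as np

# Gradient filters for Eq. (5); they are applied to the intensity image via cv2.filter2D.
kernels = [
    np.array([[-1.0, 0.0, 1.0]]),     # |dI/dx|
    np.array([[-1.0, 0.0, 1.0]]).T,   # |dI/dy|
    np.array([[-1.0, 2.0, -1.0]]),    # |d2I/dx2|
    np.array([[-1.0, 2.0, -1.0]]).T,  # |d2I/dy2|
]

object_img = cv2.imread('object.png')  # placeholder path: cropped object region
target_img = cv2.imread('target.png')  # placeholder path: image to search

detector = RegionCovarianceDetector(kernels=kernels).fit(object_img)
pt1, pt2, score = detector.predict(target_img)
print(pt1, pt2, score)

# Draw the best-matching window on the target image.
cv2.rectangle(target_img, pt1, pt2, color=(0, 0, 255), thickness=2)
cv2.imwrite('result.png', target_img)
```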
The computation times for the two target images were 6.22 s and 19.7 s, respectively.
You can make it a little faster by changing the values of `n_windows` and `step`.