In computer vision, the homography matrix (projective transformation matrix) appears frequently. It is a 3x3 matrix that maps image coordinates in an original image to those in a transformed image when a projective transformation (scaling, rotation, translation, etc.) is applied to the image. Conversely, in image matching between two images, the homography matrix can be estimated when multiple corresponding points are available. In general, the coordinates of the corresponding points contain errors, so the homography matrix should be estimated by the least squares method so as to minimize this error.
In this article, we walk through the estimation of the homography matrix by the DLT method using a concrete example, and explain in detail the mathematical theory, which is rarely spelled out, for finding the least squares solution by singular value decomposition (SVD).
The homography matrix and the projective transformation of image coordinates are expressed as follows. A general homography (projective transformation) has nine free components, but this time we consider only a rigid transformation consisting of rotation and translation.
H = \lambda\left(
\begin{array}{ccc}
\cos \theta & -\sin\theta & t_1 \\
\sin\theta & \cos\theta & t_2 \\
0 & 0 & 1
\end{array}
\right)
\\
\vec{x}_1 = (x_1,y_1,1)^T,\ \vec{x}_2 = (x_2,y_2,1)^T \\
\vec{x}_2 = H\vec{x}_1
Here, the image coordinates $\vec{x}_1, \vec{x}_2$ are in the homogeneous coordinate system. $t_1$ and $t_2$ represent the translation, and the upper-left 2x2 block is the rotation matrix. Homogeneous coordinates have an indefiniteness up to a constant multiple (scale), and $\lambda$ is the factor that represents it. The homography matrix has nine components, but because of this scale indefiniteness its degrees of freedom are eight. In other words, the homography matrix can be estimated from four or more corresponding points (each corresponding point yields two equations, one for x and one for y).
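To see the scale indefiniteness concretely, here is a minimal sketch (the matrix and point values are arbitrary examples) showing that $H$ and $\lambda H$ map a point to the same image coordinates once the homogeneous coordinate is normalized back to 1:
import numpy as np

theta = np.radians(5)
H = np.array([[np.cos(theta), -np.sin(theta), 10],
              [np.sin(theta),  np.cos(theta), 10],
              [0, 0, 1]])
x1 = np.array([66, 215, 1])      # a point in homogeneous coordinates
for lam in (1.0, 2.5):           # arbitrary scale factors
    x2 = (lam * H) @ x1
    x2 = x2 / x2[2]              # normalize the third component to 1
    print(lam, x2)               # the (x, y) part is identical for every lam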
We affine-transform the original image to generate a pair of images, using OpenCV's warpAffine function. Estimation is possible with four or more corresponding points, but this time we prepare ten. The corresponding coordinates are obtained by applying the homography matrix. (Figures: img1, img2)
Corresponding point coordinates
img1 (10, 3)
[[ 66 215 1]
[135 32 1]
[362 185 1]
[479 40 1]
[239 247 1]
[407 147 1]
[253 14 1]
[ 11 103 1]
[474 181 1]
[ 57 222 1]]
img2 (10, 3)
[[ 57 229 1]
[141 53 1]
[354 225 1]
[483 91 1]
[226 276 1]
[402 191 1]
[260 45 1]
[ 11 113 1]
[466 231 1]
[ 47 236 1]]
code
import os
import random

import cv2
import numpy as np

PATH_NAME = "."  # set this to your working directory

def main():
    os.chdir(PATH_NAME)
    src = cv2.imread("./stk.png", cv2.IMREAD_COLOR)
    src = cv2.resize(src, (512, 256))
    cv2.imwrite("src.png", src)
    print(src.shape)
    # Rotation 5 deg, translation (10, 10)
    theta = np.radians(5)
    t1 = 10
    t2 = 10
    # Homography matrix
    H = np.array([[np.cos(theta), -np.sin(theta), t1],
                  [np.sin(theta),  np.cos(theta), t2],
                  [0, 0, 1]])
    print("Homography matrix=\n", H)
    # Random image coordinates in the original image: 10 points
    img_1 = []
    for _ in range(10):
        v = random.randint(0, src.shape[0] - 1)
        u = random.randint(0, src.shape[1] - 1)
        img_1.append([u, v, 1])
    img_1 = np.array(img_1)
    print("img1", img_1.shape)
    print(img_1)
    # Image coordinates in the transformed image (projective transformation)
    img_2 = []
    for row in img_1:
        x2 = (H @ row).astype('int')
        img_2.append(x2)
    img_2 = np.array(img_2)
    print("img2", img_2.shape)
    print(img_2)
    # M is the 2x3 matrix that applies the same transform with warpAffine
    M = H[:2, ]
    affine_img = cv2.warpAffine(src, M, (src.shape[1], src.shape[0]))
    cv2.imshow("color", affine_img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    cv2.imwrite("affine_img.png", affine_img)
    return
Now, let's get into the main subject. The DLT method is used for the estimation; it transforms the problem into a linear equation so that the least squares method can be applied. The least squares method estimates the $\boldsymbol{x}$ that minimizes the squared error of $A\boldsymbol{x}-\boldsymbol{b}=\boldsymbol{0}$ for linear simultaneous equations that can be written as $A\boldsymbol{x}=\boldsymbol{b}$. Recall the formula for the projective transformation:
\boldsymbol{x}_2 = H\boldsymbol{x}_1
This is in the form of $A\boldsymbol{x}=\boldsymbol{b}$, but what we want to estimate is not $\boldsymbol{x}$ but the coefficient matrix $H$. Here, using the fact that the projected vectors are parallel, we have $\boldsymbol{x}_2 \times H\boldsymbol{x}_1 = \boldsymbol{0}$. Transforming this equation gives the following.
\left[
\begin{array}{ccccccccc}
0 & 0 & 0 & -x_1 & -y_1 & -1 & y_2 x_1 & y_2 y_1 & y_2 \\
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_2 x_1 & -x_2 y_1 & -x_2 \\
\end{array}
\right]
\left[
\begin{array}{c}
h_1 \\
h_2 \\
h_3 \\
h_4 \\
h_5 \\
h_6 \\
h_7 \\
h_8 \\
h_9 \\
\end{array}
\right]
=\boldsymbol{0}
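As a sanity check, here is a minimal sketch (reusing img_1, img_2, and H from the code above) that builds this 2x9 block for a single correspondence and confirms that it sends the flattened true $H$ to nearly zero; the small residual comes from rounding img_2 to integers:
x1, y1, _ = img_1[0]
x2, y2, _ = img_2[0]
A0 = np.array([
    [0, 0, 0, -x1, -y1, -1, y2 * x1, y2 * y1, y2],   # y-constraint row
    [x1, y1, 1, 0, 0, 0, -x2 * x1, -x2 * y1, -x2],   # x-constraint row
])
print(A0 @ H.flatten())  # approximately [0, 0]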
Here, the sets of corresponding points in img_1 and img_2 are written as follows.
\boldsymbol{x}_A=\{\boldsymbol{x}_1^1,\boldsymbol{x}_1^2,\cdots,\boldsymbol{x}_1^k\}\\
\boldsymbol{x}_B=\{\boldsymbol{x}_2^1,\boldsymbol{x}_2^2,\cdots,\boldsymbol{x}_2^k\}
Stacking the above equation over all the corresponding points gives
\left[
\begin{array}{ccccccccc}
0 & 0 & 0 & -x_A^1 & -y_A^1 & -1 & y_B^1 x_A^1 & y_B^1 y_A^1 & y_B^1 \\
x_A^1 & y_A^1 & 1 & 0 & 0 & 0 & -x_B^1 x_A^1 & -x_B^1 y_A^1 & -x_B^1 \\
\vdots \\
0 & 0 & 0 & -x_A^k & -y_A^k & -1 & y_B^k x_A^k & y_B^k y_A^k & y_B^k \\
x_A^k & y_A^k & 1 & 0 & 0 & 0 & -x_B^k x_A^k & -x_B^k y_A^k & -x_B^k \\
\end{array}
\right]
\left[
\begin{array}{c}
h_1 \\
h_2 \\
h_3 \\
h_4 \\
h_5 \\
h_6 \\
h_7 \\
h_8 \\
h_9 \\
\end{array}
\right]
=\boldsymbol{0}
Here, since there is the indefiniteness of a constant multiple, we impose $\|\boldsymbol{h}\| = 1$ as a constraint condition to remove the extra degree of freedom.
Now, the least squares normal equation has the following form.
A^{T}A\boldsymbol{x} = A^{T}\boldsymbol{b}
This time, the right side is 0, so
A^{T}A\boldsymbol{x} = 0
It will be, and this is what our problem corresponds to. Now, in the least squares method, to minimize
J = \frac{1}{2} \sum_{i=1}^{m}(\sum_{j=1}^{n} a_{ij}x_{j}-b_i)^2 \rightarrow min
we set the partial derivatives to zero; before doing so, $J$ expands to
J = \frac{1}{2}( x^{T}A^{T}Ax-x^{T}A^{T}b-b^{T}Ax+b^{T}b )\\
This time $\boldsymbol{b}=\boldsymbol{0}$, so
J = \frac{1}{2} x^{T}A^{T}Ax
It reduces to this. Here, the matrix $A^TA$ is known to be positive semi-definite (a symmetric matrix whose eigenvalues are zero or positive), and there is a theorem that the unit vector $\boldsymbol{x}$ minimizing the quadratic form $\boldsymbol{x}^T A^T A \boldsymbol{x}$ is the eigenvector corresponding to the minimum eigenvalue of the matrix $A^TA$ (→ linear algebra). This time, $\boldsymbol{h}$ plays the role of $\boldsymbol{x}$.
The eigenvector corresponding to the minimum eigenvalue of the matrix $A^TA$ can be obtained by singular value decomposition (SVD). The singular value decomposition of $A$ is
A = U\Sigma V^T \\
\Sigma =
\left[
\begin{array}{cccc}
\sigma_1 & & & \\
& \sigma_2 & & \\
& & \ddots & \\
& & & \sigma_r \\
\end{array}
\right]
Here, $U$ consists of the eigenvectors of $AA^T$ arranged in descending order of their eigenvalues, and $V$ consists of the eigenvectors of $A^TA$ arranged in the same way. $\sigma_1, \sigma_2, \cdots, \sigma_r$ are called singular values and are the square roots ($\geq 0$) of the eigenvalues of $A^TA$. What we want, the least squares solution, is the eigenvector corresponding to the smallest eigenvalue of $A^TA$, so the last column of $V$ (the last row of $V^T$) corresponds to it. A singular value may be zero; this means that $A$ is not full rank, but even then the last row of $V^T$, the eigenvector corresponding to the zero singular value, is the least squares solution. In other words, if the minimum eigenvalue is $\lambda_{min}$,
A^TA\boldsymbol{h}=\lambda_{min} \boldsymbol{h}
The solution satisfying this is the desired $\boldsymbol{h}$. Finally, divide the whole vector by $h_9$ so that $h_9 = 1$.
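The theorem above is easy to check numerically. The following minimal sketch (assuming the DLT coefficient matrix A built from the correspondences, as in the code below) confirms that the eigenvector of $A^TA$ for the smallest eigenvalue matches the last row of $V^T$ from the SVD of $A$, up to sign:
# np.linalg.eigh returns eigenvalues of a symmetric matrix in ascending order
w, v = np.linalg.eigh(A.T @ A)
h_eig = v[:, 0]                 # eigenvector for the smallest eigenvalue
u, s, vh = np.linalg.svd(A)     # singular values come in descending order
h_svd = vh[-1]                  # right singular vector for the smallest singular value
print(np.allclose(np.abs(h_eig), np.abs(h_svd)))  # True: identical up to sign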
Estimating $H$ by SVD as shown below, we can see that the result is almost identical to the original. When the number of points is small (a solution is obtained with four or more), the match is not very good. To compute accurately, there appear to be techniques such as taking the centroid of the image coordinates and normalizing them; a sketch of this normalization is given at the end of this article.
Homography matrix=
[[ 0.9961947 -0.08715574 10. ]
[ 0.08715574 0.9961947 10. ]
[ 0. 0. 1. ]]
Estimated Homography matrix=
[[ 9.97456897e-01 -8.38633226e-02 8.92359681e+00]
[ 8.68971782e-02 9.98203988e-01 9.29572691e+00]
[ 7.52034139e-07 6.24553736e-06 1.00000000e+00]]
Transforming the image coordinates with this estimated $H$ gives almost the same result as with the original $H$.
[382 105 1]
[382 105 1]
code:
## H matrix estimation (uses img_1, img_2, and H from main() above)
from numpy.linalg import svd

# Build the 2N x 9 coefficient matrix A of the DLT method
A = np.zeros((img_1.shape[0] * 2, 9))
for i, (a, b) in enumerate(zip(img_1, img_2)):
    A[i * 2] = np.array([a[0], a[1], 1, 0, 0, 0, -b[0] * a[0], -b[0] * a[1], -b[0]])
    A[i * 2 + 1] = np.array([0, 0, 0, a[0], a[1], 1, -b[1] * a[0], -b[1] * a[1], -b[1]])
print(A.shape)
print("A=", A)
print(H.flatten())
print(np.cross(img_2[0], (H @ img_1[0])))  # parallel vectors, so the cross product is ~0
print(A @ H.flatten())                     # the true H nearly annihilates A

# singular value decomposition
u, s, vh = svd(A)
print('\nSVD result')
print('shape of u, s, vh:', u.shape, s.shape, vh.shape)
print('singular values:', s.round(2))
min_idx = 8  # index of the smallest singular value (last row of vh)
print('minimum eigenvector:', vh[min_idx])
Hest = vh[min_idx].reshape((3, 3))
Hest = Hest / Hest[2, 2]  # normalize so that h9 = 1
print("Estimated Homography matrix=\n", Hest)
# Transform a point with the original H and the estimated H and compare
print((H @ img_1[0] / (H @ img_1[0])[2]).astype('int'))
print((Hest @ img_1[0] / (Hest @ img_1[0])[2]).astype('int'))
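For reference, here is a minimal sketch of the coordinate normalization mentioned above (the standard approach: translate the points so their centroid is at the origin and scale them so the mean distance from the origin is $\sqrt{2}$; the helper name normalize_points is my own, not from the original code):
def normalize_points(pts):
    # pts: (N, 3) array of homogeneous image coordinates
    centroid = pts[:, :2].mean(axis=0)
    mean_dist = np.linalg.norm(pts[:, :2] - centroid, axis=1).mean()
    s = np.sqrt(2) / mean_dist
    # T moves the centroid to the origin and scales the coordinates
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    return (T @ pts.T).T, T

# Normalize both point sets, run the same DLT + SVD on the normalized
# points to get Hn, then undo the normalization:
#   pts1n, T1 = normalize_points(img_1.astype(float))
#   pts2n, T2 = normalize_points(img_2.astype(float))
#   Hest = np.linalg.inv(T2) @ Hn @ T1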