Measuring the real-world position of an object on a desk from a camera image with Python + OpenCV

Introduction

The goal is to find the position of an arbitrary object placed on a plane (a desk) carrying reference markers, expressed in the **real coordinate system**, as shown below. It is implemented in Python using OpenCV's **ArUco** module. 概要.png

- [Measure object position with the OpenCV ArUco library](https://inody1991.tumblr.com/post/172449790255/opencvaruco%E3%83%A9%E3%82%A4%E3%83%95-%E3%83%A9%E3%83%AA%E3%81%A6-%E7%89%A9%E4%BD%93%E4%BD%8D%E7%BD%AE%E8%A8%88%E6%B8%AC) by @inoblog was used as a reference.

Execution environment

Google Colab (Python 3.6.9) is used. The module versions are as follows.

Various module versions


%pip list | grep -e opencv -e numpy
numpy                    1.17.5     
opencv-contrib-python    4.1.2.30   
opencv-python            4.1.2.30   

Creating markers with ArUco

First, load the modules required for processing.

In the Google Colab environment, cv2.imshow(...) is not available, but by importing as follows, an image (np.ndarray) can be displayed in the output cell with cv2_imshow(...).

Module import


import cv2
import numpy as np
from google.colab.patches import cv2_imshow

ArUco provides **a set of predefined markers**, which we will use here. `aruco.getPredefinedDictionary(...)` gets a dictionary containing the predefined markers. The argument `aruco.DICT_4X4_50` selects a dictionary of up to $50$ markers, each with a $4 \times 4$ fill pattern inside the square.

Save the predefined markers (0-3) as an image file


aruco = cv2.aruco
p_dict = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
marker =  [0] * 4 #Initialization
for i in range(len(marker)):
  marker[i] = aruco.drawMarker(p_dict, i, 75) # 75x75 px
  cv2.imwrite(f'marker{i}.png', marker[i])

Executing the above outputs $4$ files ($75 \times 75$ pixels each), from "marker0.png" to "marker3.png".

In the Google Colab environment, the created markers can be displayed for confirmation in the output cell with cv2_imshow(...). You can see that a marker with a $4 \times 4$ fill pattern is created inside the square. cv2_show_marker.png If you pass DICT_5X5_100 to `aruco.getPredefinedDictionary(...)`, you get a dictionary of up to $100$ markers with $5 \times 5$ patterns. cv2_show_marker_5x5.png

Marker detection by ArUco

Print "marker0.png", that is, marker 0 of `aruco.DICT_4X4_50`, on paper, and let "m0-photo.jpg" be a photo of it taken with a camera. We detect markers in this photo and output an image `img_marked` that overlays the detection results.

Detected from photos containing marker 0


img = cv2.imread('m0-photo.jpg')
corners, ids, rejectedImgPoints = aruco.detectMarkers(img, p_dict) #detection
img_marked = aruco.drawDetectedMarkers(img.copy(), corners, ids)   #Overlay detection results
cv2_imshow(img_marked) #display

m0_photo_marked.png

The entire marker is outlined with a **green square**, the top-left corner of the marker is marked with a **red square**, and the marker ID (number) is drawn in **blue text**, all overlaid on the original image. From this, you can see that the marker is detected properly.

Let's take a closer look at the **return values** of `aruco.detectMarkers(...)`.

`corners` stores the **image coordinates of the four corners** of each marker found in the image, as a list of np.ndarray.

Return value corners of aruco.detectMarkers(...)


#■ Return corners
# print(type(corners)) # -> <class 'list'>
# print(len(corners))  # -> 1
# print(type(corners[0]))  # -> <class 'numpy.ndarray'>
# print(corners[0].shape)  # -> (1, 4, 2)
print(corners)

Execution result


[array([[[265., 258.],
        [348., 231.],
        [383., 308.],
        [297., 339.]]], dtype=float32)]

The corner coordinates are stored clockwise, in the order top left, top right, bottom right, bottom left of the marker. That is, it looks like this: m0-photo1_.png
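For example, the center of a marker can be computed by averaging its four corner coordinates (using the corner values shown above); the perspective-correction code later in this article uses the same idea:

```python
import numpy as np

# Corner coordinates of marker 0 as returned above
# (order: top-left, top-right, bottom-right, bottom-left)
corner = np.array([[[265., 258.],
                    [348., 231.],
                    [383., 308.],
                    [297., 339.]]], dtype=np.float32)

center = corner[0].mean(axis=0)  # average the four corners
print(center)  # [323.25 284.  ]
```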

`ids` stores the **marker IDs** found in the image as a numpy.ndarray.

Return value ids of aruco.detectMarkers(...)


#■ Return ids
# print(type(ids)) # -> <class 'numpy.ndarray'>
# print(ids.shape) # -> (1, 1)
print(ids)

The result will be something like `[[0]]` (note that it is not `[0]`).
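Because `ids` has shape `(N, 1)`, flattening it with `ravel()` is convenient when iterating over detected markers, as the code later in this article does. A minimal sketch with a hard-coded array:

```python
import numpy as np

ids = np.array([[0]])  # as returned by detectMarkers for a single marker
print(ids.shape)       # (1, 1)
print(ids.ravel())     # [0] -- flattened, convenient for looping
for i in ids.ravel():  # loop over marker IDs
    print(i)           # 0
```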

rejectedImgPoints stores the coordinates of square candidates found in the image whose interior does not contain a valid code pattern. Per the [OpenCV documentation](https://docs.opencv.org/4.1.2/d9/d6a/group__aruco.html#gab9159aa69250d8d3642593e508cb6baa): "contains the imgPoints of those squares whose inner code has not a correct codification. Useful for debugging purposes."

A concrete example makes this easier to understand: rejectedImgPoints stores the coordinates of squares such as the one indicated by the arrow in the figure. rejectedImgPoints.png

Marker detection by ArUco (when there are multiple markers)

Check the result when there are multiple markers in the image (including two or more copies of the same marker), as follows. ms-photo.jpg The detection code is the same as before. Overlaying the detection results looks like this; having two or more of the same marker is not a problem. ms_photo_marked.png

The return values `ids` and `corners` of `aruco.detectMarkers(...)` are as follows. Note that they are not sorted by ID.

Result of print(ids)


[[1]
 [1]
 [0]
 [2]]

Result of print(corners)


[array([[[443., 288.],
        [520., 270.],
        [542., 345.],
        [464., 363.]]], dtype=float32), array([[[237., 272.],
        [313., 255.],
        [331., 328.],
        [254., 346.]]], dtype=float32), array([[[ 64., 235.],
        [140., 218.],
        [154., 290.],
        [ 75., 307.]]], dtype=float32), array([[[333., 113.],
        [404.,  98.],
        [424., 163.],
        [351., 180.]]], dtype=float32)]

Position measurement in real coordinate system

Prepare a sheet of paper on which the four markers "marker0.png" to "marker3.png" created earlier are arranged clockwise as shown below. Here, the distance between the markers is $150\,\mathrm{mm}$ (if this sheet is not prepared precisely, the measurements will be inaccurate). table.png

Place the object ("dog") whose position you want to detect on this paper, and shoot from **an appropriate position above** (the shooting position and angle need not be exact). The goal is to find the position of this "dog" in the **real coordinate system** relative to the markers. inu.png

Convert to an image seen from directly above

The camera image, taken from an arbitrary position and angle above, is converted to the **image seen from directly above**. Here, the converted image is $500 \times 500\,\mathrm{px}$.

Convert to an image seen from directly above


aruco = cv2.aruco
p_dict = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
img = cv2.imread('inu.jpg')
corners, ids, rejectedImgPoints = aruco.detectMarkers(img, p_dict) #detection

# Store each marker's center coordinates in m, clockwise from the top left (indexed by marker ID)
m = np.empty((4,2))
for i,c in zip(ids.ravel(), corners):
  m[i] = c[0].mean(axis=0)

width, height = (500,500) #Image size after transformation

marker_coordinates = np.float32(m)
true_coordinates   = np.float32([[0,0],[width,0],[width,height],[0,height]])
trans_mat = cv2.getPerspectiveTransform(marker_coordinates,true_coordinates)
img_trans = cv2.warpPerspective(img,trans_mat,(width, height))
cv2_imshow(img_trans)

The execution result is as follows. You can see the image is transformed so that the **center** of each marker becomes a corner of the image. inu_trans.png Even an image taken from a more oblique position can be converted to a view from directly above, as follows. coin.png

For clarity, the **center** of each marker was made a corner of the converted image above. However, this is inconvenient for computing the position of the "dog" (fragments of the markers appear in the four corners...).

Therefore, modify the program so that the corner of each marker touching the $150 \times 150\,\mathrm{mm}$ rectangle on the paper becomes a corner of the converted image.

Converted to an image seen from directly above (improved version)


aruco = cv2.aruco
p_dict = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
img = cv2.imread('inu.jpg')
corners, ids, rejectedImgPoints = aruco.detectMarkers(img, p_dict) #detection

# Changed part: use the inner corner of each marker instead of its center
corners2 = [np.empty((1,4,2))]*4
for i,c in zip(ids.ravel(), corners):
  corners2[i] = c.copy()
m = np.empty((4,2))
m[0] = corners2[0][0][2] # bottom-right corner of marker 0 (top-left of paper)
m[1] = corners2[1][0][3] # bottom-left corner of marker 1 (top-right of paper)
m[2] = corners2[2][0][0] # top-left corner of marker 2 (bottom-right of paper)
m[3] = corners2[3][0][1] # top-right corner of marker 3 (bottom-left of paper)

width, height = (500,500) #Image size after transformation
marker_coordinates = np.float32(m)
true_coordinates   = np.float32([[0,0],[width,0],[width,height],[0,height]])
trans_mat = cv2.getPerspectiveTransform(marker_coordinates,true_coordinates)
img_trans = cv2.warpPerspective(img,trans_mat,(width, height))
cv2_imshow(img_trans)

The execution result is as follows. inu_trans2.png The size of this image is $500 \times 500\,\mathrm{px}$, and the corresponding size on paper is $150 \times 150\,\mathrm{mm}$. Therefore, if the coordinates on the image are $(160\,\mathrm{px},\ 200\,\mathrm{px})$, the corresponding real coordinates can be calculated as $(160 \times \frac{150}{500} = 48\,\mathrm{mm},\ 200 \times \frac{150}{500} = 60\,\mathrm{mm})$.
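This conversion can be written as a small helper; the scale factor $150/500$ mm per px follows directly from the sizes above:

```python
# mm-per-px scale: the 500x500 px image corresponds to 150x150 mm on paper
SCALE = 150 / 500

def px_to_mm(x_px, y_px):
    """Convert image coordinates (px) to real coordinates (mm)."""
    return x_px * SCALE, y_px * SCALE

print(px_to_mm(160, 200))  # (48.0, 60.0)
```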

Object detection from images

Use the various functions provided by OpenCV to find the position of the dog in the above image. The key function is cv2.connectedComponentsWithStats(...).

Position output after object detection and real coordinate conversion


tmp = img_trans.copy()

# (1)Grayscale conversion
tmp = cv2.cvtColor(tmp, cv2.COLOR_BGR2GRAY)
#cv2_imshow(tmp)

# (2)Blur processing
tmp = cv2.GaussianBlur(tmp, (11, 11), 0)
#cv2_imshow(tmp)

# (3)Binarization process
th = 130 #Binarization threshold(Adjustment required)
_,tmp = cv2.threshold(tmp,th,255,cv2.THRESH_BINARY_INV) 
#cv2_imshow(tmp)

# (4)Blob (= mass) detection
n, img_label, data, center = cv2.connectedComponentsWithStats(tmp)

# (5)Organize detection results
detected_obj = list() #Storage destination of detection result
tr_x = lambda x : x * 150 / 500 #X-axis image coordinates → real coordinates
tr_y = lambda y : y * 150 / 500 #Y axis 〃
img_trans_marked = img_trans.copy()
for i in range(1,n):
  x, y, w, h, size = data[i]
  if size < 300 : #Ignore areas less than 300px
    continue
  detected_obj.append( dict( x = tr_x(x),
                              y = tr_y(y),
                              w = tr_x(w),
                              h = tr_y(h),
                              cx = tr_x(center[i][0]),
                              cy = tr_y(center[i][1])))  
  #Verification
  cv2.rectangle(img_trans_marked, (x,y), (x+w,y+h),(0,255,0),2)
  cv2.circle(img_trans_marked, (int(center[i][0]),int(center[i][1])),5,(0,0,255),-1)

# (6)View results
cv2_imshow(img_trans_marked)
for i, obj in enumerate(detected_obj,1) :
  print(f'■ Detected object {i}: center position X={obj["cx"]:>3.0f}mm Y={obj["cy"]:>3.0f}mm')

The execution result is as follows. inu_trans_marked.png

Execution result


■ Detected object 1: center position X=121mm Y= 34mm

Place a ruler on the paper to check whether the above result is correct. The position appears to be measured properly. inu_pos.png

Additional experiments

Let's place more objects and check. m4-photo2.jpg The binarization threshold needed adjusting, but it worked: detected object 1 is the brown dog, object 2 the gray dog, object 3 the rabbit, and object 4 the bear. m4-result.png

Next step

- Try combining with machine learning (image classification) to classify each object in the image as "dog", "rabbit", or "bear".
- Try creating a demo combined with a robot arm.

robot_arm.jpg

Related entries

- Challenge image classification with TensorFlow2 + Keras
- General object detection using Google's pre-trained models (via TensorFlow Hub)
