[Python] Angle correction (projective transformation) of a license using OpenCV - Automatically determining the binarization threshold -

Motivation

- **I want to read license information from a photo!**
- **I want to enjoy image processing!**

Overview

- Detect the outline of a card (a nanaco card, the same size as a driver's license) with **OpenCV** and apply a **projective transformation** so that the contents of the card are **easier to read**
- This prepares the content to be read with OCR (reading the content will be covered in the next article)
- Since I don't do OCR this time, a nanaco card of the same size stands in for the license

- A card photographed **from diagonally above** ... → can now be **angle-corrected and displayed** like this

How this differs from similar articles

- **[Dynamic determination of the binarization threshold](#determining-the-binarization-threshold)** logic for card detection
  - (The accuracy seems to be just a little better)
  - I haven't verified it properly, so this is only an impression

Intended readers

- **People who want to detect contours (edge detection) with OpenCV**
- People who want to read card information from photos

Work procedure

  1. **Environment construction**
  2. **Binarization** [^binarization]
  3. **Contour extraction**
  4. **Projective transformation**

Environment

I use Pipenv.

Pipenv

brew install pipenv

Related packages [^opencv]

pipenv install numpy matplotlib opencv-contrib-python pyocr
pipenv install jupyterlab --dev

Directory structure

I will try with three images:

- nanaco.jpeg (taken in the most straightforward way)
- nanaco_skew.jpeg (taken from an angle so that the card's shape is distorted)
- nanaco_in_hand.jpeg (held in a hand against a white background)

The source code lives in the Jupyter notebook card.ipynb.

.
├── Pipfile
├── Pipfile.lock
├── images
│   ├── nanaco.jpeg
│   ├── nanaco_in_hand.jpeg
│   └── nanaco_skew.jpeg
└── notebooks
    └── card.ipynb

Try displaying a photo of the card with OpenCV

  1. Start Jupyter Lab

pipenv run jupyter lab

  2. Create notebooks/card.ipynb and run the following in a cell (all later scripts are also executed in notebook cells)
%matplotlib inline
import cv2
import matplotlib.pyplot as plt
img = cv2.imread('../images/nanaco_skew.jpeg')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))


The matplotlib axis scale bothers me a little, but the coordinates actually make it easy to read positions, so I will proceed as is this time.

Binarization

Grayscale

# Convert to grayscale
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(gray_img)
plt.gray()

Determining the binarization threshold

In many tutorials and articles, the binarization threshold is hard-coded to a value of around 200, as if it were determined by hand. In this article, I include **logic that determines the threshold dynamically (automatically)**.

import numpy as np
# For nanaco, about 0.2 looks good; a driver's license may need this tuned again
card_luminance_percentage = 0.2

# TODO: consider performance
def luminance_threshold(gray_img):
    """
Value in grayscale(Called brightness)But`x`The number of points above is 20%Calculate the maximum x that exceeds
However,`100 <= x <= 200`To
    """
    number_threshold = gray_img.size * card_luminance_percentage
    flat = gray_img.flatten()
    # Scan candidate thresholds from 200 down toward 100
    for diff_luminance in range(100):
        if np.count_nonzero(flat > 200 - diff_luminance) >= number_threshold:
            return 200 - diff_luminance
    return 100

threshold = luminance_threshold(gray_img)
print(f'threshold: {threshold}')
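
As an aside on the TODO comment above, the same idea can also be written without the Python loop. A minimal vectorized sketch (my own addition, not from the original article; its handling of ties differs slightly from the loop, so treat it as an approximation):

def luminance_threshold_fast(gray_img):
    # The 0.8 quantile is roughly the luminance exceeded by about 20% of the pixels
    q = np.quantile(gray_img, 1 - card_luminance_percentage)
    # Clamp to the same 100..200 range as the loop version
    return int(np.clip(q, 100, 200))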

The thresholds for the three types of images this time were calculated as follows.

For nanaco_skew.jpeg, for example, the commonly used threshold of 200 did not work, probably because of the amount of reflected light; with the value of 138 computed by the code above, the card outline can be extracted in the next step.

|  | nanaco.jpeg | nanaco_skew.jpeg | nanaco_in_hand.jpeg |
| --- | --- | --- | --- |
| Binarization threshold | 200 | 138 | 199 |
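
For reference, the values in the table can be reproduced with a small loop over the three files (a sketch that assumes the image paths from the directory structure above):

for name in ['nanaco.jpeg', 'nanaco_skew.jpeg', 'nanaco_in_hand.jpeg']:
    sample_gray = cv2.cvtColor(cv2.imread(f'../images/{name}'), cv2.COLOR_BGR2GRAY)
    print(name, luminance_threshold(sample_gray))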

Binarization


_, binarized = cv2.threshold(gray_img, threshold, 255, cv2.THRESH_BINARY)
plt.imshow(binarized, cmap='gray')


Contour extraction

contours, _ = cv2.findContours(binarized, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Select the contour with the largest area
card_cnt = max(contours, key=cv2.contourArea)

# Draw the contour on the image
line_color = (0, 255, 0)
thickness = 30
cv2.drawContours(img, [card_cnt], -1, line_color, thickness)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))


Projective transformation

Based on the contour obtained above, we now perform the projective transformation (angle correction).

# Approximate the contour with a simpler polygon (ideally the card's four corners)
# A fixed coefficient of 0.1 times the total contour length is enough for epsilon
# Tuning this coefficient seems largely unnecessary as long as the card is captured reasonably well (it may be needed when adjusting for OCR)
epsilon = 0.1 * cv2.arcLength(card_cnt, True)
approx = cv2.approxPolyDP(card_cnt, epsilon, True)

# Card width (the card is vertical in the photo, so width and height are swapped in the projective transformation)
card_img_width = 2400  # an arbitrary value
card_img_height = round(card_img_width * (5.4 / 8.56))  # from the license aspect ratio (= nanaco ratio), 5.4 cm x 8.56 cm

src = np.float32(list(map(lambda x: x[0], approx)))
dst = np.float32([[0,0],[0,card_img_width],[card_img_height,card_img_width],[card_img_height,0]])

projectMatrix = cv2.getPerspectiveTransform(src, dst)

# The contour line was drawn onto img earlier, so reload the original image
img = cv2.imread('../images/nanaco_skew.jpeg')
transformed = cv2.warpPerspective(img, projectMatrix, (card_img_height, card_img_width))
plt.imshow(cv2.cvtColor(transformed, cv2.COLOR_BGR2RGB))
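
One caveat: the code above assumes that approx contains exactly four points and that their order happens to line up with dst. That held for these photos, but for other images it may be safer to order the corners explicitly before building src. A minimal sketch of that check (my own addition; the ordering heuristic is matched to the dst layout used above):

def order_corners(points):
    # Order four corners as top-left, bottom-left, bottom-right, top-right,
    # matching the dst layout above (a rough heuristic, not from the original article)
    pts = np.float32(points)
    s = pts.sum(axis=1)        # x + y: smallest at the top-left, largest at the bottom-right
    d = pts[:, 1] - pts[:, 0]  # y - x: smallest at the top-right, largest at the bottom-left
    return np.float32([pts[np.argmin(s)], pts[np.argmax(d)],
                       pts[np.argmax(s)], pts[np.argmin(d)]])

assert len(approx) == 4, 'expected a quadrilateral approximation of the card'
src = order_corners([p[0] for p in approx])
projectMatrix = cv2.getPerspectiveTransform(src, dst)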


**It worked!** The slanted letters are now straight!

Bonus

Next, I plan to try reading the contents with OCR using an actual driver's license, but I also gave it a quick try with the nanaco card. The region to be read still needs to be restricted, but it roughly works.

Using nanaco_in_hand.jpeg, I applied pyocr to the entire transformed image obtained at the end. That image can be produced by running the same script as above on nanaco_in_hand.jpeg (which is slightly diagonal).

For this image, I converted it to text with pyocr + tesseract, following the tutorial. [^ocr]
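
For reference, the pyocr + tesseract call looks roughly like this. This is only a sketch, not the article's exact code, and it assumes that tesseract and its Japanese language data (jpn) are installed:

import pyocr
import pyocr.builders
from PIL import Image

tool = pyocr.get_available_tools()[0]  # assumes tesseract is found on the PATH
# pyocr works on PIL images, so convert the OpenCV BGR array first
pil_img = Image.fromarray(cv2.cvtColor(transformed, cv2.COLOR_BGR2RGB))
text = tool.image_to_string(pil_img, lang='jpn', builder=pyocr.builders.TextBuilder())
print(text)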

The result is...

Within this plan for using the nanaco card
For details on the use of the chin card, please refer to the member agreement. Five
Ako's card is a member store with the nanaco mark on the right, and you can use the electronic money and the electronic manager in the card.
You will be able to confirm your balance.
Do not bend the card, give it a great impact, or leave it at high temperature or when it is magnetized.
Ako's card and the electronic money in the card are not cashable.
The upper limit of the charge for Ako's card is 50,000 yen.
Ako's card can only be used by the member who has approved the member agreement and has signed the member office name field.
The ownership of Ako's card belongs to the stock company Seven Card Service, and it is not possible to lend or transfer it to another person.

For such a rough attempt, the result is about what you would expect. I will improve the accuracy here when I continue with an actual driver's license. (It is quite interesting that "●" is recognized as "A".)

[^binarization]: Converts an RGB image, where each pixel carries one of 256x256x256 values, into a binary image where each pixel is 1 or 0. This makes contour extraction easier.
[^opencv]: `opencv-contrib-python` includes the contrib modules in addition to the main `opencv-python` modules. Plain `opencv-python` would also be enough for `cv2`, but judging from the official documentation, this package seems to be the better choice for new setups. (Reference: https://pypi.org/project/opencv-python/)
[^ocr]: This article was also helpful! Installing tesseract and related tools is required.
