[PYTHON] Try using GCP Handwriting Recognition (OCR)

Overview

This article summarizes how to recognize handwritten characters using the optical character recognition (OCR) feature available on GCP (Google Cloud Platform). It is written for GCP beginners and anyone who wants to use GCP in the future.

Introduction

Target

The goal is to recognize the handwritten characters in an image using the OCR feature of GCP.

Execution environment

macOS Catalina 10.15.6
Python 3.8.1

Table of contents

- Before you start
- Let's prepare the input data
- Implementation
- Run

Before you start

You need a Google account to use the GCP services. If you do not have a Google account, refer to here for how to create one.

After creating a Google account, open the GCP Console and follow [here](https://cloud.google.com/vision/docs/before-you-begin) to set up your Cloud project and authentication credentials.
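If you authenticate with a service account key, the client library reads the key file path from the GOOGLE_APPLICATION_CREDENTIALS environment variable. Here is a minimal sketch in Python (the key path below is just a placeholder):

import os

# Tell the client library where the service-account key file lives (placeholder path).
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service-account-key.json'

Exporting the variable in your shell before running the script works just as well.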

Let's prepare the input data

Before starting the implementation, first prepare the handwritten image you want to recognize. I prepared an image of some handwritten Japanese sentences.

Now, the implementation

Create the code by following the [tutorial](https://cloud.google.com/vision/docs/handwriting#vision-document-text-detection-python). The finished code is shown below; the file name is detect.py.

import os
import io

from google.cloud import vision

def detect_document(path):
    # Create an API client; this is where authentication happens.
    client = vision.ImageAnnotatorClient()

    # Read the image file as raw bytes.
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    # Send the image to the Cloud Vision API for dense-text (handwriting) detection.
    response = client.document_text_detection(image=image)

    # Walk the result hierarchy: page > block > paragraph > word > symbol.
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

    # Surface any error reported by the API.
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))


if __name__ == "__main__":
    path = 'sample.png'
    detect_document(os.path.abspath(path))

Run
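Before running, make sure the google-cloud-vision client library is installed. The code in this article uses the vision.types module, which belongs to its 1.x series:

pip3 install google-cloud-vision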

The execution command is as follows.

python3 detect.py

Execution result


Block confidence: 0.8999999761581421

Paragraph confidence: 0.8999999761581421
Word text: 私 (confidence: 0.9800000190734863)
	Symbol: 私 (confidence: 0.9800000190734863)
Word text: の (confidence: 0.9900000095367432)
	Symbol: の (confidence: 0.9900000095367432)
Word text: 名前 (confidence: 0.9300000071525574)
	Symbol: 名 (confidence: 0.8600000143051147)
	Symbol: 前 (confidence: 1.0)
Word text: は (confidence: 0.9900000095367432)
	Symbol: は (confidence: 0.9900000095367432)
Word text: KOTARO (confidence: 0.8299999833106995)
	Symbol: K (confidence: 0.4099999964237213)
	Symbol: O (confidence: 0.8299999833106995)
	Symbol: T (confidence: 0.8600000143051147)
	Symbol: A (confidence: 0.9900000095367432)
	Symbol: R (confidence: 0.9900000095367432)
	Symbol: O (confidence: 0.949999988079071)
Word text: です (confidence: 0.9399999976158142)
	Symbol: で (confidence: 0.9399999976158142)
	Symbol: す (confidence: 0.949999988079071)
Word text: 。 (confidence: 0.9900000095367432)
	Symbol: 。 (confidence: 0.9900000095367432)

Block confidence: 0.9200000166893005

Paragraph confidence: 0.9200000166893005
Word text: の (confidence: 0.9200000166893005)
	Symbol: の (confidence: 0.9200000166893005)

Block confidence: 0.9300000071525574

Paragraph confidence: 0.9300000071525574
Word text: Python (confidence: 0.9700000286102295)
	Symbol: P (confidence: 0.9800000190734863)
	Symbol: y (confidence: 0.9800000190734863)
	Symbol: t (confidence: 0.9100000262260437)
	Symbol: h (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: n (confidence: 0.9900000095367432)
Word text: が (confidence: 0.9700000286102295)
	Symbol: が (confidence: 0.9700000286102295)
Word text: 好き (confidence: 0.8999999761581421)
	Symbol: 好 (confidence: 0.9399999976158142)
	Symbol: き (confidence: 0.8600000143051147)
Word text: です (confidence: 0.8500000238418579)
	Symbol: で (confidence: 0.7799999713897705)
	Symbol: す (confidence: 0.9300000071525574)
Word text: 。 (confidence: 0.8799999952316284)
	Symbol: 。 (confidence: 0.8799999952316284)

Block confidence: 0.949999988079071

Paragraph confidence: 0.949999988079071
Word text: みんな (confidence: 0.9900000095367432)
	Symbol: み (confidence: 0.9900000095367432)
	Symbol: ん (confidence: 1.0)
	Symbol: な (confidence: 1.0)
Word text: 、 (confidence: 0.699999988079071)
	Symbol: 、 (confidence: 0.699999988079071)
Word text: フォロー (confidence: 0.9300000071525574)
	Symbol: フ (confidence: 0.8899999856948853)
	Symbol: ォ (confidence: 0.9200000166893005)
	Symbol: ロ (confidence: 0.9399999976158142)
	Symbol: ー (confidence: 1.0)
Word text: し (confidence: 1.0)
	Symbol: し (confidence: 1.0)
Word text: て (confidence: 1.0)
	Symbol: て (confidence: 1.0)
Word text: ね (confidence: 0.9900000095367432)
	Symbol: ね (confidence: 0.9900000095367432)
Word text: 。 (confidence: 0.9900000095367432)
	Symbol: 。 (confidence: 0.9900000095367432)
python3 detect.py  0.82s user 0.42s system 2% cpu 57.861 total

The image file was 8.7 MB. time reports 0.82 s of user CPU time, though the total wall-clock time was about 58 s, most of it presumably the API round trip. The recognized text reads 「私の名前はKOTAROです。Pythonが好きです。みんな、フォローしてね。」 (roughly, "My name is KOTARO. I like Python. Everyone, follow me."). This is considerably more accurate than the model I trained myself. As expected of Google.

Code commentary

Let's take a brief look at the code inside the detect_document function.

    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

This part handles authentication and loads the image. If your credentials are not configured properly, the first line raises an error.
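Incidentally, the image does not have to be a local file: the Vision API can also read an image directly from Cloud Storage. A minimal sketch, with a made-up bucket name:

    image = vision.types.Image()
    image.source.image_uri = 'gs://my-bucket/sample.png'  # hypothetical bucket and object

(In google-cloud-vision 2.x and later, the same type is exposed directly as vision.Image.) Next is the recognition part.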

response = client.document_text_detection(image=image)

This single line performs the actual recognition: the image specified in image is run through a model that Google has trained in advance, and the result is returned in response.
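If you just want the whole recognized string rather than per-word details, the response also carries it as a single field, so the following one-liner is enough:

    print(response.full_text_annotation.text)

The loop below, in contrast, walks the full structure of the result.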

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

The result is displayed in this part. A block is a group of paragraphs, and block.confidence gives the confidence for the block as a whole. From there you drill down the hierarchy: block.paragraphs holds what was recognized as paragraphs within the block, paragraph.words holds what was recognized as words within each paragraph, and word.symbols holds the individual characters (symbols) that make up each word.

If you want to do something with the recognition results, this part shows how to access each level of the output.
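As an example, here is a small sketch of my own (the threshold value is arbitrary) that walks the same hierarchy and collects only the words recognized with high confidence:

    def high_confidence_words(response, threshold=0.9):
        # Walk page > block > paragraph > word and keep only confident words.
        words = []
        for page in response.full_text_annotation.pages:
            for block in page.blocks:
                for paragraph in block.paragraphs:
                    for word in paragraph.words:
                        if word.confidence >= threshold:
                            words.append(''.join(s.text for s in word.symbols))
        return words

Applied to the result above, this would drop low-confidence words such as the 「、」 recognized with confidence 0.69.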

Impressions

As expected, both the accuracy and the processing speed were impressive. I would also like to try out various other GCP services.

Thank you for reading to the end. I am still inexperienced, so please feel free to contact me with any comments or questions about this article.
