Detect Japanese characters from images using Google's Cloud Vision API in Python

Introduction

I have been writing a series of hands-on articles to review and consolidate what I learned while developing the serverless web application Mosaic. I'm planning 17 articles in total; 13 are written and 4 remain, but I'm getting a little tired of it, or rather, I want to start implementing new features. So I started on one: adding a character detection function.

Character detection? Character recognition? In other words, OCR (Optical Character Recognition). AWS Rekognition can also detect text, but unfortunately it does not seem to support Japanese. (As of January 2020. I haven't tried it myself, but it appears not to be supported yet.) Google's Cloud Vision API does support Japanese, so I decided to use it.

Enabling the Cloud Vision API

So let's get right to it. Open "APIs and services" in the Google Cloud Platform console or the Google Developers console. https://console.cloud.google.com/apis https://console.developers.google.com/apis

Press the "+ Enable APIs and Services" button at the top of the screen.

Find and enable the Cloud Vision API.

You also need to add a service account as credentials, but I'll skip that here because it overlaps with another article.

Use Google Vision API with Lambda (Python3)

For how to import the libraries needed to call Google's APIs with a service account, please refer to that same article.

The code that passes the local image file to the Vision API and detects faces and text looks like this:

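lambda_function.py


import logging
from base64 import b64encode

from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

logger = logging.getLogger()
  : 
def detectFacesByGoogleVisionAPIFromF(localFilePath, bucket, dirPathOut):
    try:
        # Build a Vision API client authenticated with the service account key.
        keyFile = "service-account-key.json"
        scope = ["https://www.googleapis.com/auth/cloud-vision"]
        api_name = "vision"
        api_version = "v1"
        service = getGoogleService(keyFile, scope, api_name, api_version)
        
        # The image is sent inline as a base64-encoded string.
        ctxt = None
        with open(localFilePath, 'rb') as f:
            ctxt = b64encode(f.read()).decode()
        
        # One request for one image; each entry in "features" is a detection type.
        service_request = service.images().annotate(body={
            "requests": [{
                "image":{
                    "content": ctxt
                  },
                "features": [
                    {
                        "type": "FACE_DETECTION"
                    }, 
                    {
                        "type": "TEXT_DETECTION"
                    }
                ]
            }]
        })
        response = service_request.execute()
        
    except Exception as e:
        logger.exception(e)

def getGoogleService(keyFile, scope, api_name, api_version):
    # Load the service account key and build the API client.
    credentials = ServiceAccountCredentials.from_json_keyfile_name(keyFile, scopes=scope)
    return build(api_name, api_version, credentials=credentials, cache_discovery=False)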

This sample code specifies FACE_DETECTION and TEXT_DETECTION. You can also specify LABEL_DETECTION, LANDMARK_DETECTION, LOGO_DETECTION, and others, but each feature you add is charged accordingly. So if character detection is all you need, it is better to specify only TEXT_DETECTION.

I won't go into the details of what can be specified in features or the JSON that is returned. See the Cloud Vision API docs (https://cloud.google.com/vision/docs?hl=ja). Pricing details are here: https://cloud.google.com/vision/pricing?hl=ja
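As a minimal sketch of restricting the request to character detection only, the request body can be built by a small helper. Note that build_annotate_body is a hypothetical name introduced here for illustration; it is not part of the original code or of any Google library, and it only builds the dictionary that would be passed to service.images().annotate(body=...).

```python
import json
from base64 import b64encode


def build_annotate_body(image_bytes, feature_types=("TEXT_DETECTION",)):
    """Build the body for images().annotate() for a single image.

    Hypothetical helper: by default it requests TEXT_DETECTION only,
    so no other feature is requested (or billed) by accident.
    """
    return {
        "requests": [{
            # The image is embedded inline as a base64 string.
            "image": {"content": b64encode(image_bytes).decode("ascii")},
            # One {"type": ...} entry per requested feature.
            "features": [{"type": t} for t in feature_types],
        }]
    }


# Placeholder bytes stand in for the contents of a real image file.
body = build_annotate_body(b"placeholder image bytes")
print(json.dumps(body["requests"][0]["features"]))
```

If face detection is needed as well, passing feature_types=("FACE_DETECTION", "TEXT_DETECTION") would reproduce the feature list used in the sample above.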

I really wanted to run Vision on files uploaded to Google Drive

But I couldn't.

test.py


        imageUri = "https://drive.google.com/uc?id=" + fileID
        service_request = service.images().annotate(body={
            "requests": [{
                "image":{
                    "source":{
                      "imageUri": imageUri
                    }
                  },
                "features": [
                    {
                        "type": "FACE_DETECTION"
                    }, 
                    {
                        "type": "TEXT_DETECTION"
                    }
                ]
            }]
        })
        response = service_request.execute()

I thought this would work, but I got the following errors, and despite trying various things I couldn't get it to work in the end.

response.json


{"responses": [{"error": {"code": 7, "message": "We're not allowed to access the URL on your behalf. Please download the content and pass it in."}}]}
{"responses": [{"error": {"code": 4, "message": "We can not access the URL currently. Please download the content and pass it in."}}]}

The call to Vision itself succeeds, but Vision apparently cannot access the image on the web given in imageUri. Even a direct link to an image in a public S3 bucket did not work. It's a mystery. If you know why, please let me know.
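The error messages themselves suggest downloading the content and passing it in. The following is only a sketch of that workaround, not something verified in this article: it fetches the image with a plain HTTP GET and embeds it as "content" instead of "source.imageUri". The names fetch_bytes and build_body_from_url are hypothetical, and a plain GET is assumed to be enough, which would only hold for publicly readable URLs. A private Google Drive file would presumably have to be downloaded through the Drive API instead.

```python
from base64 import b64encode
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Download the raw bytes behind a URL (plain GET, no authentication)."""
    with urlopen(url, timeout=timeout) as res:
        return res.read()


def build_body_from_url(url, fetch=fetch_bytes):
    """Download the image ourselves and send it inline as base64 "content".

    Hypothetical helper. `fetch` is a parameter so that the download step
    can be replaced, for example by a Drive API download.
    """
    image_bytes = fetch(url)
    return {
        "requests": [{
            "image": {"content": b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    }
```

The returned dictionary would then be passed to service.images().annotate(body=...) in the same way as for a local file.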

Operation check

TEXT_DETECTION returns two elements: textAnnotations and fullTextAnnotation. Drawing the areas detected in textAnnotations in blue and the areas detected in fullTextAnnotation in green gave the following result.

[Image: faces-aaaaaa.png]
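The drawing code is not shown in this article, so the following is only a sketch of how the two sets of areas could be pulled out of the response before drawing. It assumes the blue areas are the individual entries of textAnnotations (skipping the first entry, which covers the whole text) and the green areas are the block-level boxes of fullTextAnnotation; which level was actually drawn is an assumption. The function names and the sample response are made up for illustration.

```python
def vertices_to_points(poly):
    """Convert a boundingPoly / boundingBox into a list of (x, y) tuples.

    A coordinate can be missing from the JSON when its value is 0,
    so missing keys default to 0.
    """
    return [(v.get("x", 0), v.get("y", 0)) for v in poly.get("vertices", [])]


def collect_polygons(response):
    """Return (text_polys, block_polys) for the first image in a response."""
    result = response["responses"][0]
    # textAnnotations[0] spans the entire detected text; the rest are pieces.
    text_polys = [vertices_to_points(a["boundingPoly"])
                  for a in result.get("textAnnotations", [])[1:]]
    # fullTextAnnotation is a hierarchy: pages > blocks > paragraphs > words > symbols.
    block_polys = [vertices_to_points(b["boundingBox"])
                   for page in result.get("fullTextAnnotation", {}).get("pages", [])
                   for b in page.get("blocks", [])]
    return text_polys, block_polys


# Hand-written sample in the shape of a Vision response (not real API output).
box = [{"x": 10, "y": 5}, {"x": 50, "y": 5}, {"x": 50, "y": 40}, {"x": 10, "y": 40}]
sample = {"responses": [{
    "textAnnotations": [
        {"description": "猫\n", "boundingPoly": {"vertices": box}},
        {"description": "猫", "boundingPoly": {"vertices": [
            {"y": 5}, {"x": 50, "y": 5}, {"x": 50, "y": 40}, {"y": 40}]}},
    ],
    "fullTextAnnotation": {"pages": [{"blocks": [{"boundingBox": {"vertices": box}}]}]},
}]}
blue, green = collect_polygons(sample)
```

Each polygon is a plain list of points, so it can be handed to whatever drawing routine is in use, for example Pillow's ImageDraw polygon method with a blue or green outline.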

I think the results are reasonable, but in the blue areas the extent of a single character seems slightly off, probably because the text is Japanese.

It also seems to be weak when each character stands alone, as in the image below. I haven't looked deeply into whether this is a problem peculiar to Japanese, whether a different feature would detect the characters, or whether other parameters could tune it; in any case, I did not get the result I expected. All I can do is hope that the API's training progresses and that it will eventually detect every character in this image.

[Image: faces-IMG_20190901_094815-200126131521-1387a884261.jpg]

When the day comes that AWS Rekognition supports Japanese for character detection, I plan to evaluate it with this image first!
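On the open question of other features and parameters, two candidates from the Vision API are the DOCUMENT_TEXT_DETECTION feature and the languageHints field of imageContext. Whether either helps with this image has not been tried here. The sketch below only shows how the same image could be sent with three settings in one annotate call so that the detected text can be compared; build_comparison_body and detected_texts are hypothetical names, and each feature in each request is presumably billed separately.

```python
from base64 import b64encode


def build_comparison_body(image_bytes, language_hints=("ja",)):
    """Build one annotate body containing three variants of the same image."""
    image = {"content": b64encode(image_bytes).decode("ascii")}
    variants = [
        ("TEXT_DETECTION", None),                           # baseline
        ("TEXT_DETECTION", list(language_hints)),           # with a language hint
        ("DOCUMENT_TEXT_DETECTION", list(language_hints)),  # dense-text OCR feature
    ]
    requests = []
    for feature_type, hints in variants:
        req = {"image": image, "features": [{"type": feature_type}]}
        if hints:
            req["imageContext"] = {"languageHints": hints}
        requests.append(req)
    return {"requests": requests}


def detected_texts(response):
    """Return the full detected text for each request ("" if none was found)."""
    return [r.get("fullTextAnnotation", {}).get("text", "")
            for r in response.get("responses", [])]
```

Comparing the three strings returned by detected_texts would show at a glance whether any variant picks up the characters that the baseline misses.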

Afterword

I know it's important and necessary to write things up as articles, but I have the most fun when I'm creating something new. Riding that momentum, I was able to write this Vision API article smoothly. Writing it up later would be a chore, so it may be important to write quickly, while the memory is fresh and my enthusiasm is high.

Mosaic, the serverless web application I'm building now, basically runs on AWS infrastructure, but eventually, by the end of 2020, I'd like to build the same thing on GCP. I think the infrastructure itself is better built on a single platform, but for web API services such as Rekognition and the Vision API, either will do; you simply pick the one that can do what you want. Both are just API calls, so it's no big deal, and perhaps every system ends up this way, but it strengthens the feeling of building a plastic model by assembling parts. I don't think you need to know everything, but I also don't think it's good to stick too closely to one platform.

It's great that highly accurate machine-learning services are so easy to use, but it's painful to get stuck when they don't return the results you expect. This is a research field where there is nothing I can do myself, so I can only sit and wait for the day the results I expect come out.
