[PYTHON] I tried the Google Cloud Vision API for the first time

I'd never used the Vision API, even though I do image processing. I always thought it looked amazing, but I'd never actually done anything with it...

For now I decided to at least give it a light try, so I wrote some Python! This article is the memo I kept at the time.

By the way, I handled the registration procedure and so on by referring to the guide below.

Vision API Client Library
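(For comparison, the official client library linked above can call the same feature without hand-building a REST request. This is just a rough sketch assuming the google-cloud-vision package is installed and credentials are configured; it is not what I actually ran below.)

from google.cloud import vision

# Authenticates via GOOGLE_APPLICATION_CREDENTIALS instead of an API key
client = vision.ImageAnnotatorClient()

with open("sample.png", "rb") as f:  # "sample.png" is a placeholder file name
    content = f.read()

response = client.object_localization(image=vision.Image(content=content))
for obj in response.localized_object_annotations:
    print(obj.name, obj.score,
          [(v.x, v.y) for v in obj.bounding_poly.normalized_vertices])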

Functions used

I used the following feature this time. The explanation is quoted from the official documentation.

**Automatic detection of objects** The Cloud Vision API can use object localization to detect and extract multiple objects in an image. Object localization identifies the objects in an image and provides a LocalizedObjectAnnotation for each one. Each LocalizedObjectAnnotation contains information about the object, its position, and rectangular bounds for the region of the image that contains it. Object localization identifies both significant and less-prominent objects in an image.
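For reference, a successful OBJECT_LOCALIZATION response is shaped roughly like the dict below (the values here are illustrative, not real output); this is the structure the code later digs into.

{
    "responses": [{
        "localizedObjectAnnotations": [{
            "mid": "/m/01b638",            # Knowledge Graph ID
            "name": "Shoe",                # recognized object name
            "score": 0.92,                 # confidence score
            "boundingPoly": {
                "normalizedVertices": [    # corners in 0-1 coordinates
                    {"x": 0.1, "y": 0.2},  # top-left
                    {"x": 0.8, "y": 0.2},  # top-right
                    {"x": 0.8, "y": 0.9},  # bottom-right
                    {"x": 0.1, "y": 0.9}   # bottom-left
                ]
            }
        }]
    }]
}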

Source code

It's rough, but please bear with me... I also wanted the start and end coordinates of the recognized region, so I pull them out with a crude technique. I'm not sure whether my way of checking for the JSON keys is really the right one.


import base64
import json

import cv2
import requests

ENDPOINT_URL = 'https://vision.googleapis.com/v1/images:annotate'
API_KEY = 'API key'

# JSON keys
RESPONSES_KEY = 'responses'
LOCALIZED_KEY = 'localizedObjectAnnotations'
BOUNDING_KEY = 'boundingPoly'
NORMALIZED_KEY = 'normalizedVertices'
NAME_KEY = 'name'
X_KEY = 'x'
Y_KEY = 'y'
def get_gcp_info(image):

    image_height, image_width, _ = image.shape
    # image_proc.exc_resize is my own helper that shrinks the image to half size
    min_image = image_proc.exc_resize(int(image_width / 2), int(image_height / 2), image)

    # PNG-encode the image, then base64-encode it for the request body
    _, enc_image = cv2.imencode(".png", min_image)
    image_byte = base64.b64encode(enc_image.tobytes()).decode("utf-8")

    img_requests = [{
        'image': {'content': image_byte},
        'features': [{
            'type': 'OBJECT_LOCALIZATION',
            'maxResults': 5
        }]
    }]

    response = requests.post(ENDPOINT_URL,
                             data=json.dumps({"requests": img_requests}).encode(),
                             params={'key': API_KEY},
                             headers={'Content-Type': 'application/json'})

    response_json = response.json()

    # If the 'responses' key exists
    if RESPONSES_KEY in response_json:
        # If the 'localizedObjectAnnotations' key exists
        if LOCALIZED_KEY in response_json[RESPONSES_KEY][0]:
            # If the 'boundingPoly' key exists
            if BOUNDING_KEY in response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0]:
                # If the 'normalizedVertices' key exists
                if NORMALIZED_KEY in response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0][BOUNDING_KEY]:

                    name = response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0][NAME_KEY]

                    start_point, end_point = check_recognition_point(
                        response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0][BOUNDING_KEY][NORMALIZED_KEY],
                        image_height,
                        image_width
                    )

                    print(name, start_point, end_point)

                    return True, name, start_point, end_point

    print("non", [0, 0], [0, 0])
    # If the expected information was missing
    return False, "non", [0, 0], [0, 0]

def check_recognition_point(point_list_json, image_height, image_width):
    # Vertex 0 is the top-left corner and vertex 2 the bottom-right corner
    # X start of the recognized region (ratio, 0 to 1)
    x_start_rate = point_list_json[0][X_KEY]
    # Y start of the recognized region (ratio, 0 to 1)
    y_start_rate = point_list_json[0][Y_KEY]
    # X end of the recognized region (ratio, 0 to 1)
    x_end_rate = point_list_json[2][X_KEY]
    # Y end of the recognized region (ratio, 0 to 1)
    y_end_rate = point_list_json[2][Y_KEY]

    # Convert the normalized ratios to pixel coordinates on the original image
    x_start_point = int(image_width * x_start_rate)
    y_start_point = int(image_height * y_start_rate)
    x_end_point = int(image_width * x_end_rate)
    y_end_point = int(image_height * y_end_rate)

    return [x_start_point, y_start_point], [x_end_point, y_end_point]

The recognized object's name is returned in name, and the coordinates of the recognized region are returned in start_point and end_point.
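As a minimal usage sketch (assuming the code above, plus a local file named sample.png as a placeholder), you could draw the detected box on the image like this:

image = cv2.imread("sample.png")
found, name, start_point, end_point = get_gcp_info(image)
if found:
    # Draw the detected bounding box and its label on the original image
    cv2.rectangle(image, tuple(start_point), tuple(end_point), (0, 255, 0), 2)
    cv2.putText(image, name, (start_point[0], start_point[1] - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imwrite("result.png", image)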

In closing

I tried it on clothes and shoes, and they were recognized properly! (Although the names were pretty rough.) It would also be interesting to build your own model with AutoML.
