[PYTHON] I tried the Google Cloud Vision API for the first time

I'd never used the Vision API, even though I do image processing. I always thought it looked amazing, but I'd never actually done anything with it...

For now I decided to at least give it a light try, so I wrote some Python! This article is the memo I kept at the time.

By the way, I handled the registration procedure and so on by referring to the guide below.

Vision API Client Library
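(For comparison, the official client library linked above can call the same feature without hand-building a REST request. This is just a rough sketch assuming the google-cloud-vision package is installed and credentials are configured; it is not what I actually ran below.)

from google.cloud import vision

# Authenticates via GOOGLE_APPLICATION_CREDENTIALS instead of an API key
client = vision.ImageAnnotatorClient()

with open("sample.png", "rb") as f:  # "sample.png" is a placeholder file name
    content = f.read()

response = client.object_localization(image=vision.Image(content=content))
for obj in response.localized_object_annotations:
    print(obj.name, obj.score,
          [(v.x, v.y) for v in obj.bounding_poly.normalized_vertices])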

Functions used

I used the following feature this time. The explanation is quoted from the official documentation.

**Automatic detection of objects** The Cloud Vision API can use object localization to detect and extract multiple objects in an image. Object localization identifies the objects in an image and provides a LocalizedObjectAnnotation for each one. Each LocalizedObjectAnnotation contains information about the object, its position, and rectangular bounds for the region of the image that contains it. Object localization identifies both significant and less-prominent objects in an image.
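For reference, a successful OBJECT_LOCALIZATION response is shaped roughly like the dict below (the values here are illustrative, not real output); this is the structure the code later digs into.

{
    "responses": [{
        "localizedObjectAnnotations": [{
            "mid": "/m/01b638",            # Knowledge Graph ID
            "name": "Shoe",                # recognized object name
            "score": 0.92,                 # confidence score
            "boundingPoly": {
                "normalizedVertices": [    # corners in 0-1 coordinates
                    {"x": 0.1, "y": 0.2},  # top-left
                    {"x": 0.8, "y": 0.2},  # top-right
                    {"x": 0.8, "y": 0.9},  # bottom-right
                    {"x": 0.1, "y": 0.9}   # bottom-left
                ]
            }
        }]
    }]
}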

Source code

It's rough, but please bear with me... I also wanted the start and end coordinates of the recognized region, so I pull them out with a crude technique. I'm not sure whether my way of checking for the JSON keys is really the right one.


import base64
import json

import cv2
import requests

ENDPOINT_URL = 'https://vision.googleapis.com/v1/images:annotate'
API_KEY = 'API key'

# JSON keys
RESPONSES_KEY = 'responses'
LOCALIZED_KEY = 'localizedObjectAnnotations'
BOUNDING_KEY = 'boundingPoly'
NORMALIZED_KEY = 'normalizedVertices'
NAME_KEY = 'name'
X_KEY = 'x'
Y_KEY = 'y'
def get_gcp_info(image):

    image_height, image_width, _ = image.shape
    # image_proc.exc_resize is my own helper that shrinks the image to half size
    min_image = image_proc.exc_resize(int(image_width / 2), int(image_height / 2), image)

    # PNG-encode the image, then base64-encode it for the request body
    _, enc_image = cv2.imencode(".png", min_image)
    image_byte = base64.b64encode(enc_image.tobytes()).decode("utf-8")

    img_requests = [{
        'image': {'content': image_byte},
        'features': [{
            'type': 'OBJECT_LOCALIZATION',
            'maxResults': 5
        }]
    }]

    response = requests.post(ENDPOINT_URL,
                             data=json.dumps({"requests": img_requests}).encode(),
                             params={'key': API_KEY},
                             headers={'Content-Type': 'application/json'})

    response_json = response.json()

    # If the 'responses' key exists
    if RESPONSES_KEY in response_json:
        # If the 'localizedObjectAnnotations' key exists
        if LOCALIZED_KEY in response_json[RESPONSES_KEY][0]:
            # If the 'boundingPoly' key exists
            if BOUNDING_KEY in response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0]:
                # If the 'normalizedVertices' key exists
                if NORMALIZED_KEY in response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0][BOUNDING_KEY]:

                    name = response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0][NAME_KEY]

                    start_point, end_point = check_recognition_point(
                        response_json[RESPONSES_KEY][0][LOCALIZED_KEY][0][BOUNDING_KEY][NORMALIZED_KEY],
                        image_height,
                        image_width
                    )

                    print(name, start_point, end_point)

                    return True, name, start_point, end_point

    print("non", [0, 0], [0, 0])
    # If the expected information was missing
    return False, "non", [0, 0], [0, 0]

def check_recognition_point(point_list_json, image_height, image_width):
    # Vertex 0 is the top-left corner and vertex 2 the bottom-right corner
    # X start of the recognized region (ratio, 0 to 1)
    x_start_rate = point_list_json[0][X_KEY]
    # Y start of the recognized region (ratio, 0 to 1)
    y_start_rate = point_list_json[0][Y_KEY]
    # X end of the recognized region (ratio, 0 to 1)
    x_end_rate = point_list_json[2][X_KEY]
    # Y end of the recognized region (ratio, 0 to 1)
    y_end_rate = point_list_json[2][Y_KEY]

    # Convert the normalized ratios to pixel coordinates on the original image
    x_start_point = int(image_width * x_start_rate)
    y_start_point = int(image_height * y_start_rate)
    x_end_point = int(image_width * x_end_rate)
    y_end_point = int(image_height * y_end_rate)

    return [x_start_point, y_start_point], [x_end_point, y_end_point]

The recognized object's name is returned in name, and the coordinates of the recognized region are returned in start_point and end_point.
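As a minimal usage sketch (assuming the code above, plus a local file named sample.png as a placeholder), you could draw the detected box on the image like this:

image = cv2.imread("sample.png")
found, name, start_point, end_point = get_gcp_info(image)
if found:
    # Draw the detected bounding box and its label on the original image
    cv2.rectangle(image, tuple(start_point), tuple(end_point), (0, 255, 0), 2)
    cv2.putText(image, name, (start_point[0], start_point[1] - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imwrite("result.png", image)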

In closing

I tried it on clothes and shoes, and they were recognized properly! (Although the names were pretty rough.) It would also be interesting to build your own model with AutoML.
