Machine Learning x Web App Diagnosis: Recognize CAPTCHA with Cloud Vision API

Last time, I implemented a multilayer perceptron using "Chainer" and tried to recognize CAPTCHA images. This time, I will try the same thing using Google's image analysis API, "Cloud Vision API".

Agenda

  0. Implementation code
  1. Try it
  2. Summary
  3. References

0. Implementation code

This time, I created a simple image analysis class for verification. As you can see, it simply POSTs a JSON request in the format defined by the Cloud Vision API.

MyRecognitionImage.py


#!/usr/bin/python
#coding:utf-8
import base64
import json
from requests import Request, Session


#Analyze images with Cloud Vision API
class RecognizeImage():

    def __init__(self):
        return

    #CAPTCHA analysis
    def recognize_captcha(self, str_image_path):
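        #Load the CAPTCHA image (the with block closes the file afterwards)
        with open(str_image_path, 'rb') as f_captcha:
            bin_captcha = f_captcha.read()

        #Encode the CAPTCHA image with Base64 and decode it to text so it can be embedded in JSON
        str_encode_file = base64.b64encode(bin_captcha).decode('ascii')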

        #Specify API URL
        str_url = "https://vision.googleapis.com/v1/images:annotate?key="

        #API key obtained in advance
        str_api_key = "XXXXXXXXX"

        #Set Content-Type to JSON
        str_headers = {'Content-Type': 'application/json'}

        #Define the JSON payload according to the Cloud Vision API specifications.
        #To extract text from the CAPTCHA image, set the type to "TEXT_DETECTION".
        str_json_data = {
            'requests': [
                {
                    'image': {
                        'content': str_encode_file
                    },
                    'features': [
                        {
                            'type': "TEXT_DETECTION",
                            'maxResults': 10
                        }
                    ]
                }
            ]
        }

        #Send request
        obj_session = Session()
        obj_request = Request("POST",
                              str_url + str_api_key,
                              data=json.dumps(str_json_data),
                              headers=str_headers
                              )
        obj_prepped = obj_session.prepare_request(obj_request)
        obj_response = obj_session.send(obj_prepped,
                                        verify=True,
                                        timeout=60
                                        )

        #Get the analysis result
        if obj_response.status_code == 200:
            print(obj_response.text)
            return obj_response.text
        else:
            return "error"

The following three points should be noted when using the API.

  - Be sure to Base64-encode the image to be analyzed.
  - Obtain an API key in advance to use the API.
  - Specify an appropriate "type" according to the purpose of the analysis.
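The first and third points can be illustrated in isolation. The following is a minimal Python 3 sketch, assuming a hypothetical helper name, build_annotate_payload, that is not part of MyRecognitionImage.py. In Python 3, base64.b64encode returns bytes, which json.dumps cannot serialize, so the result is decoded to an ASCII string before it goes into the payload.

```python
import base64
import json


def build_annotate_payload(image_bytes, feature_type="TEXT_DETECTION", max_results=10):
    """Build the JSON body for images:annotate from raw image bytes.

    Hypothetical helper for illustration only.
    """
    # b64encode returns bytes in Python 3; JSON needs text, and Base64
    # output is pure ASCII, so decoding with "ascii" is safe.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                # "type" selects the analysis; TEXT_DETECTION extracts text.
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes (a PNG signature) stand in for a real CAPTCHA image.
    body = json.dumps(build_annotate_payload(b"\x89PNG\r\n\x1a\n"))
    print(body)
```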

When recognize_captcha() in MyRecognitionImage.py is executed, the following request is POSTed.

POST /v1/images:annotate?key=XXXXXXXXX HTTP/1.1
User-Agent: python-requests/2.8.1
Host: vision.googleapis.com
Accept: */*
Content-Type: application/json
Content-Length: 939

{
 "requests":[
  {
   "image":{
    "content": "iVBORw0KGgoAAAANSUhEUgA ・ ・ ・(abridgement)・ ・ ・/EV4ihonpXVAAAAAElFTkSuQmCC"
   },
   "features":[
    {
     "type":"TEXT_DETECTION",
     "maxResults":10
    }
   ]
  }
 ]
}
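The raw request above can also be reproduced with only the Python 3 standard library, which may be handy when requests is not installed. This is a sketch: build_annotate_request and ANNOTATE_URL are hypothetical names, the key is the same placeholder as above, and the function only prepares the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the request shown above.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(payload, api_key):
    """Prepare (but do not send) a POST to images:annotate.

    payload: a dict shaped like the JSON body shown above.
    api_key: the API key obtained in advance (placeholder here).
    """
    # The key travels as a query parameter, as in "?key=XXXXXXXXX".
    url = ANNOTATE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, something like `with urllib.request.urlopen(req, timeout=60) as resp: text = resp.read().decode("utf-8")` should work; the 60-second timeout mirrors the one in MyRecognitionImage.py.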

Set "content" to the Base64-encoded image data and "type" to the analysis you want to run. Since we want to recognize CAPTCHAs this time, we specify text extraction, "TEXT_DETECTION". In addition to text extraction, the following analyses can be performed:

  - Identifying what appears in the image
  - Detecting inappropriate content
  - Analyzing the meaning of the image

For example, if you POST a photo of Tokyo Station, it can be recognized as "Tokyo Station", and if you POST a photo of a person who looks happy, it can be recognized as "happy". I would like to try these in the future.
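Several analyses can be requested at once by listing multiple entries in the "features" array. The sketch below maps the analyses mentioned above to feature type names as I understand the Cloud Vision documentation (LABEL_DETECTION for what appears in the image, SAFE_SEARCH_DETECTION for inappropriate content, LANDMARK_DETECTION for places such as Tokyo Station, FACE_DETECTION for expressions such as joy). The dictionary keys and build_features are hypothetical names, and the mapping itself is an assumption to check against the official feature type list.

```python
# Assumed mapping from analysis purpose to Cloud Vision feature type.
# The keys are hypothetical labels chosen for this example.
FEATURES_BY_PURPOSE = {
    "text": "TEXT_DETECTION",                # CAPTCHA text (used in this article)
    "labels": "LABEL_DETECTION",             # what appears in the image
    "safe_search": "SAFE_SEARCH_DETECTION",  # inappropriate content
    "landmark": "LANDMARK_DETECTION",        # e.g. a photo of Tokyo Station
    "face": "FACE_DETECTION",                # expressions such as joy
}


def build_features(purposes, max_results=10):
    """Return a "features" list that requests several analyses in one call."""
    return [
        {"type": FEATURES_BY_PURPOSE[purpose], "maxResults": max_results}
        for purpose in purposes
    ]
```

The returned list can replace the single-entry "features" array in the request body; each requested feature adds its own annotation field to the response.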

Image analysis requires a lot of computing power, so it used to be a high hurdle for hobbyists. With this API, however, anyone can easily perform image analysis. What a wonderful API!

1. Try it

Let's use this to recognize CAPTCHAs.

Let's start with this one. captcha0_neg.png

The extracted text is returned in the "description" field.
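Pulling "description" out of the raw response text (as returned by recognize_captcha()) takes only a few lines. A sketch under assumptions: extract_description is a hypothetical helper, it handles one image per request, and it takes the first textAnnotations entry, which as I understand the API holds the whole detected string (later entries, when present, hold individual words).

```python
import json


def extract_description(response_text):
    """Return the detected text from an images:annotate response body.

    Hypothetical helper: returns "" when no text was detected.
    """
    data = json.loads(response_text)
    responses = data.get("responses", [])
    if not responses:
        return ""
    # The first annotation covers the full detected text.
    annotations = responses[0].get("textAnnotations", [])
    if not annotations:
        return ""
    return annotations[0].get("description", "")
```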

First analysis result


{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "O l 4.67 9\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 6,
                "y": 1
              },
              {
                "x": 165,
                "y": 1
              },
              {
                "x": 165,
                "y": 35
              },
              {
                "x": 6,
                "y": 35
              }
            ]
          }
        }
      ]
    }
  ]
}

The result is "O l 4.67 9". Zero (0) was read as an uppercase "O", one (1) as a lowercase "l", and a stray dot appeared, but otherwise every character was recognized correctly. Counting these look-alike characters as correct, the recognition rate can be regarded as 100%.
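Look-alike letters like these can be folded back into digits with a small post-processing step. This is a minimal sketch, assuming the CAPTCHA contains digits only; normalize_digits is a hypothetical name, and mapping "o" and "I" in addition to the "O" and "l" seen above is an extra assumption.

```python
# "O" -> 0 and "l" -> 1 were observed in this article's results;
# "o" and "I" are added on the assumption they behave the same way.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def normalize_digits(text):
    """Map look-alike letters to digits, then keep only digit characters.

    Spaces, dots, quotes, commas and the trailing newline are dropped.
    """
    return "".join(ch for ch in text.translate(LOOKALIKES) if ch.isdigit())
```

Applied to the first result, normalize_digits("O l 4.67 9\n") gives "014679".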

Next is the second one. captcha1_neg.png

This one gave poor results in the previous verification, but how does the Cloud Vision API do?

Second analysis result


{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "496'0,\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 6,
                "y": 10
              },
              {
                "x": 148,
                "y": 10
              },
              {
                "x": 148,
                "y": 70
              },
              {
                "x": 6,
                "y": 70
              }
            ]
          }
        }
      ]
    }
  ]
}

The order of the characters in the output is slightly different, but "4", "0", "9", and "6" are all recognized. The recognition rate was 50% last time, so this is an improvement.
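Whether this counts as a full success depends on whether character order is part of the score. The sketch below computes both variants; match_rate is a hypothetical helper, and the strings in the usage note are made-up examples because the true answers of these CAPTCHAs are not given here.

```python
from collections import Counter


def match_rate(expected, recognized, ordered=True):
    """Fraction of expected characters found in the OCR output.

    ordered=True compares position by position.
    ordered=False ignores order and counts shared characters
    (multiset intersection of the two strings).
    """
    if not expected:
        return 0.0
    if ordered:
        hits = sum(1 for a, b in zip(expected, recognized) if a == b)
    else:
        hits = sum((Counter(expected) & Counter(recognized)).values())
    return float(hits) / len(expected)
```

For a hypothetical answer "1234" and OCR output "1243", the ordered score is 0.5 while the order-insensitive score is 1.0. A CAPTCHA form checks the exact string, so the ordered score is the stricter measure of whether a submission would pass.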

Here is the last one. cp3.png

Third analysis result


{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "425970\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 5,
                "y": 7
              },
              {
                "x": 97,
                "y": 7
              },
              {
                "x": 97,
                "y": 33
              },
              {
                "x": 5,
                "y": 33
              }
            ]
          }
        }
      ]
    }
  ]
}

Well done! Every character is recognized accurately.

The API extracts text with fairly high accuracy, probably because it has been trained on the huge amount of image data that Google holds. It looks usable for CAPTCHA recognition.

Incidentally, Google's newer CAPTCHA, "reCAPTCHA", asks the user to pick the images that show the same animal or object from a set of images, as shown below, and apparently uses this to distinguish humans from bots.

reCAPTCHA.png

In this example, you must select every image that matches the top image (a cat), so the correct answer is the first image counting from the top left and the second and third images counting from the bottom left.

By the way, I confirmed that the Cloud Vision API can be used to accurately distinguish between cats and dogs.

2. Summary

I tried to recognize CAPTCHAs using the Cloud Vision API. There is still room for improvement, but the results are good enough for CAPTCHA recognition.

It also appears possible to break through not only simple CAPTCHAs (numeric images, etc.) but also advanced ones such as reCAPTCHA. The API is charged after the trial period expires, but compared with writing the code yourself and preparing a high-spec machine, we think it offers high cost performance.

In the future, after further verification, I would like to use it as the CAPTCHA recognition engine of an automated crawler for web applications.

3. References

  1. Google Cloud Vision API

That's all.
