[PYTHON] Extract information from business cards by combining Vision API and Natural Language API

Introduction

Let's extract information from business cards by combining the Cloud Vision API and the Natural Language API. We will call the APIs from Python.

Click here for the previous article:

The app we will build

We will create an application that takes a business card image and extracts the name, company name, and address. The overall picture looks like this: [Image: Screenshot 2016-11-10 9.52.44.png]

Flow of application creation

Step 0: Prepare to create an app
↓
Step 1: Detect text using Vision API
↓
Step 2: Extract names and company names using Natural Language API
↓
Step 3: Integrate the two APIs to extract information from business cards

Step0 (3min) Prepare to create an app

To create an app, install the necessary libraries, download the repository, and set the API key.

Install the required libraries

Install the libraries needed to create the app. Execute the following command to install.

$ pip install requests
$ pip install pyyaml

Download repository

I have prepared a template of the application in advance. It will work once you fill in the necessary parts, so please download the repository from the following.

You can download it by choosing "Download ZIP" under "Clone or Download".

Set API Key

Write the API key obtained from Google Cloud Platform into the configuration file (plugins/config/google.yaml). Open google.yaml in an editor and overwrite the token value, the **xxx** part, with your Google Cloud Platform API key.

google.yaml


token: xxx

Step1 (5min) Detect text using Vision API

This section first explains the Vision API. We then write a script that uses the API and run it to check that it works. Let's start with the API itself.

What is Vision API?

The Google Cloud Vision API uses powerful machine learning models to let you develop applications that can recognize and understand the content of images. The Cloud Vision API has the following features:

It's hard to understand with words alone, so let's try it. Visit the web page below to try out the demo.

Although the API is a paid service, up to 1000 requests are free. Beyond that, you are charged according to the number of requests.

Script description

Write a script to use the Vision API. The script is located at **plugins/apis/vision.py**. Open vision.py in an editor, rewrite its contents as follows, and save the file with UTF-8 encoding.

vision.py


# -*- coding: utf-8 -*-
import base64
import requests


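def detect_text(image_file, access_token=None):
    """Return the text detected in image_file, or '' if none is found."""
    # The API expects the image bytes as a base64-encoded string.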
    with open(image_file, 'rb') as image:
        base64_image = base64.b64encode(image.read()).decode()

    url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        'requests': [{
            'image': {
                'content': base64_image,
            },
            'features': [{
                'type': 'TEXT_DETECTION',
                'maxResults': 1,
            }]

        }]
    }
    response = requests.post(url, headers=header, json=body).json()
    # A result without 'textAnnotations' (no text, or a per-image error) yields ''.
    annotations = response['responses'][0].get('textAnnotations', [])
    text = annotations[0]['description'] if annotations else ''
    return text

If you pass the path of an image file and an API key to **detect_text**, it detects the text in the image and returns it. Let's run it to check that it works.
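
To see the request and response shapes without calling the API, here is a minimal offline sketch. The helper names build_annotate_body and first_description are made up for this example, and the sample response is hand-written to match the fields that detect_text reads; it is not real API output.

```python
import base64


def build_annotate_body(image_bytes):
    """Build the JSON body for images:annotate with one TEXT_DETECTION feature."""
    return {
        'requests': [{
            'image': {'content': base64.b64encode(image_bytes).decode('ascii')},
            'features': [{'type': 'TEXT_DETECTION', 'maxResults': 1}],
        }]
    }


def first_description(response):
    """Return the text of the first annotation, or '' when nothing usable came back."""
    results = response.get('responses') or [{}]
    annotations = results[0].get('textAnnotations') or []
    return annotations[0].get('description', '') if annotations else ''


# Hand-written sample shaped like the response that detect_text indexes into.
sample = {'responses': [{'textAnnotations': [{'description': 'Kintone Co., Ltd.\n'}]}]}
print(first_description(sample))
# An image without text produces an empty result object.
print(repr(first_description({'responses': [{}]})))
```

Reading the response with .get() instead of direct indexing means a missing key results in an empty string rather than a KeyError.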

Script execution

Let's run the script we just wrote. First, move to the **plugins/tests** folder, which contains test_vision.py. test_vision.py calls **detect_text** from the script we just wrote. In other words, if everything works correctly, it returns the text in the image you give it.

Let's try it. Run it on example.png in the data folder.

$ python test_vision.py data/example.png > result.txt

Does result.txt contain the following text?

Kintone Co., Ltd.
Tokyo Headquarters First Sales Department
Cai Mao Zutaro
23-4567
1 Nihonbashi Tianzhu-cho, Tokyo-2-3
Tel: 00-1234-5678
E-mail: [email protected]
Righteousness

Step2 (5min) Extract names and company names using Natural Language API

This section first explains the Natural Language API. We then write a script that uses the API and run it to check that it works. Let's start with the API itself.

What is the Natural Language API?

The Google Cloud Natural Language API is an easy-to-use REST API that applies a powerful machine learning model to recognize the structure and meaning of text. The Natural Language API has the following features:

It's hard to understand with words alone, so let's try it. Visit the web page below to try out the demo.

Although the API is a paid service, up to 5000 requests are free. Beyond that, you are charged according to the number of requests.

Script description

Write a script to use the Natural Language API. The script is located at **plugins/apis/language.py**. Open language.py in an editor, rewrite its contents as follows, and save the file with UTF-8 encoding.

language.py


# -*- coding: utf-8 -*-
import requests


def extract_entities(text, access_token=None):

    url = 'https://language.googleapis.com/v1beta1/documents:analyzeEntities?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        "document": {
            "type": "PLAIN_TEXT",
            "language": "JA",
            "content": text
        },
        "encodingType": "UTF8"
    }
    response = requests.post(url, headers=header, json=body).json()
    return response


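def extract_required_entities(text, access_token=None):
    """Return the organization, person, and location names found in text."""
    entities = extract_entities(text, access_token)
    required_entities = {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    # An error response has no 'entities' key, so default to an empty list.
    for entity in entities.get('entities', []):
        t = entity['type']
        if t in required_entities:
            # Several entities of the same type are concatenated into one string.
            required_entities[t] += entity['name']

    return required_entities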

If you pass text and an API key to **extract_entities**, it extracts various named entities. This time, however, we only need the **company name**, **person name**, and **location** from the text, so **extract_required_entities** picks out just those.
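
The filtering step can be tried offline with a hand-written response. The sketch below is a variation rather than the article's function: where extract_required_entities appends names to one string with `+=`, this hypothetical group_entities keeps each name as a separate list item, which makes it easier to see when the API returns several entities of the same type. The sample data is invented and only mirrors the name and type fields used above.

```python
def group_entities(response, wanted=('ORGANIZATION', 'PERSON', 'LOCATION')):
    """Collect entity names by type, keeping each name as a separate list item."""
    grouped = {entity_type: [] for entity_type in wanted}
    for entity in response.get('entities', []):
        entity_type = entity.get('type')
        if entity_type in grouped:
            grouped[entity_type].append(entity['name'])
    return grouped


# Invented sample with two LOCATION entities and one type we do not need.
sample = {
    'entities': [
        {'name': 'Kintone Co., Ltd.', 'type': 'ORGANIZATION'},
        {'name': 'Cai Mao Zutaro', 'type': 'PERSON'},
        {'name': 'Tokyo', 'type': 'LOCATION'},
        {'name': 'Nihonbashi', 'type': 'LOCATION'},
        {'name': 'E-mail', 'type': 'OTHER'},
    ]
}
print(group_entities(sample))
```

With lists, the caller decides how to join or choose among multiple names, for example taking only the first PERSON.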

Let's run it to check the operation.

Script execution

Let's run the script we just wrote. First, move to the **plugins/tests** folder, which contains test_language.py. test_language.py calls **extract_required_entities** from the script we just wrote. In other words, if everything works correctly, it returns the company name, person name, and location found in the text you give it.

Let's try it. Run it on example.txt in the data folder. example.txt contains the character recognition result shown earlier.

$ python test_language.py data/example.txt > result.txt

Does result.txt contain the following?

{'PERSON': 'Cai Mao Zutaro', 'LOCATION': '1 Nihonbashi Tianzhu-cho, Tokyo-2-3', 'ORGANIZATION': 'Kintone Co., Ltd.'}

Step3 (3min) Integrate two APIs to extract information from business cards

Finally, we write a script that combines the Vision API and Natural Language API scripts written so far, then run it to check that it works. Let's start by writing the script.

Script description

Write a script to combine the Vision API and the Natural Language API. The script is located at **plugins/apis/integration.py**. Open integration.py in an editor, rewrite its contents as follows, and save the file with UTF-8 encoding.

integration.py


# -*- coding: utf-8 -*-
from .language import extract_required_entities
from .vision import detect_text


def extract_entities_from_img(img_path, access_token):
    """Detect the text in an image, then extract the required entities from it."""
    text = detect_text(img_path, access_token)
    entities = extract_required_entities(text, access_token)

    return entities

If you pass the path of an image file and an API key to **extract_entities_from_img**, it returns the company name, person name, and location found in the image. Let's run it to check that it works.
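
The data flow of the combined function can be sketched without network access by passing the two steps in as parameters. Everything here is illustrative: extract_from_image, fake_detect, and fake_extract are hypothetical names, and the guard that skips entity extraction when no text is found is an addition that integration.py does not have.

```python
def extract_from_image(img_path, access_token, detect, extract):
    """Run OCR, then entity extraction, skipping the second call when OCR finds nothing.

    `detect` and `extract` stand in for detect_text and extract_required_entities.
    """
    text = detect(img_path, access_token)
    if not text.strip():
        # No text on the card: nothing to analyze, so return empty fields.
        return {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    return extract(text, access_token)


# Stub functions so the sketch runs without an API key.
def fake_detect(img_path, access_token):
    return 'Kintone Co., Ltd.\nCai Mao Zutaro' if img_path == 'card.png' else ''


def fake_extract(text, access_token):
    company, person = text.split('\n')
    return {'ORGANIZATION': company, 'PERSON': person, 'LOCATION': ''}


print(extract_from_image('card.png', 'dummy-key', fake_detect, fake_extract))
print(extract_from_image('blank.png', 'dummy-key', fake_detect, fake_extract))
```

Skipping the second request for an empty OCR result avoids sending an empty document to the Natural Language API and keeps the return shape the same in both cases.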

Script execution

Now let's run the script. First, move to the **plugins/tests** folder, which contains test_integration.py. test_integration.py calls **extract_entities_from_img** from the script we just wrote. In other words, if everything works correctly, it returns the company name and other fields found in the image you give it.

Let's try it. Run it on example.png in the data folder.

$ python test_integration.py data/example.png > result.txt

Does result.txt contain the following?

{'PERSON': 'Cai Mao Zutaro', 'LOCATION': '1 Nihonbashi Tianzhu-cho, Tokyo-2-3', 'ORGANIZATION': 'Kintone Co., Ltd.'}

Conclusion

How was it? This article covered development with the Vision API and the Natural Language API, and I hope connecting the two gave you a sense of the possibilities.

Next, we will combine the app built so far with Slack so that you can upload business card images to Slack and register them in kintone. Please proceed to the next article.

Article summary

The contents of this hands-on are summarized in three articles.

Please complete the preparation described in the following article beforehand.
