Introduction

Let's extract information from business cards by combining the Cloud Vision API and the Natural Language API. We will call both APIs from Python.

Click here for the previous article:

Let's make a business card management app with kintone

The app we will build

Given a business card image, the application extracts the name, company name, and address. It looks like this: (image: スクリーンショット 2016-11-10 9.52.44.png)

Flow of application creation

Step 0: Prepare to create an app
↓
Step 1: Detect text using Vision API
↓
Step 2: Extract names and company names using Natural Language API
↓
Step 3: Integrate the two APIs to extract information from business cards

Step0 (3min) Prepare to create an app

To create an app, install the necessary libraries, download the repository, and set the API key.

Install the required libraries

Install the libraries needed to create the app by running the following commands.

$ pip install requests
$ pip install pyyaml

Download repository

I prepared a template of the application in advance. It works once you fill in the missing parts, so please download the repository from the link below.

Download

You can download it from "Download ZIP" of "Clone or Download".

Set API Key

Write the API key you obtained from Google Cloud Platform in the configuration file (plugins/config/google.yaml). Open google.yaml in an editor and replace the **xxx** part of the token value with your own API key.

`google.yaml`


token: xxx
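As a minimal sketch of how a script could read this file, assuming PyYAML (installed in the previous step): the helper name `load_token` is hypothetical and is not taken from the repository, which may load the key differently.

```python
import yaml  # PyYAML, installed with `pip install pyyaml`


def load_token(config_path):
    """Return the API key stored under the `token` key of a YAML file."""
    with open(config_path, encoding='utf-8') as f:
        config = yaml.safe_load(f) or {}
    token = config.get('token')
    # 'xxx' is the placeholder shipped in the template, not a usable key.
    if not token or token == 'xxx':
        raise ValueError('Set your API key in {}'.format(config_path))
    return token
```

Calling `load_token('plugins/config/google.yaml')` would then return the key as a string, or fail early with a clear message if the placeholder was never replaced.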

Step1 (5min) Detect text using Vision API

This section first describes the Vision API. We then write a script that calls the API and run it to check that it works. Let's start with the API description.

What is Vision API?

The Google Cloud Vision API uses powerful machine learning models to let you build applications that recognize and understand the content of images. It offers the following features:

Image classification (e.g. "yacht", "lion", "Eiffel Tower")
Face detection
Character recognition (OCR)
Logo detection
Landmark detection
Safe Search detection

Words alone only go so far, so let's try it. Visit the web page below to try out the demo.

Google Cloud Vision API

The API is a paid service, but up to 1000 requests are free. Beyond that, you are charged according to the number of requests.

Writing the script

Write a script that calls the Vision API. The script lives at **plugins/apis/vision.py**. Open vision.py in an editor, replace its contents with the following, and save the file with UTF-8 encoding.

`vision.py`


# -*- coding: utf-8 -*-
import base64
import requests


def detect_text(image_file, access_token=None):

    with open(image_file, 'rb') as image:
        base64_image = base64.b64encode(image.read()).decode()

    url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        'requests': [{
            'image': {
                'content': base64_image,
            },
            'features': [{
                'type': 'TEXT_DETECTION',
                'maxResults': 1,
            }]

        }]
    }
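    response = requests.post(url, headers=header, json=body).json()
    # An image with no detectable text has no 'textAnnotations' key,
    # so fall back to an empty list instead of raising KeyError.
    annotations = response['responses'][0].get('textAnnotations', [])
    # The first annotation holds the full text of the image.
    text = annotations[0]['description'] if annotations else ''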
    return text

Pass the image file path and the API key to **detect_text**, and it detects the text in the image and returns it. Let's run it to check that it works.
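To see what the request looks like without touching the network, here is a small sketch that only builds the URL and JSON body. The function name `build_vision_request` and the dummy key are assumptions for illustration. It also passes the key through `urlencode`, a variation on the string formatting used in vision.py.

```python
import base64
from urllib.parse import urlencode

VISION_ENDPOINT = 'https://vision.googleapis.com/v1/images:annotate'


def build_vision_request(image_bytes, api_key):
    """Return (url, body) for a TEXT_DETECTION request on raw image bytes."""
    # JSON cannot carry raw bytes, so the image is sent as Base64 text.
    content = base64.b64encode(image_bytes).decode('ascii')
    # urlencode escapes the key, which plain string formatting does not.
    url = '{}?{}'.format(VISION_ENDPOINT, urlencode({'key': api_key}))
    body = {
        'requests': [{
            'image': {'content': content},
            'features': [{'type': 'TEXT_DETECTION', 'maxResults': 1}],
        }]
    }
    return url, body


# Dummy bytes and key, only to show the shape of the request.
url, body = build_vision_request(b'not a real image', 'MY_KEY')
print(url)
print(sorted(body['requests'][0].keys()))
```

Keeping the request construction separate from the HTTP call makes it possible to inspect or unit-test the payload before spending any of the free quota.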

Running the script

Let's run the script we just wrote. First move to the **plugins/tests** folder, which contains test_vision.py. That file calls the **detect_text** function from the script above, so if everything works, it returns the text in the image you give it.

Let's try it. Run it on example.png in the data folder.

$ python test_vision.py data/example.png > result.txt

Does result.txt contain the following text?

Kintone Co., Ltd.
Tokyo Headquarters First Sales Department
Cai Mao Zutaro
23-4567
1 Nihonbashi Tianzhu-cho, Tokyo-2-3
Tel: 00-1234-5678
E-mail: [email protected]
Righteousness
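The text above comes from the first entry of `textAnnotations` in the API reply, which holds the whole detected text, while later entries hold individual words. The following sketch shows one way to read the reply defensively. The function name and the sample reply are hand-written assumptions, not output captured from the API.

```python
def text_from_vision_response(response):
    """Return the full detected text, or '' when no text was found."""
    results = response.get('responses', [])
    if not results:
        return ''
    first = results[0]
    # A failed image carries an 'error' object instead of annotations.
    if 'error' in first:
        raise RuntimeError(first['error'].get('message', 'Vision API error'))
    annotations = first.get('textAnnotations', [])
    # Entry 0 is the whole text block; later entries are single words.
    return annotations[0]['description'] if annotations else ''


# Hand-written sample in the shape of an images:annotate reply.
sample = {
    'responses': [{
        'textAnnotations': [
            {'description': 'Kintone Co., Ltd.\nTel: 00-1234-5678\n'},
            {'description': 'Kintone'},
        ]
    }]
}
print(text_from_vision_response(sample).splitlines())
```

Raising on the `error` object makes an invalid key or an unreadable image show up immediately instead of being passed on as empty text.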

Step2 (5min) Extract names and company names using Natural Language API

This section first describes the Natural Language API. We then write a script that calls the API and run it to check that it works. Let's start with the API description.

What is the Natural Language API?

The Google Cloud Natural Language API is an easy-to-use REST API that applies a powerful machine learning model to recognize the structure and meaning of text. The Natural Language API has the following features:

Entity extraction (person names, organization names, events, etc.)
Sentiment analysis (sentiment of product reviews, consumer opinions, etc.)
Syntax analysis

Words alone only go so far, so let's try it. Visit the web page below to try out the demo.

Google Cloud Natural Language API

The API is a paid service, but up to 5000 requests are free. Beyond that, you are charged according to the number of requests.

Writing the script

Write a script that calls the Natural Language API. The script lives at **plugins/apis/language.py**. Open language.py in an editor, replace its contents with the following, and save the file with UTF-8 encoding.

`language.py`


# -*- coding: utf-8 -*-
import requests


def extract_entities(text, access_token=None):

    url = 'https://language.googleapis.com/v1beta1/documents:analyzeEntities?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        "document": {
            "type": "PLAIN_TEXT",
            "language": "JA",
            "content": text
        },
        "encodingType": "UTF8"
    }
    response = requests.post(url, headers=header, json=body).json()
    return response


def extract_required_entities(text, access_token=None):
    entities = extract_entities(text, access_token)
    required_entities = {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    for entity in entities['entities']:
        t = entity['type']
        if t in required_entities:
            required_entities[t] += entity['name']

    return required_entities

Pass text and the API key to **extract_entities** to extract various named entities. This time we only need the **company name**, **person name**, and **location**, so **extract_required_entities** filters the result down to those three types.
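One detail worth knowing: language.py joins names with `+=`, so two entities of the same type are concatenated with no separator between them. The sketch below shows an alternative that collects names per type and joins them with a space. The function name `group_required_entities` and the sample reply are hand-written assumptions for illustration.

```python
REQUIRED_TYPES = ('ORGANIZATION', 'PERSON', 'LOCATION')


def group_required_entities(response, separator=' '):
    """Group entity names by type, keeping only the required types."""
    grouped = {t: [] for t in REQUIRED_TYPES}
    # .get() keeps the function from raising when 'entities' is absent,
    # for example when the API returned an error object instead.
    for entity in response.get('entities', []):
        if entity.get('type') in grouped:
            grouped[entity['type']].append(entity['name'])
    return {t: separator.join(names) for t, names in grouped.items()}


# Hand-written sample in the shape of an analyzeEntities reply.
sample = {
    'entities': [
        {'name': 'Kintone Co., Ltd.', 'type': 'ORGANIZATION'},
        {'name': 'Tokyo', 'type': 'LOCATION'},
        {'name': 'Nihonbashi', 'type': 'LOCATION'},
        {'name': '00-1234-5678', 'type': 'OTHER'},
    ]
}
print(group_required_entities(sample))
```

For Japanese text an empty separator may read more naturally, which is why it is left as a parameter here.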

Let's run it to check that it works.

Running the script

Let's run the script we just wrote. First move to the **plugins/tests** folder, which contains test_language.py. That file calls **extract_required_entities** from the script above, so if everything works, it returns the company name, person name, and location found in the text you give it.

Let's try it. Run it on example.txt in the data folder, which contains the character recognition result from Step 1.

$ python test_language.py data/example.txt > result.txt

Does result.txt contain the following?

{'PERSON': 'Cai Mao Zutaro', 'LOCATION': '1 Nihonbashi Tianzhu-cho, Tokyo-2-3', 'ORGANIZATION': 'Kintone Co., Ltd.'}

Step3 (3min) Integrate two APIs to extract information from business cards

Finally, we write a script that combines the Vision API and Natural Language API scripts written so far, then run it to check that it works. Let's start by writing the script.

Writing the script

Write a script that combines the Vision API and the Natural Language API. The script lives at **plugins/apis/integration.py**. Open integration.py in an editor, replace its contents with the following, and save the file with UTF-8 encoding.

`integration.py`


# -*- coding: utf-8 -*-
from .language import extract_required_entities
from .vision import detect_text


def extract_entities_from_img(img_path, access_token):

    text = detect_text(img_path, access_token)
    entities = extract_required_entities(text, access_token)

    return entities

Pass the image file path and the API key to **extract_entities_from_img**, and it returns the company name, person name, and location recognized in the image. Let's run it to check that it works.
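The same two-step flow can be tried offline by passing the two steps in as functions. This is only a sketch: `run_pipeline`, `fake_detect`, and `fake_extract` are hypothetical names, and the early return for empty text is an addition that integration.py does not have.

```python
def run_pipeline(img_path, access_token, detect, extract):
    """Run OCR, then entity extraction; skip the second call if OCR is empty."""
    text = detect(img_path, access_token)
    if not text:
        # No text means nothing to analyze, so avoid a second API request.
        return {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    return extract(text, access_token)


# Stand-ins for detect_text and extract_required_entities (no network).
def fake_detect(img_path, access_token):
    return 'Kintone Co., Ltd.'


def fake_extract(text, access_token):
    return {'ORGANIZATION': text, 'PERSON': '', 'LOCATION': ''}


print(run_pipeline('card.png', 'KEY', fake_detect, fake_extract))
```

In real use you would pass `detect_text` and `extract_required_entities` in place of the two stand-ins.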

Running the script

Now let's run the script. First move to the **plugins/tests** folder, which contains test_integration.py. That file calls **extract_entities_from_img** from the script above, so if everything works, it returns the company name and other details found in the image you give it.

Let's try it. Run it on example.png in the data folder.

$ python test_integration.py data/example.png > result.txt

Does result.txt contain the following?

{'PERSON': 'Cai Mao Zutaro', 'LOCATION': '1 Nihonbashi Tianzhu-cho, Tokyo-2-3', 'ORGANIZATION': 'Kintone Co., Ltd.'}

Conclusion

How was it? We built a small application with the Vision API and the Natural Language API, and I hope connecting the two gave you a sense of the possibilities.

Next, we connect what we have built so far to Slack, so that you can upload a business card image to Slack and have it registered in kintone. Please proceed to the next article.

Create a SlackBot that extracts information from business cards

Article summary

This hands-on is split into three articles.

Please prepare in advance using the following article.

Kintone x Easy business card management realized by machine learning @kintone Café

[PYTHON] Extract information from business cards by combining Vision API and Natural Language API
