Let's extract information from business cards by combining the Cloud Vision API and the Natural Language API. We will call both APIs from Python.
Click here for the previous article:
We will build an application that takes a business card image and extracts the name, company name, and address. The image looks like this:
Step 0: Prepare to create the app
Step 1: Detect text using the Vision API
Step 2: Extract names and company names using the Natural Language API
Step 3: Integrate the two APIs to extract information from business cards
To create an app, install the necessary libraries, download the repository, and set the API key.
Install the libraries needed to create the app by running the following commands:
$ pip install requests
$ pip install pyyaml
I have prepared a template of the application in advance. It works once you fill in the missing parts, so download its repository from the following link.
You can download it as a ZIP via "Clone or download" → "Download ZIP".
Write the API key you obtained from Google Cloud Platform into the configuration file (plugins/config/google.yaml). Open google.yaml in an editor and replace the token value with your Google Cloud Platform API key, that is, rewrite the **xxx** part.
google.yaml
token: xxx
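The template's own scripts take care of reading this file, and how they do it is not shown here. As a minimal sketch, assuming PyYAML (installed above) and the `token` key shown, the key could be loaded like this; `load_api_key` is a hypothetical helper, not part of the template:

```python
import yaml


def load_api_key(config_path='plugins/config/google.yaml'):
    """Read the API key stored under 'token' in google.yaml (hypothetical helper)."""
    with open(config_path, encoding='utf-8') as f:
        # safe_load returns None for an empty file, so fall back to an empty dict.
        config = yaml.safe_load(f) or {}
    token = config.get('token')
    # Catch the case where the placeholder was never replaced.
    if not token or token == 'xxx':
        raise ValueError('Set your API key as the "token" value in {}'.format(config_path))
    return token
```

Failing early on the `xxx` placeholder gives a clearer message than the error the API would return for an invalid key.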
This section first describes the Vision API. Then we write a script that uses the API and run it to check that it works. Let's start with the API description.
The Google Cloud Vision API uses powerful machine learning models to let you build applications that recognize and understand the content of images. The Cloud Vision API has the following features:
This is hard to grasp from a description alone, so let's try it. Visit the web page below to try out the demo.
The API is a paid service, but the first 1,000 requests per month are free. Beyond that, you are charged according to the number of requests.
Now write a script that uses the Vision API. The script is located at **plugins/apis/vision.py**. Open vision.py in an editor, replace its contents with the following, and save it using UTF-8 as the character encoding.
vision.py
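# -*- coding: utf-8 -*-
import base64

import requests


def detect_text(image_file, access_token=None):
    """Detect the text in an image file with the Cloud Vision API and return it."""
    # The API expects the image bytes as a base64-encoded string.
    with open(image_file, 'rb') as image:
        base64_image = base64.b64encode(image.read()).decode()

    # access_token holds the API key from google.yaml.
    url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        'requests': [{
            'image': {
                'content': base64_image,
            },
            'features': [{
                'type': 'TEXT_DETECTION',
                'maxResults': 1,
            }]
        }]
    }
    response = requests.post(url, headers=header, json=body).json()

    # The first annotation holds the full detected text. When no text is found,
    # 'textAnnotations' is absent, so fall back to an empty string.
    annotations = response['responses'][0].get('textAnnotations', [])
    text = annotations[0]['description'] if annotations else ''
    return text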
Given the path of an image file and an API key, **detect_text** detects the text in the image and returns it. Let's run it to check that it works.
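Each run of detect_text consumes one request. To inspect the request it builds without sending anything, the same body can be prepared offline with the requests library's `Request(...).prepare()`. This is an illustrative sketch: `build_vision_request` is a hypothetical name, and passing the key through `params=` (so requests URL-encodes it) is a small deviation from the string formatting used in vision.py.

```python
import base64

import requests

VISION_URL = 'https://vision.googleapis.com/v1/images:annotate'


def build_vision_request(image_bytes, api_key):
    """Prepare, without sending, a TEXT_DETECTION request like the one detect_text posts."""
    body = {
        'requests': [{
            'image': {'content': base64.b64encode(image_bytes).decode()},
            'features': [{'type': 'TEXT_DETECTION', 'maxResults': 1}],
        }]
    }
    # params= makes requests append and URL-encode "?key=..." itself;
    # json= serializes the body and sets the Content-Type header.
    return requests.Request('POST', VISION_URL, params={'key': api_key}, json=body).prepare()
```

For a local image, `.url` on the returned object shows the full endpoint URL and `.body` holds the JSON payload; `requests.Session().send(prepared)` would actually send it.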
Let's run the script we just wrote. First, move to the **plugins/tests** folder. You should see test_vision.py there. test_vision.py calls **detect_text** from the script we wrote, so if everything works correctly, giving it an image should return the text contained in that image.
Let's try it. Run the test with example.png from the data folder:
$ python test_vision.py data/example.png > result.txt
Did you get the following text as the result?
Kintone Co., Ltd.
Tokyo Headquarters First Sales Department
Cai Mao Zutaro
23-4567
1 Nihonbashi Tianzhu-cho, Tokyo-2-3
Tel: 00-1234-5678
E-mail: [email protected]
Righteousness
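If the script prints nothing or stops with a KeyError, the response most likely contained no text or an error object. A minimal sketch of defensive parsing, assuming the `images:annotate` response layout (a `responses` list whose first item holds `textAnnotations`, plus an `error` object either at the top level or inside a per-image item); `text_from_vision_response` is a hypothetical helper:

```python
def text_from_vision_response(response):
    """Return the full text detected in an images:annotate response dict.

    Returns '' when the image contains no text and raises RuntimeError when
    the API reports an error.
    """
    # Request-level errors (for example an invalid API key) appear at the top level.
    if 'error' in response:
        raise RuntimeError(response['error'].get('message', 'Vision API error'))
    result = (response.get('responses') or [{}])[0]
    # Errors for an individual image appear inside its response item.
    if 'error' in result:
        raise RuntimeError(result['error'].get('message', 'Vision API error'))
    annotations = result.get('textAnnotations', [])
    # The first annotation holds the whole text; later ones are individual words.
    return annotations[0]['description'] if annotations else ''
```

Raising on errors instead of returning an empty string keeps a bad API key from being mistaken for a blank business card.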
This section first describes the Natural Language API. Then we write a script that uses the API and run it to check that it works. Let's start with the API description.
The Google Cloud Natural Language API is an easy-to-use REST API that applies a powerful machine learning model to recognize the structure and meaning of text. The Natural Language API has the following features:
This is hard to grasp from a description alone, so let's try it. Visit the web page below to try out the demo.
The API is a paid service, but the first 5,000 requests per month are free. Beyond that, you are charged according to the number of requests.
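Since each business card goes through both APIs, the smaller free tier sets the limit. A rough estimate, assuming one Vision unit per image for TEXT_DETECTION, one Natural Language unit per started 1,000 characters of text, and an assumed average of 300 characters per card (all three are assumptions, not figures from this article):

```python
import math

VISION_FREE = 1000  # free Vision API requests per month, as quoted above
NL_FREE = 5000      # free Natural Language API requests per month, as quoted above


def free_cards_per_month(chars_per_card=300, vision_free=VISION_FREE, nl_free=NL_FREE):
    """Estimate how many business cards fit within both monthly free tiers.

    Assumes one Vision unit per image and one Natural Language unit per
    started 1,000 characters of recognized text.
    """
    vision_units_per_card = 1
    nl_units_per_card = max(1, math.ceil(chars_per_card / 1000))
    # The card budget is limited by whichever API runs out first.
    return min(vision_free // vision_units_per_card, nl_free // nl_units_per_card)
```

With the numbers quoted above, `free_cards_per_month()` returns 1000: the Vision API's free tier runs out first.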
Now write a script that uses the Natural Language API. The script is located at **plugins/apis/language.py**. Open language.py in an editor, replace its contents with the following, and save it using UTF-8 as the character encoding.
language.py
# -*- coding: utf-8 -*-
import requests


def extract_entities(text, access_token=None):
    """Send text to the Natural Language API and return the analyzeEntities response."""
    # access_token holds the API key from google.yaml.
    url = 'https://language.googleapis.com/v1/documents:analyzeEntities?key={}'.format(access_token)
    header = {'Content-Type': 'application/json'}
    body = {
        "document": {
            "type": "PLAIN_TEXT",
            "language": "ja",
            "content": text
        },
        "encodingType": "UTF8"
    }
    response = requests.post(url, headers=header, json=body).json()
    return response


def extract_required_entities(text, access_token=None):
    """Collect the organization, person, and location names found in text."""
    entities = extract_entities(text, access_token)
    required_entities = {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}
    # A response without an 'entities' list (e.g. an error) yields empty values.
    for entity in entities.get('entities', []):
        t = entity['type']
        if t in required_entities:
            # Names of the same type are concatenated without a separator.
            required_entities[t] += entity['name']
    return required_entities
Given text and an API key, **extract_entities** extracts the various named entities it contains. This time, however, we only need the **company name**, **person's name**, and **location**, so **extract_required_entities** picks out just those.
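extract_required_entities concatenates names of the same type with no separator, so two LOCATION entities run together. A minimal variant, assuming the analyzeEntities response layout (an `entities` list whose items carry `name` and `type`), joins them with a separator and tolerates a response without entities; `pick_entities` is a hypothetical name:

```python
REQUIRED_TYPES = ('ORGANIZATION', 'PERSON', 'LOCATION')


def pick_entities(response, required_types=REQUIRED_TYPES, sep=' '):
    """Group entity names from an analyzeEntities response dict by type.

    Unlike extract_required_entities, names of the same type are joined
    with `sep` instead of being concatenated directly.
    """
    found = {t: [] for t in required_types}
    for entity in response.get('entities', []):
        if entity.get('type') in found:
            found[entity['type']].append(entity['name'])
    # Types with no matching entity map to an empty string, as in the original.
    return {t: sep.join(names) for t, names in found.items()}
```

For example, LOCATION entities "Tokyo" and "Nihonbashi" come back as `'Tokyo Nihonbashi'` rather than `'TokyoNihonbashi'`.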
Let's run it to check that it works.
Let's run the script we just wrote. First, move to the **plugins/tests** folder. You should see test_language.py there. test_language.py calls **extract_required_entities** from the script we wrote, so if everything works correctly, giving it text should return the company name, person's name, and location contained in that text.
Let's try it. Run the test with example.txt from the data folder, which contains the text recognition result from the previous step:
$ python test_language.py data/example.txt > result.txt
Did you get the following text as the result?
{'PERSON': 'Cai Mao Zutaro', 'LOCATION': '1 Nihonbashi Tianzhu-cho, Tokyo-2-3', 'ORGANIZATION': 'Kintone Co., Ltd.'}
Finally, we write a script that combines the Vision API and Natural Language API scripts written so far, then run it to check that it works. Let's start by writing the script.
Now write a script that combines the Vision API and the Natural Language API. The script is located at **plugins/apis/integration.py**. Open integration.py in an editor, replace its contents with the following, and save it using UTF-8 as the character encoding.
integration.py
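# -*- coding: utf-8 -*-
from .language import extract_required_entities
from .vision import detect_text


def extract_entities_from_img(img_path, access_token):
    """Detect the text on a business card image, then extract the required entities."""
    # Step 1: OCR with the Vision API.
    text = detect_text(img_path, access_token)
    # Step 2: pick out organization, person, and location with the Natural Language API.
    entities = extract_required_entities(text, access_token)
    return entities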
Given the path of an image file and an API key, **extract_entities_from_img** returns the company name, person's name, and location found in the image. Let's run it to check that it works.
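The integration above always makes both API calls. A minimal variant, sketched under the assumption that skipping the Natural Language request is desirable when no text is detected, takes the two steps as function arguments so it can also be tried out without calling either API; `card_pipeline` is a hypothetical name:

```python
EMPTY_RESULT = {'ORGANIZATION': '', 'PERSON': '', 'LOCATION': ''}


def card_pipeline(img_path, api_key, detect, extract):
    """Run OCR, then entity extraction, skipping the second call when OCR finds nothing.

    `detect` and `extract` are expected to behave like detect_text and
    extract_required_entities; passing them in lets you substitute fakes.
    """
    text = detect(img_path, api_key)
    if not text.strip():
        # No text on the card: avoid spending a Natural Language API request.
        return dict(EMPTY_RESULT)
    return extract(text, api_key)
```

With the real modules, this would be called as `card_pipeline(img_path, api_key, detect_text, extract_required_entities)`.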
Now let's run the script. First, move to the **plugins/tests** folder. You should see test_integration.py there. test_integration.py calls **extract_entities_from_img** from the script we wrote, so if everything works correctly, giving it an image should return the company name and other details found in that image.
Let's try it. Run the test with example.png from the data folder:
$ python test_integration.py data/example.png > result.txt
Did you get the following text as the result?
{'PERSON': 'Cai Mao Zutaro', 'LOCATION': '1 Nihonbashi Tianzhu-cho, Tokyo-2-3', 'ORGANIZATION': 'Kintone Co., Ltd.'}
How was it? We built an app with the Vision API and the Natural Language API, and I hope connecting the two gave you a sense of the many possibilities they open up.
Next, we will connect the app built so far to Slack, so that business card images uploaded to Slack are registered in kintone. Please proceed to the next article.
This hands-on is split into three articles.
Please complete the preparation described in the following article first.