Detecting an object in an image is called object recognition. This time, we will use Visual Recognition, one of the services provided by IBM Watson Developer Cloud, to perform object recognition.
You need to get a username and password to use the Visual Recognition web API.
Create an application from the IBM Bluemix admin screen and add Visual Recognition to it as a service.
After adding it, click "View Credentials" on that service to see your username and password.
The result of object recognition is a set of pairs of a label and a label score. First, let's get the list of labels the service uses.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""Get label information with Visual Recognition in IBM Watson Developer Cloud
"""
import sys
import json
import requests
from pit import Pit

# Read the service credentials from the pit config store.
setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})
auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/labels'
res = requests.get(url, auth=auth_token, headers={'content-type': 'application/json'})
if res.status_code == requests.codes.ok:
    labels = json.loads(res.text)
    print('label groups({}): {}'.format(len(labels['label_groups']), labels['label_groups']))
    print()
    print('labels({}): {}'.format(len(labels['labels']), labels['labels']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)
The result is returned as JSON: label_groups is a list of label groups and labels is a list of labels.
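If you want to check the exact structure yourself rather than relying on the summary prints above, a small sketch (reusing the labels dict parsed in the script) is:
# Pretty-print the parsed response to inspect the structure of
# 'label_groups' and 'labels' directly.
print(json.dumps(labels, indent=2, ensure_ascii=False))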
The object recognition API expects the images to be sent as multipart form data. The images can be png, jpg, or a zip archive. The following is an example of sending a single png image.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""Perform object recognition with Visual Recognition of IBM Watson Developer Cloud
"""
import os
import sys
import json
import requests
from pit import Pit

# Read the service credentials from the pit config store.
setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})
auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'
filepath = 'var/images/first/2015-04-12-11.47.01.png'  # path to image file
filename = os.path.basename(filepath)
# Send the image as one multipart field.
with open(filepath, 'rb') as fp:
    res = requests.post(
        url, auth=auth_token,
        files={
            'imgFile': (filename, fp),
        },
    )
if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print(' {:30}: {}'.format(label['label_name'], label['label_score']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)
This script analyzes a single image. The output is as follows.
$ python analyze_image.py
0 - 2015-04-12-11.47.01.png
Outdoors : 0.714211
Nature Scene : 0.671271
Winter Scene : 0.669832
Vertebrate : 0.635903
Boat : 0.61398
Animal : 0.610709
Water Vehicle : 0.607173
Placental Mammal : 0.580503
Snow Scene : 0.571422
Fabric : 0.563129
Gray : 0.56078
Water Sport : 0.555034
Person : 0.533461
Mammal : 0.515725
Surface Water Sport : 0.511447
The returned data is as follows.
{'images': [{'image_id': '0', 'labels': [{'label_score': '0.714211', 'label_name': 'Outdoors'}, {'label_score': '0.671271', 'label_name': 'Nature Scene'}, {'label_score': '0.669832', 'label_name': 'Winter Scene'}, {'label_score': '0.635903', 'label_name': 'Vertebrate'}, {'label_score': '0.61398', 'label_name': 'Boat'}, {'label_score': '0.610709', 'label_name': 'Animal'}, {'label_score': '0.607173', 'label_name': 'Water Vehicle'}, {'label_score': '0.580503', 'label_name': 'Placental Mammal'}, {'label_score': '0.571422', 'label_name': 'Snow Scene'}, {'label_score': '0.563129', 'label_name': 'Fabric'}, {'label_score': '0.56078', 'label_name': 'Gray'}, {'label_score': '0.555034', 'label_name': 'Water Sport'}, {'label_score': '0.533461', 'label_name': 'Person'}, {'label_score': '0.515725', 'label_name': 'Mammal'}, {'label_score': '0.511447', 'label_name': 'Surface Water Sport'}], 'image_name': '2015-04-12-11.47.01.png'}]}
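One thing to note in this response is that label_score comes back as a string, not a number. Below is a small sketch that filters out weak labels, reusing the data dict parsed in the script above (THRESHOLD is an arbitrary value chosen for illustration):
# Keep only the labels whose score exceeds an arbitrary threshold.
# label_score is a string, so convert it with float() before comparing.
THRESHOLD = 0.6

for img in data['images']:
    strong = [(label['label_name'], float(label['label_score']))
              for label in img['labels']
              if float(label['label_score']) >= THRESHOLD]
    print(img['image_name'], strong)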
By adding more files to the multipart request, multiple images can be analyzed in a single request.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""Perform object recognition with Visual Recognition of IBM Watson Developer Cloud
Include 3 files in 1 request
"""
import os
import sys
import json
import requests
from pit import Pit

# Read the service credentials from the pit config store.
setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})
auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'
filepaths = [
    'var/images/first/2015-04-12-11.47.01.png',
    'var/images/first/2015-04-12-11.44.42.png',
    'var/images/first/2015-04-12-11.46.11.png',
]
# Use one multipart field per file, keyed by its file name.
files = {os.path.basename(filepath): (os.path.basename(filepath), open(filepath, 'rb'))
         for filepath in filepaths}
res = requests.post(
    url, auth=auth_token,
    files=files,
)
for key, (filename, fp) in files.items():
    fp.close()
if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print(' {:30}: {}'.format(label['label_name'], label['label_score']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)
In the returned JSON, the 'images' key holds a list with one element for each image you sent. The execution result is as follows.
$ python analyze_image_multi.py
0 - 2015-04-12-11.44.42.png
Gray : 0.735805
Winter Scene : 0.7123
Nature Scene : 0.674336
Water Scene : 0.668881
Outdoors : 0.658805
Natural Activity : 0.643865
Vertebrate : 0.603751
Climbing : 0.566247
Animal : 0.537788
Mammal : 0.518001
1 - 2015-04-12-11.46.11.png
Gray : 0.719819
Vertebrate : 0.692607
Animal : 0.690942
Winter Scene : 0.683918
Mammal : 0.669149
Snow Scene : 0.664266
Placental Mammal : 0.663866
Outdoors : 0.66335
Nature Scene : 0.656991
Climbing : 0.645557
Person : 0.557965
Person View : 0.528335
2 - 2015-04-12-11.47.01.png
Outdoors : 0.714211
Nature Scene : 0.671271
Winter Scene : 0.669832
Vertebrate : 0.635903
Boat : 0.61398
Animal : 0.610709
Water Vehicle : 0.607173
Placental Mammal : 0.580503
Snow Scene : 0.571422
Fabric : 0.563129
Gray : 0.56078
Water Sport : 0.555034
Person : 0.533461
Mammal : 0.515725
Surface Water Sport : 0.511447
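To consume this multi-image response programmatically, one option is to reduce each image to its highest-scoring label. A sketch reusing the data dict from the script above (top_label_by_image is just an illustrative name):
# Map each image name to its highest-scoring label, converting the
# string scores with float() before comparing.
top_label_by_image = {
    img['image_name']: max(img['labels'],
                           key=lambda label: float(label['label_score']))['label_name']
    for img in data['images']
}
print(top_label_by_image)
# e.g. {'2015-04-12-11.44.42.png': 'Gray',
#       '2015-04-12-11.46.11.png': 'Gray',
#       '2015-04-12-11.47.01.png': 'Outdoors'}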
Even when I included 30 files in a single request, they were all processed normally. It can probably handle even more.
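If you have far more images than that, a conservative approach is to split them into chunks and send one request per chunk. A minimal sketch, reusing url, auth_token, and the imports from the script above (all_filepaths, analyze_chunk, and CHUNK_SIZE are illustrative names; the actual request limit is not documented here):
CHUNK_SIZE = 30  # 30 worked in my test; the real limit may well be higher

def analyze_chunk(filepaths):
    """Send one recognize request for the given image paths and return the parsed JSON."""
    files = {os.path.basename(p): (os.path.basename(p), open(p, 'rb'))
             for p in filepaths}
    try:
        res = requests.post(url, auth=auth_token, files=files)
    finally:
        for filename, fp in files.values():
            fp.close()
    res.raise_for_status()
    return json.loads(res.text)

results = [analyze_chunk(all_filepaths[start:start + CHUNK_SIZE])
           for start in range(0, len(all_filepaths), CHUNK_SIZE)]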
https://gist.github.com/TakesxiSximada/ca1b5aac871ec7167ff9
https://gist.github.com/TakesxiSximada/996dbbfae5fa3bbab61d
https://gist.github.com/TakesxiSximada/d451221dc2a280b7e35d
This time I'm using a Python third-party package called pit to read the username and password from a config file. However, as of April 12, 2015, pit does not support Python 3, so simply running pip install pit under Python 3 results in an error. I forked the pit repository, and it has a branch that supports Python 3; please install pit from there.
https://github.com/TakesxiSximada/pit/archive/fix/sximada/py3k.zip
https://github.com/TakesxiSximada/pit/tree/fix/sximada/py3k
... or rather, I should stop slacking and send a pull request (note to self).