[PYTHON] How to create a face image data set used in machine learning (1: Acquire candidate images using WebAPI service)

Image recognition and classification with machine learning methods such as DNNs require a large number of training images. In many cases, data sets (image sets) published on the web are used. For face image classification and recognition, however, few data sets are publicly available because of issues such as portrait rights, so many people struggle to collect training images. This article describes one way to create your own face image data set.

Development environment used this time

Acquisition of candidate images

First, collect images that may show the target face (hereinafter referred to as candidate images). Several collection methods are possible, such as using a Web API service or analyzing the frames of a video.

This time, I would like to write about **how to collect candidate images using a Web API service**.

Collection of candidate images using the publicly available Web API

Many Web API services return information about images published on the web or on social networks. Here we use the **Bing Image Search API**, which is available free of charge through Microsoft Cognitive Services.

Collecting images with the Bing Image Search API takes two steps:

  1. Set the search condition in the query and use the API to get the URLs of the images that match it
  2. Create a new copy of each image with OpenCV from the image URL obtained in step 1

For this reason, OpenCV is required as a library to import, as mentioned above.

Steps to use the Bing Image Search API

For the Microsoft Cognitive Services registration and the Bing Image Search API sign-up procedure, please see the separate article: How to register Microsoft Cognitive Services

To use the API, you will need a subscription key that you can get when you complete the API registration.
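
As a minimal sketch, the subscription key can be read from an environment variable instead of being written into the source file. The variable name `BING_IMAGE_SEARCH_KEY` and the helper `load_subscription_key` are hypothetical and are not part of the API or of the original code.

```python
import os

# Hypothetical variable name; choose whatever fits your environment.
KEY_ENV_VAR = "BING_IMAGE_SEARCH_KEY"


def load_subscription_key(env=os.environ):
    """Return the subscription key from the environment, or raise if it is missing."""
    key = env.get(KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError("Set {0} to your subscription key".format(KEY_ENV_VAR))
    return key
```

With this in place, `SUBSCRIPTION_KEY = load_subscription_key()` could stand in for the hard-coded string, which keeps the key out of version control.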

Code example to get the image information returned as search results from the API

As with a typical Web API, you send an HTTP request (a GET request in this case) to the API endpoint and receive a response (in JSON format) that contains the URLs of the images matching the query. A code example that does this is shown below.

# -*- coding: utf-8 -*-

import requests
import urllib.request
import os.path
import cv2
import numpy as np

#API End Point and Subscription Key
REQUEST_BASE_URL = 'https://bingapis.azure-api.net/api/v5/images/search'
SUBSCRIPTION_KEY = '********************'


def api_request(query, count=10):

    #header and parameter settings
    headers = { 'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY, }
    params = {
        'q': str(query),
        'count': int(count),
        'mkt': 'ja-JP',
        'offset': '0',
    }

    #Send the GET request to the API
    response = requests.get(REQUEST_BASE_URL, headers=headers, params=params)
    res_headers = response.headers
    status_code = response.status_code
    results = None

    #Set the value returned to the caller according to the status code and header content
    if status_code == 200:
        if 'content-length' in res_headers and int(res_headers['content-length']) == 0:
            results = None
        elif 'content-type' in res_headers and isinstance(res_headers['content-type'], str):
            if 'application/json' in res_headers['content-type'].lower():
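                #'value' is a list of image entries, so collect contentUrl from each entry
                results = [item['contentUrl'] for item in response.json().get('value', [])] if response.content else None
            else:
                results = None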
    else:
        #Forced termination if status code is other than 200
        print("Error! - status code : {0}".format(response.status_code))
        print("Message : {0}".format(response.json()['error']['message']))
        exit()

    #Return the image URLs from the search results
    return results

Please refer to the official API reference for the detailed specifications of the API request and response. Only the specifications considered to be the minimum required for a request are described here.

Request header information

| Information item | Contents |
|---|---|
| Ocp-Apim-Subscription-Key | Subscription key string |

Request parameters

| Parameter item | Data type | Contents |
|---|---|---|
| q | String | Character string used as the search query |
| count | UnsignedShort | Number of images to return in the response |
| mkt | String | Region from which the request is sent (specified by market code) |
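
The parameters above, together with the `offset` value that the request code already sends, can be combined to ask for more images than a single request returns. The following is a rough sketch under the assumption that advancing `offset` by `count` is enough to walk through the results; the helper names and the page size of 50 are hypothetical, and overlapping results may still need de-duplication.

```python
def build_params(query, count=10, offset=0, market="ja-JP"):
    """Build the query-string parameters for a single request."""
    return {"q": str(query), "count": int(count), "mkt": market, "offset": int(offset)}


def paged_params(query, total, page_size=50, market="ja-JP"):
    """Yield one parameter dict per request until `total` images are covered."""
    for offset in range(0, total, page_size):
        # The last page may be smaller than page_size.
        yield build_params(query, min(page_size, total - offset), offset, market)
```

Each yielded dict can be passed as `params` to `requests.get` in the same way as in `api_request`.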

This time, only the image URLs in the search results are taken from the API response. The Bing Image Search API also provides the following information.

- Image pixel size
- Image file size
- URL of the page where the image is posted
- Image title
- URL / size of thumbnail image
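
As an illustration, the items listed above could be pulled out of the parsed JSON response as follows. This is a sketch, not the article's own code: the field names (`name`, `contentUrl`, `width`, `height`, `contentSize`, `hostPageUrl`, `thumbnailUrl`, `thumbnail`) are assumed from the API's response format and should be checked against the official reference, and `summarize_images` is a hypothetical helper.

```python
def summarize_images(payload):
    """Pick a few fields out of each entry in the response's 'value' list."""
    summaries = []
    for item in payload.get("value", []):
        url = item.get("contentUrl")
        if not url:
            # Skip entries that have no image URL to download.
            continue
        thumb = item.get("thumbnail") or {}
        summaries.append({
            "url": url,
            "title": item.get("name"),
            "width": item.get("width"),
            "height": item.get("height"),
            "file_size": item.get("contentSize"),
            "page_url": item.get("hostPageUrl"),
            "thumbnail_url": item.get("thumbnailUrl"),
            "thumbnail_size": (thumb.get("width"), thumb.get("height")),
        })
    return summaries
```

Using `.get` throughout means a missing field becomes `None` instead of raising `KeyError`, which is convenient when filtering candidates by size later.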

Code example to create a candidate image from the image URL

After getting the set of image URLs in the search results with the method above, create a new candidate image from each URL by using OpenCV.

def make_image(image_url):
	#Load image information of URL
	resp = urllib.request.urlopen(image_url)

	#Rebuild image memory buffer so that it can be read by OpenCV
	image = np.asarray(bytearray(resp.read()), dtype=np.uint8)
	
	#Read image from memory buffer
	image = cv2.imdecode(image, cv2.IMREAD_COLOR)

	#Export as a new image (imdecode returns None when the data cannot be decoded)
	if image is not None:
		cv2.imwrite("hoge.jpg", image)
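
Because `make_image` always writes to the same file name, each call overwrites the previous image. One possible way around this, shown here only as a sketch, is to derive the file name from the image URL. The helper `candidate_path` and the `candidates` directory are hypothetical names.

```python
import hashlib
import os.path


def candidate_path(image_url, out_dir="candidates"):
    """Map an image URL to a stable, filesystem-safe .jpg path."""
    # A hash of the URL gives the same name for the same URL and avoids
    # characters that are not allowed in file names.
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()[:16]
    return os.path.join(out_dir, digest + ".jpg")
```

Inside `make_image`, the path could then be used as `cv2.imwrite(candidate_path(image_url), image)` after creating the directory with `os.makedirs("candidates", exist_ok=True)`, since `cv2.imwrite` does not create missing directories.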

When rebuilding the memory buffer, set the data type to uint8, because the pixel values that make up an image are generally integers in the interval [0, 255]. Specifying cv2.IMREAD_COLOR when reading the image with the imdecode method loads it as a color image.

This was a brief walkthrough of collecting candidate images using the Bing Image Search API. Next time I will write about how to collect candidate images by frame analysis of videos with OpenCV.
