Image collection by calling Bing Image Search API v5 from Python

I collected images with the Bing Image Search API to use for deep learning in TensorFlow. The program is written so that a large number of images can be collected. I ran it in the following environment.

| Type | Version | Remarks |
|---|---|---|
| OS | Windows 10 Pro 64-bit | |
| Anaconda | Anaconda 4.4.0 | I created a virtual environment on Anaconda |
| Python | 3.5 | 3.5 so that it can coexist with TensorFlow |
| requests library | 2.18.4 | Used to call the Bing API. I used it because of my own lack of skill and knowledge; it should be possible to do without it |
| API | Bing Image Search API v5 | I used v5 because v7 had not been released yet. I have not compared it with Google. |

Implemented function

Run without parameters

First, run the program without any parameters.

python get_images_via_bing_20170822.py

Progress is displayed like this. The line "Get 3 images from offset X" appears twice because the Bing Image Search API is called twice. The API returns at most 150 search results per call, so the program calls it multiple times in order to retrieve a larger number of results. 30.Execution01.JPG
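The paging arithmetic behind those repeated calls can be sketched roughly as follows. This is only an illustration, not part of the script: `plan_calls` is a hypothetical helper name, and its parameters simply mirror the script's `--off_set_start`, `--image_count` and `--call_count` options.

```python
def plan_calls(off_set_start, image_count, call_count):
    """Return the (offset, count) pair for each API call.

    Hypothetical helper for illustration; it mirrors the offset formula
    used in the script: off_set_start + step * image_count.
    """
    return [(off_set_start + step * image_count, image_count)
            for step in range(call_count)]


# With the script defaults (3 images per call, 2 calls, starting at 0):
print(plan_calls(0, 3, 2))  # [(0, 3), (3, 3)]
```

Each pair corresponds to one "Get N images from offset X" line in the log.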

Cute cat images have been collected in the output folder. 30.Execution02.JPG

Run with parameters

Next, run the program with parameters: a single API call that fetches 80 results for the search term "cat".

python get_images_via_bing_20170822.py --query "cat" --call_count 1 --image_count 80

"Not saved file type:" is logged for images whose type is neither jpeg nor png. Such images are not saved; only this log line is output. 30.Execution03.JPG
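The idea of that filter can be sketched without any library by looking at the first bytes of the downloaded data. Note that this is a swapped-in technique for illustration: the script itself uses the standard `imghdr` module, and the names `sniff_image_type` and `should_save` are hypothetical.

```python
def sniff_image_type(data):
    """Return 'jpeg', 'png', or None based on the leading bytes.

    Assumption: only the two formats the script keeps are recognised;
    the script itself relies on the standard imghdr module instead.
    """
    if data[:3] == b'\xff\xd8\xff':
        return 'jpeg'
    if data[:8] == b'\x89PNG\r\n\x1a\n':
        return 'png'
    return None


def should_save(data):
    """True when the payload looks like a jpeg or png image."""
    return sniff_image_type(data) in ('jpeg', 'png')


# A GIF header is recognised as neither type, so it would be skipped.
print(should_save(b'GIF89a' + b'\x00' * 10))
```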

It collects images at a rapid pace :smile_cat: 30.Execution04.JPG

Implementation procedure

1. Microsoft Azure account setup

On the Azure Cognitive Services page (https://azure.microsoft.com/en-us/services/cognitive-services/), click the "Free account" link. 10.StartBing01.JPG

Click the "Get Started for Free" link 10.StartBing02.JPG

I already had an account, so I proceeded to sign up with it. After this, I entered my personal information (my memory of this step is hazy). 10.StartBing03.JPG 10.StartBing04.JPG

A Welcome email is sent to your account email address. Click the link in it and go to the portal screen. Click the "+" icon, then click "Create" for the Bing Search APIs. 10.StartBing05.JPG

I created the resource by selecting the free version of Bing Search APIs. 10.StartBing06.JPG

Click "Show access keys" under Manage keys and make a note of the keys. Two keys are displayed, and it seems that either one works (I used the first one; I have not confirmed the second). 10.StartBing07.JPG

2. Install requests

I installed requests because it was not in my environment. Skip this step if you already have it installed. Also, as mentioned at the beginning, I think the same thing can be achieved without it. I installed it in Anaconda's virtual environment "tensorflow121". Start a terminal from Anaconda Navigator. 20.InstallLibraries01.JPG

I installed it with pip. Nothing particularly difficult was involved.

pip install requests

20.InstallLibraries02.JPG

3. Python code

I made it like this.

Functional overview

- **Up to 150 images for the search term can be fetched with one Bing API call** (API limit)
- **Calls the API multiple times in one program run** to fetch a large number of images
- Saves the retrieved files locally
- Can start fetching from the middle of the search results

Runtime parameters

- --image_count: number of image files to fetch per Bing API call
- --call_count: number of Bing API calls per program run (image_count x call_count = total number of images fetched)
- --off_set_start: offset in the search results at which to start fetching
- --output_path: image output directory
- --query: search term

Replace the "Enter your Subscription Key here" part of the code below with the Subscription Key you created in the Azure portal.

import argparse, requests, urllib.parse, os, io, imghdr

#Basic model parameters
FLAGS = None

#end point
kEndPoint = 'https://api.cognitive.microsoft.com/bing/v5.0/images/search'

#http request header
kHeaders = { 'Ocp-Apim-Subscription-Key': 'Enter your Subscription Key here' }

#Get a list of image URLs for search results
def GetImageUrls():
    print('Start getting %d images from offset %d' % (FLAGS.image_count, FLAGS.off_set_start ))
    image_list = []
    
    #Since the Bing API returns at most 150 results per call, loop call_count times
    for step in range(FLAGS.call_count):

        #Get offset
        off_set = FLAGS.off_set_start + step * FLAGS.image_count

        #http request parameters
        params = urllib.parse.urlencode({
            'count': FLAGS.image_count,
            'offset': off_set,
            'imageType':'Photo',
            'q': FLAGS.query,
            #'mkt': 'ja-JP',  (optional market setting, disabled)
        })

        #bing API call
        res = requests.get(kEndPoint, headers=kHeaders, params=params)
        #Fail with a clear HTTP error (bad key, quota exceeded, etc.)
        res.raise_for_status()
        #Parse the JSON body once and reuse it
        data = res.json()

        if step == 0:
            print('Total Estimated Matches: %s' % data['totalEstimatedMatches'])
        vals = data['value']

        print('Get %d images from offset %d' % (len(vals), off_set))

        #Store the resulting image URL
        for val in vals:
            image_list.append(val["contentUrl"])

    return image_list

#Get an image and save it locally
def fetch_images(image_list):
    print('total images:%d' % len(image_list))
    for i in range(len(image_list)):
        
        #Progress output for every 100 cases
        if i % 100 == 0:
            print('Start getting and saving each image:%d' % i)
        try:
            #Image acquisition
            response = requests.get(image_list[i], timeout=5 )

        #Since an error may occur depending on the acquisition source, just log and continue
        except requests.exceptions.RequestException:
            print('%d:Error occurs :%s' % (i, image_list[i]))
            continue
            
        #Filter by image type (only jpeg and png are saved)
        with io.BytesIO(response.content) as fh:
            image_type = imghdr.what(fh)
        if image_type != 'jpeg' and image_type != 'png':
            print('Not saved file type:%s' % image_type)
            continue

        #Save image locally as image.<index>.<type>
        with open('{}/image.{}.{}'.format(FLAGS.output_path, str(i), image_type), 'wb') as f:
            f.write(response.content)
            
#Runs only when executed directly (not when imported as a module)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--image_count',
        type=int,
        default=3,
        help='collection number of image files per api call.'
  )
    parser.add_argument(
        '--call_count',
        type=int,
        default=2,
        help='number of api calls.'
  )
    parser.add_argument(
        '--off_set_start',
        type=int,
        default=0,
        help='offset start.'
  )
    parser.add_argument(
        '--output_path',
        type=str,
        default='./images',
        help='image files output directory.'
  )
    parser.add_argument(
        '--query',
        type=str,
        default='Cat',
        help='search query.'
  )
    
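    #Parameter acquisition and execution
    FLAGS, unparsed = parser.parse_known_args()
    #Create the output directory if it does not exist yet
    os.makedirs(FLAGS.output_path, exist_ok=True)
    fetch_images(GetImageUrls())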
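The paging loop in GetImageUrls always runs call_count times. As a rough sketch of how it could instead stop once the results run out, the following separates the loop from the HTTP call. Everything here is an assumption for illustration: `collect_urls` and `fake_fetch_page` are hypothetical names, and `fake_fetch_page` only imitates the response fields the script reads (`totalEstimatedMatches`, `value`, `contentUrl`) so that the example runs offline.

```python
def collect_urls(fetch_page, off_set_start, image_count, max_calls):
    """Page through results, stopping early once they run out.

    fetch_page(offset, count) is a hypothetical callable that returns a
    dict shaped like the API response used in the script, i.e. with
    'totalEstimatedMatches' and a 'value' list of {'contentUrl': ...}.
    """
    urls = []
    offset = off_set_start
    for _ in range(max_calls):
        page = fetch_page(offset, image_count)
        values = page.get('value', [])
        urls.extend(item['contentUrl'] for item in values)
        offset += image_count
        # Stop when a page is empty or the reported total has been reached.
        if not values or offset >= page.get('totalEstimatedMatches', 0):
            break
    return urls


def fake_fetch_page(offset, count, total=7):
    """Stand-in for the real API call so the sketch runs offline."""
    stop = min(offset + count, total)
    return {
        'totalEstimatedMatches': total,
        'value': [{'contentUrl': 'http://example.com/%d.jpg' % n}
                  for n in range(offset, stop)],
    }


urls = collect_urls(fake_fetch_page, off_set_start=0, image_count=3, max_calls=10)
print(len(urls))  # 7
```

In the real script, `fetch_page` would wrap the `requests.get` call and return `res.json()`.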

What I wanted to do but couldn't do

I did not have enough time to do the following. I gave up because they looked time-consuming for a Python beginner.

- A function that stops automatically when the total number of search results is reached: it could be coded in the ordinary way, but it is tedious ...
- Not using the requests library: I wanted to implement this with only a standard library such as urllib, but I couldn't. Coming from a non-open SAP shop, I find non-standard libraries somewhat uncomfortable (a matter of familiarity?).
- Possibly because the subscription is a free trial, I could only get up to about 1000 results even for search terms such as "cat" that should have many search results. Whatever I tried, the total stayed low ... I give up :raised_hand:
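For the second item, here is a rough, untested-against-the-live-API sketch of what a standard-library-only call might look like. It reuses the endpoint, header name and parameters from the script; the function names `build_search_request` and `search_image_urls` are hypothetical, and the response is assumed to have the same `value` / `contentUrl` shape the script reads.

```python
import json
import urllib.parse
import urllib.request

# Same endpoint as in the script above.
ENDPOINT = 'https://api.cognitive.microsoft.com/bing/v5.0/images/search'


def build_search_request(query, count, offset, subscription_key):
    """Build (but do not send) a GET request for the image search."""
    params = urllib.parse.urlencode({
        'q': query,
        'count': count,
        'offset': offset,
        'imageType': 'Photo',
    })
    return urllib.request.Request(
        ENDPOINT + '?' + params,
        headers={'Ocp-Apim-Subscription-Key': subscription_key},
    )


def search_image_urls(request, timeout=10):
    """Send the request and return the contentUrl of each result."""
    with urllib.request.urlopen(request, timeout=timeout) as res:
        body = json.loads(res.read().decode('utf-8'))
    return [item['contentUrl'] for item in body['value']]


# Building the request needs no network access or valid key.
req = build_search_request('cat', 3, 0, 'Enter your Subscription Key here')
print(req.full_url)
```

Calling `search_image_urls(req)` would perform the actual HTTP request, so it needs a valid Subscription Key.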

Reference link

Here are the sites I referred to.

| Site | Comment |
|---|---|
| Official test tool | I wrote the code while looking at this |
| API official document | I referred to it for the parameters that can be used. By the way, where did OData go? |
| Automatically collect images using the Bing Image Search API | I learned a lot about the Bing API |
| The story of migrating from Bing Search API v2 to v5 | I especially referred to the logic of the image acquisition part |
| bing_image_getter.py | I started by modifying this source |
