Introduction

The first step in building a machine learning model that recognizes objects in images is to collect a large number of training images. Images of common objects such as dogs and cars can be downloaded from datasets such as ImageNet, but those datasets do not include, for example, Japanese celebrities. This time, I will show how to collect image data for machine learning using the Tumblr API.

For the Google Custom Search API edition (Pikachu), click here.

Get Tumblr API Key

First, register for a Tumblr account here.
Next, go here and register your application.

Click Register App

Next, enter the application information. The URL fields (application website, App Store URL, Google Play Store URL) are required, but since we are not actually building an OAuth application, any plausible URL will do. (This time, I used the URL of an app I made a long time ago.)

A screen like this will be displayed, so click Explore API.

Click Allow to grant permission.

The screen will then look like this. Click Show Keys in the upper right.

The API key is displayed here. This is the key we need, so make a note of it.
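Once you have the key, one option is to keep it out of the script itself and read it from an environment variable. The following is a minimal sketch, not part of the original procedure; the variable name TUMBLR_API_KEY and the helper load_api_key are names I made up for illustration.

```python
import os

def load_api_key(env=os.environ, name='TUMBLR_API_KEY'):
    """Return the API key stored under `name` (a hypothetical variable name)."""
    key = env.get(name, '').strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

With this in place, the 'api_key' entry of the request parameters could be filled with load_api_key() instead of a hard-coded string.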

Actually get the image

Now, let's actually get the images using the API key we obtained. Most Tumblr posts with images are photo posts, so it does not seem well suited to collecting pictures of characters such as Pikachu. So this time I will collect pictures of Riho Yoshioka, who has become popular recently. The downloaded images are saved in a directory called images. (Reference: http://taka-say.hateblo.jp/entry/2016/12/19/235554)
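Before the full script, it may help to see what a single request looks like. The sketch below only builds the request URL and sends nothing; build_tagged_url is a hypothetical helper, and the parameters are the same api_key, tag, and before that the script passes to the /v2/tagged endpoint.

```python
from urllib.parse import urlencode

TAGGED_URL = 'https://api.tumblr.com/v2/tagged'

def build_tagged_url(api_key, tag, before=None):
    """Build the URL for one page of posts with `tag` (hypothetical helper)."""
    params = {'api_key': api_key, 'tag': tag}
    if before is not None:
        # `before` is a timestamp; only posts older than it are requested.
        params['before'] = before
    return TAGGED_URL + '?' + urlencode(params)

print(build_tagged_url('YOUR_API_KEY', 'Yoshioka Riho'))
```

Passing the timestamp of the oldest post received so far as before is what lets the script walk back through older posts page by page.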

import os
import time
import shutil

import requests

LOOP = 10  # number of result pages to request
URL = 'https://api.tumblr.com/v2/tagged'
payload = {
    'api_key': 'YOUR API KEY HERE',
    'tag': 'Yoshioka Riho'
}
image_idx = 0

# Collect the original-size URL of every photo in posts with the tag.
photo_urls = []
for i in range(LOOP):
    response_json = requests.get(URL, params=payload).json()
    posts = response_json['response']
    if len(posts) == 0:
        break  # no older posts are left
    for data in posts:
        if data['type'] != 'photo':
            continue
        for photo in data.get('photos', []):
            photo_urls.append(photo['original_size']['url'])
    # Ask for posts older than the last one on this page.
    payload['before'] = posts[-1]['timestamp']
    time.sleep(1)  # short pause between API calls (my addition)

# Download each image into the images directory.
os.makedirs('images', exist_ok=True)
for photo_url in photo_urls:
    # Every file is named .png here, whatever its real format is.
    path = "images/" + str(image_idx) + ".png"
    r = requests.get(photo_url, stream=True)
    if r.status_code == 200:
        with open(path, 'wb') as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)
        image_idx += 1
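The parsing and paging logic of the script can also be written as small pure functions, which makes it easy to check without calling the API. This is only a sketch: the function names are hypothetical, and sample_posts is hand-written data shaped like the fields the script reads, not real API output. The filename_for helper is my own suggestion for keeping each image's original extension instead of always using .png.

```python
import os
from urllib.parse import urlparse

def extract_photo_urls(posts):
    """Return the original-size URL of every photo in a list of posts."""
    urls = []
    for post in posts:
        if post.get('type') != 'photo':
            continue
        for photo in post.get('photos', []):
            urls.append(photo['original_size']['url'])
    return urls

def next_before(posts):
    """Timestamp of the last post on the page, or None when the page is empty."""
    return posts[-1]['timestamp'] if posts else None

def filename_for(url, index, default_ext='.jpg'):
    """Build a local file name that keeps the extension found in the URL."""
    ext = os.path.splitext(urlparse(url).path)[1]
    return f"{index}{ext or default_ext}"

# Hand-written sample (an assumption about shape, not real API output).
sample_posts = [
    {'type': 'text', 'timestamp': 300},
    {'type': 'photo', 'timestamp': 200,
     'photos': [{'original_size': {'url': 'https://example.com/a.jpg'}},
                {'original_size': {'url': 'https://example.com/b.png'}}]},
]
urls = extract_photo_urls(sample_posts)
names = [filename_for(u, i) for i, u in enumerate(urls)]
```

In the download loop, filename_for(photo_url, image_idx) could then replace the fixed ".png" suffix.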

I got a lot of images like this. So cute!

[PYTHON] Collect machine learning training image data on your own (Tumblr API Yoshioka Riho ed.)
