[Python] Collect machine learning training image data on your own (Tumblr API, Riho Yoshioka edition)

Introduction

The first thing you need to do to create a machine learning model that recognizes objects in images is to collect a large number of training images. Images of common items such as dogs and cars can be downloaded from services such as ImageNet, but there are no ready-made datasets of, say, Japanese celebrities. This article introduces how to collect image data for machine learning using the Tumblr API.

The Google Custom Search API (Pikachu) edition is here.

Get Tumblr API Key

  1. First, register for a Tumblr account here.
  2. Next, go here and register your application.

Click "Register application". [screenshot: Tumblr.png]

Next, enter the application information. The URL fields (application website, App Store URL, Google Play Store URL) are required, but since we are not actually building an OAuth application, any plausible URL will do. (Here I used the URL of an app I made a long time ago.)

[screenshot: application registration form (Tumblr.png)]

A screen like this will then be displayed; click "Explore API". [screenshot: Tumblr.png]

Click to grant permission. [screenshot: Tumblr.png]

The screen will then look like this; click "Show Keys" in the upper right. [screenshot: API_Console___Tumblr.png]

The API Key is displayed here. This is what we came for, so make a note of it. [screenshot: API_Console___Tumblr.png]
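Before writing the full collection script, it helps to see what request the API key will be used in. A minimal sketch, assuming the `/v2/tagged` endpoint shown later; the key and tag are placeholders, and the URL is only built here, not sent:

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the API Key you noted down.
API_KEY = 'YOUR API KEY HERE'
params = {'api_key': API_KEY, 'tag': 'Yoshioka Riho'}

# Build the request URL by hand to see exactly what will be sent.
url = 'https://api.tumblr.com/v2/tagged?' + urlencode(params)
print(url)  # .../v2/tagged?api_key=...&tag=Yoshioka+Riho
```

Pasting the resulting URL into a browser is a quick way to confirm the key works before running the script.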

Actually get the image

Now, let's actually fetch images using the API key we obtained. Tumblr is dominated by photo posts, so it is not well suited to collecting characters such as Pikachu. Instead, this time we will collect photos of Riho Yoshioka, who has been popular recently. The downloaded images are saved in a directory called images. (Reference: http://taka-say.hateblo.jp/entry/2016/12/19/235554)

import os
import shutil
import time

import requests

LOOP = 10
URL = 'https://api.tumblr.com/v2/tagged'
payload = {
    'api_key': 'YOUR API KEY HERE',
    'tag': 'Yoshioka Riho'
}

# Collect photo URLs, paging backwards through posts with `before`
photo_urls = []
for i in range(LOOP):
    response_json = requests.get(URL, params=payload).json()
    posts = response_json['response']
    if len(posts) == 0:
        break  # no more posts for this tag
    for data in posts:
        if data['type'] != 'photo':
            continue  # skip text, video, and other post types
        for photo in data['photos']:
            photo_urls.append(photo['original_size']['url'])
    # page to older posts using the timestamp of the last post
    payload['before'] = posts[-1]['timestamp']
    time.sleep(1)  # avoid hammering the API

# Download each image into the images/ directory
os.makedirs('images', exist_ok=True)
image_idx = 0
for photo_url in photo_urls:
    path = 'images/' + str(image_idx) + '.png'
    r = requests.get(photo_url, stream=True)
    if r.status_code == 200:
        with open(path, 'wb') as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)
        image_idx += 1
[sample results: 3.png, 12.png, 29.png]

And just like that, we have plenty of images. So cute!
