1. Introduction

I'm ATcat, an Aidemy trainee. Do you use Instagram? I often use it to look at cat pictures, but when I search for them, images of things other than cats often get mixed in, as shown in the figure below. I cannot change how Instagram itself works, and I do not have enough time to build a dedicated application, so I made a system that takes images labeled as "cat" on Instagram and extracts only the ones that actually contain a cat.

2. Object detection

This time I used object detection to extract the cat images, so I will briefly explain the technique. Image recognition (classification) identifies only what appears in an image, based on the features of the image as a whole. Object detection also identifies where it appears: for each object in the image, it predicts the object's class and its location, and the location is expressed as a rectangle called a bounding box. There is also a technique called semantic segmentation, which is more complex because it classifies every pixel.

I used Google's pre-trained models this time. Building and training a model from scratch takes a huge amount of time: preparing a dataset, training, choosing an appropriate number of classes, and so on. For the same reason, pre-trained models are very commonly used in industry.
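To make the idea of a bounding box concrete, here is a minimal sketch of how one detection can be represented and converted to pixel coordinates. It assumes the box is given as normalized (ymin, xmin, ymax, xmax) values between 0.0 and 1.0, which is how the detection code later in this article treats it. The `Detection` class, the `to_pixel_box` function, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # what the object is, e.g. 'Cat'
    score: float  # confidence between 0.0 and 1.0
    box: tuple    # where it is: (ymin, xmin, ymax, xmax), normalized to 0.0-1.0


def to_pixel_box(det, img_width, img_height):
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    ymin, xmin, ymax, xmax = det.box
    left, right = round(xmin * img_width), round(xmax * img_width)
    top, bottom = round(ymin * img_height), round(ymax * img_height)
    return left, top, right, bottom


# Illustrative values only, not real model output.
det = Detection(label='Cat', score=0.87, box=(0.25, 0.125, 0.75, 0.625))
print(det.label, to_pixel_box(det, img_width=640, img_height=480))
```

Image classification would return only the label and score. The `box` field is what object detection adds.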

3. Advance preparation

First, to collect cat images from Instagram, I decided to gather images posted with cat-related hashtags such as #cat. For this I used a command-line tool called Instagram Scraper.

pip install instagram-scraper

First, install it with pip. Instagram Scraper can retrieve the posts of specific users as well as the images and videos posted with a specified hashtag. I ran it as follows.

`insta.sh`


#!/bin/sh
instagram_login_user=''  # Your username
instagram_login_pass=''  # Your password

target_tag='cat'  # Tag to scrape (without the #)

# --media-types: type of media to download
# --maximum:     maximum number of items to retrieve
# --latest:      only fetch media posted since the last scrape
# Note: comments cannot follow a trailing backslash, so they are listed here.
instagram-scraper \
  --login_user "$instagram_login_user" \
  --login_pass "$instagram_login_pass" \
  --tag "$target_tag" \
  --media-types image \
  --maximum 100 \
  --latest

The `--maximum` option sets the maximum number of images to retrieve (100 in the script above).

I was able to collect images like the ones below.
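Before running detection, it helps to list the downloaded files. The following is a minimal sketch that assumes the images were saved into a single folder. The folder location depends on how the scraper was run, so the path is passed in as an argument. `list_images` and `IMAGE_EXTENSIONS` are names made up for this example.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}


def list_images(directory):
    """Return the sorted names of image files directly inside `directory`."""
    return sorted(
        p.name for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `list_images('./cat')` would return the image file names in a hypothetical `./cat` folder, skipping videos and subfolders.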

4. Implementation

Next, object detection is used to decide whether each collected image contains a cat. I ran Google's pre-trained Faster R-CNN and SSD models, loaded through TensorFlow Hub, on Google Colaboratory.

I implemented this with reference to the following article: https://qiita.com/code0327/items/3b23fd5002b373dc8ae8

The flow is to load a pre-trained model from TensorFlow Hub and run object detection on the cat images collected from Instagram. An image with the detection results drawn on it is output only when a cat is detected.
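The flow above can be sketched independently of the model. This is only an outline under stated assumptions: `detect` stands in for the real detector and is assumed to return a list of (class name, score) pairs for a file, and `fake_results` contains made-up values used purely to show the control flow.

```python
def filter_images(image_files, detect, target='Cat', min_score=0.25):
    """Return the files for which `detect` reports `target` at >= min_score.

    `detect` is a placeholder for the real detector: any callable that takes
    a file name and returns a list of (class_name, score) pairs.
    """
    kept = []
    for name in image_files:
        detections = detect(name)
        if any(cls == target and score >= min_score for cls, score in detections):
            kept.append(name)
    return kept


# Stub detector with made-up results, used only to show the control flow.
fake_results = {
    'a.jpg': [('Cat', 0.91), ('Sofa bed', 0.40)],
    'b.jpg': [('Dog', 0.80)],
    'c.jpg': [('Cat', 0.10)],
}
print(filter_images(list(fake_results), fake_results.get))
```

Here only `a.jpg` is kept: `b.jpg` has no cat, and the cat in `c.jpg` is below the score threshold.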

First, import the libraries and select the pre-trained model.


# For running inference on the TF-Hub module.
import glob
import os
import time

import numpy as np
import matplotlib.patheffects as pe
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image

# Select SSD or Faster R-CNN by uncommenting one of the module handles
#module_handle = 'https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1'
module_handle = 'https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1'

# Load the model and take its default inference signature
detector = hub.load(module_handle).signatures['default']

The function that draws the detection results on an image is as follows.


def showImage(img, r, imgfile, min_score=0.1):
  fig = plt.figure(dpi=150,figsize=(8,8))
  ax = plt.gca()
  ax.tick_params(axis='both', which='both', left=False, 
                 labelleft=False, bottom=False, labelbottom=False)
  ax.imshow(img)

  decode = np.frompyfunc( lambda p : p.decode("ascii"), 1, 1)

  boxes =       r['detection_boxes']
  scores =      r['detection_scores']
  class_names = decode( r['detection_class_entities'] )

  n = np.count_nonzero(scores >= min_score)

  # Prepare a color for each class name
  class_set = np.unique(class_names[:n])
  colors = dict()
  cmap = plt.get_cmap('tab10')
  for i, v in enumerate(class_set):
    colors[v] =cmap(i)

  # Draw the rectangles, starting from the one with the lowest score
  img_w = img.shape[1]
  img_h = img.shape[0]
  for i in reversed(range(n)):
    text = f'{class_names[i]} {100*scores[i]:.0f}%'
    color = colors[class_names[i]]
    y1, x1, y2, x2 = tuple(boxes[i])
    y1, y2 = y1*img_h, y2*img_h
    x1, x2 = x1*img_w, x2*img_w

    # Bounding box frame
    rect = plt.Rectangle(xy=(x1, y1), width=(x2-x1), height=(y2-y1),
                         fill=False, edgecolor=color, joinstyle='round',
                         clip_on=False, zorder=8+(n-i))
    ax.add_patch(rect)

    # Label: text
    t = ax.text(x1+img_w/200, y1-img_h/300, text, va='bottom', fontsize=6, color=color, zorder=8+(n-i))
    t.set_path_effects([pe.Stroke(linewidth=1.5, foreground='white'), pe.Normal()])
    fig.canvas.draw()
    renderer = fig.canvas.get_renderer()
    coords = ax.transData.inverted().transform(t.get_window_extent(renderer=renderer))
    tag_w = abs(coords[0,0]-coords[1,0])+img_w/100
    tag_h = abs(coords[0,1]-coords[1,1])+img_h/120

    # Label: background
    bg = plt.Rectangle(xy=(x1, y1-tag_h), width=tag_w, height=tag_h,
                       edgecolor=color, facecolor=color,
                       joinstyle='round', clip_on=False, zorder=8+(n-i))
    ax.add_patch(bg)

  # Save the figure (create the output folder if it does not exist yet)
  os.makedirs('/content/save', exist_ok=True)
  plt.savefig('/content/save/'+imgfile)
  plt.close()

Detections with a confidence score of min_score or higher are localized by drawing a rectangle around them.
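Taking the first n entries works when the scores come back sorted from highest to lowest. As a hedged alternative that does not rely on that ordering, the sketch below selects detections with a boolean mask. It assumes `result` is a dict of NumPy arrays using the same keys as the code in this article. `select_by_score` is a name made up for this example.

```python
import numpy as np


def select_by_score(result, min_score=0.25):
    """Return (boxes, scores, names) for detections with score >= min_score.

    A boolean mask is used, so the input does not have to be sorted by score.
    The returned arrays are ordered from the highest score to the lowest.
    """
    scores = np.asarray(result['detection_scores'])
    mask = scores >= min_score
    order = np.argsort(-scores[mask])  # indices that sort the kept scores, highest first
    boxes = np.asarray(result['detection_boxes'])[mask][order]
    names = np.asarray(result['detection_class_entities'])[mask][order]
    return boxes, scores[mask][order], names
```

The threshold is the main tuning knob: a lower `min_score` keeps more candidate boxes but also more false detections.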

Finally, define the function that runs the detection.


import os
import time
import numpy as np
import PIL.Image as Image

def run_detector(detector, path, img_file):
  # Load the image and convert it to a format the detector accepts
  img = Image.open(os.path.join(path, img_file))  # Pillow (PIL)
  if img.mode != 'RGB':
    img = img.convert('RGB')  # e.g. RGBA or grayscale -> 3 channels
  converted_img = img.resize((227, 227), Image.LANCZOS)      # Shrink to 227x227
  converted_img = np.array(converted_img, dtype=np.float32)  # Convert to np.array
  converted_img = converted_img / 255.                       # Normalize to 0.0-1.0
  converted_img = converted_img.reshape([1, 227, 227, 3])    # Add the batch dimension
  converted_img = tf.constant(converted_img)

  t1 = time.time()
  result = detector(converted_img)  # Object detection (main step)
  t2 = time.time()
  print(f'Detection time: {t2-t1:.3f} seconds')

  # Prepare to print the results as text
  r = {key: value.numpy() for key, value in result.items()}
  boxes = r['detection_boxes']
  scores = r['detection_scores']
  decode = np.frompyfunc(lambda p: p.decode('ascii'), 1, 1)
  class_names = decode(r['detection_class_entities'])

  # Print the n results with a score of 0.25 or higher
  print('Detected objects')
  n = np.count_nonzero(scores >= 0.25)
  cat_found = False
  for i in range(n):
    y1, x1, y2, x2 = tuple(boxes[i])
    x1, x2 = int(x1*img.width), int(x2*img.width)
    y1, y2 = int(y1*img.height), int(y2*img.height)
    t = f'{class_names[i]:10} {100*scores[i]:3.0f}%  '
    t += f'({x1:>4},{y1:>4}) - ({x2:>4},{y2:>4})'
    print(t)
    # Exact match, so that classes such as "Cattle" are not counted as cats
    if class_names[i] == 'Cat':
      cat_found = True

  # If a cat was detected, save the image once with the results overlaid
  if cat_found:
    showImage(np.array(img), r, img_file, min_score=0.25)
  return t2-t1

Since I only want output when a cat is detected, the image is saved only when the "Cat" class is detected.
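One point to watch when checking for the "Cat" class: testing whether the substring "Cat" appears in a printed line also matches longer class names that merely begin with "Cat", such as "Cattle". The short sketch below compares the two approaches. `contains_class` is a name made up for this example and the sample values are invented.

```python
def contains_class(class_names, scores, target='Cat', min_score=0.25):
    """True if `target` appears as an exact class name with score >= min_score."""
    return any(name == target and score >= min_score
               for name, score in zip(class_names, scores))


# Invented detection: the image contains cattle, not a cat.
names = ['Cattle', 'Tree']
scores = [0.80, 0.50]
line = f'{names[0]:10} {100*scores[0]:3.0f}%'

print('Cat' in line)                  # True: the substring test is fooled
print(contains_class(names, scores))  # False: the exact comparison is not
```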

5. Result

With Faster R-CNN, a cat was detected in 73 of the 100 images, and those images were output. Here is an example that both models detected.

<img width="340" alt="SSD result" src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/689144/66be9ed0-b85c-9f6f-64d2-d384179cf23f.jpeg" title="SSD result"><img width="340" alt="Faster R-CNN result" src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/689144/3b5b960a-26e5-90aa-3187-ac236aefcb60.jpeg" title="Faster R-CNN result">

In this figure, the left side is the SSD result and the right side is the Faster R-CNN result. The average detection time was 0.23 seconds for SSD and 1.30 seconds for Faster R-CNN. SSD detected a cat in 74 images. Although the two counts are close (74 and 73), surprisingly many images were detected by only one of the two models, which shows that each detection method handles some images well and others poorly. Both results contained almost no images of anything other than cats, so both can be said to have succeeded in picking up only cat images.

The following image is an example that was output even though it is not a cat.

[Image: a dog detected as a cat]

When I looked at the list I thought it was a cat, but on closer inspection it was a dog.

Among the images detected as cats, there were also a few unusual ones in which the detected cat was a drawing. I found it quite interesting that even a drawn cat can be detected. Distinguishing a drawn cat from a real one would require training the model on that distinction, and defining the classes for it seems difficult.

[Image: a drawn cat detected as a cat]
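The overlap between the two models' results can be checked with a simple set comparison. This is a minimal sketch that assumes the file names output for each model are available as two lists. `compare_detections` and the file names in the usage example are made up for illustration and are not the actual results.

```python
def compare_detections(ssd_files, frcnn_files):
    """Group detected file names by which model(s) detected them."""
    ssd, frcnn = set(ssd_files), set(frcnn_files)
    return {
        'both': sorted(ssd & frcnn),        # detected by both models
        'ssd_only': sorted(ssd - frcnn),    # detected only by SSD
        'frcnn_only': sorted(frcnn - ssd),  # detected only by Faster R-CNN
        'either': sorted(ssd | frcnn),      # detected by at least one model
    }


# Made-up file names, only to show the shape of the output.
summary = compare_detections(['a.jpg', 'b.jpg', 'c.jpg'], ['b.jpg', 'c.jpg', 'd.jpg'])
for key, files in summary.items():
    print(key, len(files), files)
```

The `either` group corresponds to combining both models, which recovers images that one of them missed at the cost of also keeping the false detections of both.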

6. Summary

I was able to detect cat images while picking up almost no other images. However, each detection method missed some images, so in the future I would like to combine the results of both methods, try object detection with the now-popular DETR and YOLOv5, and use semantic segmentation to build a system that extracts only the cat region of an image. Thank you for reading to the end!
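As a rough first step toward extracting only the cat region, the bounding box from object detection can already be used to crop the image. Note that this is not semantic segmentation: a rectangular crop still includes background pixels around the cat. The sketch assumes the image is a NumPy array of shape (height, width, channels) and the box is normalized (ymin, xmin, ymax, xmax), as in the detection code above. `crop_box` is a name made up for this example.

```python
import numpy as np


def crop_box(img, box):
    """Crop `img` (H x W x C array) to a normalized (ymin, xmin, ymax, xmax) box."""
    h, w = img.shape[:2]
    # Clamp to the image area in case the box slightly exceeds 0.0-1.0.
    ymin, xmin, ymax, xmax = np.clip(box, 0.0, 1.0)
    top, bottom = int(ymin * h), int(np.ceil(ymax * h))
    left, right = int(xmin * w), int(np.ceil(xmax * w))
    return img[top:bottom, left:right]
```

For example, `crop_box(np.array(img), boxes[i])` would return the pixels inside the i-th detected box, which could then be saved as a separate image.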

References

https://qiita.com/code0327/items/3b23fd5002b373dc8ae8
https://github.com/arc298/instagram-scraper
https://githubja.com/rarcega/instagram-scraper

[PYTHON] I want to detect images of cats from Instagram