[PYTHON] Sharing the insights gained from building a sashimi recognition app

Introduction

Nice to meet you. I am a consultant at Hitachi, Ltd.'s Lumada Data Science Lab. At Hitachi there is a tradition of "small group activities," in which volunteers gather to work on a theme of their choosing. This time, the team of data scientists I belong to ran an activity called **"Let's make a sashimi recognition app!"**. In this article I will share what we learned, the insights we gained, and code examples of the actual model.

By the way, which one is amberjack?

fig01_osashimi.png

The correct answer is ...

fig02_osashimi2.png

To be honest, I can't tell amberjack and yellowtail apart... This is exactly where a "sashimi recognition app" comes in handy!

Trigger for activity

My main job is data analysis, but if I also acquire system-building skills, I can apply them to prototype development, let customers try the prototypes, and experience and share the results. We therefore started this activity to create opportunities to work on everything from scratch through to system construction, and to share the know-how among members.

How it was developed

Member composition

The roles were divided in this way. I was mainly in charge of model development.

fig03_role.png

How to proceed

We proceeded in the order of theme determination → data collection → model construction → application development.

fig04_flow.png

Theme decision

Rather than simply demonstrating how impressive AI is, we aimed for a service or system that customers could actually use. Each member brought ideas, we discussed the hypotheses and needs behind them, and in the end the "sashimi recognition app" was selected.

- **Hypothesis**

It is difficult to tell the types of sashimi apart. In particular, foreign visitors to Japan often eat fish without knowing what kind it is. If they could learn the differences between types of sashimi, deepen their understanding of Japanese culture, and spread the word around the world, we believe this would contribute to the restaurant and tourism industries.

- **Needs survey**

  - Inbound demand was expected around the Tokyo Olympics.
  - A survey found that what foreign visitors to Japan look forward to most is "Japanese food."
  - On the other hand, the restaurants serving "Japanese food" feel they cannot fully convey the cultural background and the appeal of the ingredients.

- **Business model concept**

In cooperation with restaurant apps, such as those of conveyor-belt sushi (kaitenzushi) chains, we would offer an app that either tells customers the type of sashimi or lets them compete against the AI in guessing it. If a customer's accuracy beats the AI's judgment, they receive points that can be used at the store. Customers have fun while deepening their understanding, and consumption, of Japanese food culture.

Data collection

Method ①: Image search

We collected images from the Internet. This was hard work...

fig05_datacollection.png

Method ②: Actual shooting

We bought real sashimi at a supermarket and photographed it ourselves. By shooting video while varying the camera angle, the lighting, the color of the plate, and so on, we devised a way to collect many images at almost no extra cost. The downside is that eating all that sashimi afterwards is hard on your stomach.

Though it took a lot of effort, we ultimately collected more than 40,000 sashimi photos.

Model building

Model learning flow

We built an AI that keeps learning by repeating the cycle of data collection → preprocessing → data splitting → model building → (re)training.

fig06_cyclemodel.png

The model (named SushiNet) is a customized 24-layer Residual Network (ResNet) built from scratch. Because the dataset we collected is imbalanced, during (re)training we set a weight for each class so that the loss pays more attention to the under-represented classes.

Execution example (data augmentation)

Here is an example of the data augmentation we run during (re)training. (The `show_aug_images` helper used below is defined in the full code at the end of this article.)

rows=cols=3
datagen = ImageDataGenerator(rotation_range=90)
show_aug_images(x)

In this way, the image is rotated randomly.

fig11_datamizumashi1.png

A full example version of the model is given at the end of this article. If you are interested, please try it!

App development

System configuration and repository

The figure below shows how each component of the system corresponds to its repository.

fig07_systemconfigration.png

Processing method ①

When you open the URL of the sashimi recognition app (the landing page) in your smartphone's browser, the front-end app (HTML + JavaScript + CSS) is downloaded and starts running in the browser.

fig08_process1.png

Processing method ②

When you launch the sashimi recognition app, photograph the sashimi with the camera, and upload the photo, the front-end app controls the device's camera through the browser, captures the image, and sends it to the server.

The server receives the image, the back-end app recognizes the sashimi and makes its judgment, and the result is returned to the front-end app. The front-end app then displays the result in the smartphone's browser.

fig09_process2.png
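The article does not specify which web framework the back end uses, so the following is only a rough sketch of this receive-and-judge step, assuming Flask; the `/predict` route, the `image` form field, and the `predict_sashimi` stub are all hypothetical names.

import io

import numpy as np
from PIL import Image
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_sashimi(image_array):
    # Stub: the real back end runs the SushiNet model here
    # (see the TensorFlow notes below and the model code in the appendix).
    return 0

@app.route("/predict", methods=["POST"])
def predict():
    # The front-end app uploads the captured photo as multipart form data.
    uploaded = request.files["image"]
    im = Image.open(io.BytesIO(uploaded.read())).convert("RGB")
    im = im.resize((224, 224), Image.BILINEAR)  # the input size the model expects
    label = predict_sashimi(np.asarray(im, dtype="float32"))
    return jsonify({"class_index": int(label)})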

Difficult points

- **HTML5 specifications change quickly**

We used the HTML5 getUserMedia() API to access the camera device from the smartphone's browser. At the time, this API had "Candidate Recommendation" status and was not yet an official specification, so it was subject to change. In fact, we were caught off guard when, partway through development, security requirements were tightened so that the API could no longer be used over plain HTTP and HTTPS became mandatory.

- **TensorFlow (1.x) is not thread-safe**

This is a very important point when using TensorFlow from inside a web application. The computation graph that TensorFlow creates in thread A, which handles one HTTP request, interferes with the computation graph created in thread B, which handles another request. To avoid this, the computation graph should be a singleton shared by all threads.
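As a minimal sketch of that singleton approach (assuming TensorFlow 1.x with tf.keras; the file name "sushinet.h5" is just a placeholder), the model is loaded once at server start-up and every request thread enters the same graph before predicting:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model

# Load the model exactly once at server start-up and remember the graph
# it was built into.
MODEL = load_model("sushinet.h5")
GRAPH = tf.get_default_graph()

def predict_sashimi(image_array):
    # Entering the stored graph keeps every request thread on the same
    # computation graph instead of each thread creating its own.
    with GRAPH.as_default():
        probs = MODEL.predict(np.expand_dims(image_array, axis=0))
    return int(np.argmax(probs, axis=1)[0])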

- **How to split image processing between components**

Before the image taken by the camera is fed to the model, some image processing has to be performed, and you need to carefully design how this work is split between the front-end app (JavaScript in the browser) and the back-end app (Python). If this design is sloppy, bugs keep surfacing during integration testing and cause rework. In fact, because of a design error around image compression and decompression, the JavaScript side was sending images in RGBA format while the Python side expected RGB, which caused an error.
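A small, purely illustrative sketch of a defensive conversion on the Python side that would absorb this mismatch regardless of the pixel format the browser sends (the function name is ours, not from the original code):

import numpy as np
from PIL import Image

def to_model_input(pil_image, size=(224, 224)):
    # Browser canvases usually export RGBA; the model expects 3-channel RGB,
    # so normalize the mode before resizing and converting to an array.
    rgb = pil_image.convert("RGB")
    rgb = rgb.resize(size, Image.BILINEAR)
    return np.asarray(rgb, dtype="float32")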

Summary

Human vs. AI: what was the result...?

Here is the completed app in action.

sashimi_812qiiita.gif

How close did the AI get to human performance?

| Type of sashimi | Human correct-answer rate | AI correct-answer rate | Result |
| --- | --- | --- | --- |
| Tuna | 95% | 75% | Human wins |
| Yellowtail | 44% | 78% | AI wins |
| Pacific saury | 50% | 100% | AI wins |
| Greater amberjack | 40% | 40% | Neither is good at it? |

I think the results were quite good. (Although the correct answer rate for humans is not very high in the first place ...)

However, there is still a large gap between images from the actual product setting and the data we collected, and there is room for improvement. Unlike whole animals, sashimi has few easy-to-recognize visual features, and its appearance changes with freshness, which also makes it hard to judge. How to clean the data obtained from the Internet remains a future issue.

Results of activities

- The goal of covering the whole picture of system development was largely achieved.
- It led to mutual knowledge sharing and skill improvement among members.
- I also learned about the application development side (cloud integration, etc.).
- Data collection is hard, no matter what... I now understand a little of what customers and SEs go through.

In fact, knowing firsthand the hardships that customers and SEs face may be the most important outcome. I hope this app can be put to practical use by next year's Olympics.

Reference: Model execution example (full version)

Library import

from tensorflow.keras.layers import Dropout, BatchNormalization, Flatten, Activation, Input, Dense,Add,Reshape                                                                                                                  
from tensorflow.keras.layers import ZeroPadding2D,Conv2D,ELU,MaxPooling2D,AveragePooling2D,GlobalAveragePooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import Model
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, Callback, ReduceLROnPlateau, LearningRateScheduler,CSVLogger
from tensorflow.keras.losses import binary_crossentropy, categorical_crossentropy, mean_squared_error
from tensorflow.keras.optimizers import Adam, RMSprop, SGD
from tensorflow.keras.utils import Sequence, to_categorical
from tensorflow.keras import losses, models, optimizers
from tensorflow.keras import backend as K
import tensorflow as tf
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import warnings
import os
from datetime import datetime

warnings.simplefilter('ignore')
warnings.filterwarnings('ignore')
%matplotlib inline

Data augmentation

Preparing for data augmentation

# Load an image ("xxx.jpg" is a placeholder path)
img_path = "xxx.jpg"
im = Image.open(img_path)
im = im.resize((224,224),Image.BILINEAR)
x = np.expand_dims(im,axis=0)

# Helper that visualizes augmented versions of an image
def show_aug_images(x):
    g = datagen.flow(x,
                     batch_size=1)
    plt.figure(figsize=(8,8))
    for i in range(rows*cols):
        aug_img=g.next()
        plt.subplot(rows,cols,i+1)
        plt.axis('off')
        plt.imshow(aug_img[0].astype('uint8'))

Data augmentation ①: Rotate the image randomly

rows=cols=3
datagen = ImageDataGenerator(rotation_range=90)
show_aug_images(x)

fig11_datamizumashi1.png

Data augmentation ②: Randomly change the image brightness

rows=cols=3
datagen = ImageDataGenerator(brightness_range=[0.1,3])
show_aug_images(x)

fig12_datamizumashi2.png

Data augmentation ③: Randomly shift the image horizontally (reflect fill)

rows=cols=3
datagen = ImageDataGenerator(width_shift_range=0.4,fill_mode='reflect')
show_aug_images(x)

fig13_datamizumashi3.png

Data augmentation ④: Zoom the image randomly

rows=cols=3
datagen = ImageDataGenerator(zoom_range=0.4)
show_aug_images(x)

fig14_datamizumashi4.png

Model building (SushiNet)

Using the residual structure, we built a 24-layer ResNet-style model with ELU (Exponential Linear Unit) activations by stacking multiple residual blocks.

# Conv → BatchNorm → ELU block
def cbe_block(X,F,kernel_size,strides,padding):
    X = Conv2D(filters=F,kernel_size=kernel_size,strides=strides,padding=padding)(X)
    X = BatchNormalization()(X)
    X = ELU()(X)
    return X
# Conv → BatchNorm block (no activation; applied just before the residual add)
def cb_block(X,F,kernel_size,strides,padding):
    X = Conv2D(filters=F,kernel_size=kernel_size,strides=strides,padding=padding)(X)
    X = BatchNormalization()(X)
    return X
# Identity residual block: the shortcut passes the input through unchanged
def residual_id(X, f, filters):
    F1, F2, F3 = filters
    X_s = X
    X = cbe_block(X=X,F=F1,kernel_size=(1,1),strides=(1,1),padding='valid')
    X = cbe_block(X=X,F=F2,kernel_size=(f,f),strides=(1,1),padding='same')
    X = cb_block(X=X,F=F3,kernel_size=(1,1),strides=(1,1),padding='valid')
    X = Add()([X, X_s])
    X = ELU()(X)    
    return X
# Convolutional residual block: the shortcut is projected with a 1x1 conv of stride s
def residual_conv(X, f, filters, s=2):
    F1, F2, F3 = filters
    X_s = X
    X = cbe_block(X=X,F=F1,kernel_size=(1,1),strides=(s,s),padding='valid')
    X = cbe_block(X=X,F=F2,kernel_size=(f,f),strides=(1,1),padding='same')
    X = cb_block(X=X,F=F3,kernel_size=(1,1),strides=(1,1),padding='valid')
    X_s = Conv2D(filters=F3, kernel_size=(1,1), strides=(s,s), padding='valid')(X_s)
    X_s = BatchNormalization()(X_s)
    X = Add()([X, X_s])
    X = ELU()(X)
    return X
# 24-layer ResNet-style classifier with ELU activations
def SushiNet(input_shape = (224, 224, 3), classes = 10):
    X_input = Input(input_shape)
    X = ZeroPadding2D((3, 3))(X_input)
    X = Conv2D(64, (7, 7), strides = (2, 2))(X)
    X = BatchNormalization()(X)
    X = ELU()(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)
    X = residual_conv(X, f = 5, filters = [64, 64, 256],s = 1)
    for i in range(2):
        X = residual_id(X, 3, [64, 64, 256])
    X = Dropout(0.3)(X)
    X = residual_conv(X, f = 5, filters= [128, 128, 512], s = 2)
    for i in range(3):
        X = residual_id(X, 3, [128, 128, 512])
    X = Dropout(0.3)(X)
    X = GlobalAveragePooling2D()(X)
    X = Dropout(0.3)(X)
    X = Dense(64)(X)
    X = Dense(classes, activation='softmax')(X)
    model = Model(inputs = X_input, outputs = X)
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model = SushiNet(input_shape = (224, 224, 3), classes = 10)

Training

**Tip:** `class_weight`: when training on imbalanced classes, you can scale each class's contribution to the loss by setting a weight per class in advance, as sketched below.
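As an illustrative sketch only (the per-class counts below are hypothetical, not the real distribution of our dataset), one common way to build the `class_weight` dictionary used in the training call that follows is inverse-frequency weighting:

import numpy as np

# Hypothetical number of collected images per class (10 sashimi classes).
counts = np.array([5200, 4800, 6100, 3900, 2500, 1800, 4200, 3600, 4700, 3200])

# Inverse-frequency weighting: rarer classes get proportionally larger weights,
# so mistakes on them contribute more to the loss.
class_weight = {i: float(counts.max() / c) for i, c in enumerate(counts)}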

# Augmentation settings used during training
datagen = ImageDataGenerator(
                width_shift_range=0.1,
                height_shift_range=0.1,
                rotation_range=10,
                channel_shift_range=150,
                zoom_range=0.5,
                horizontal_flip=True,
                vertical_flip=True,
                fill_mode='reflect',
                brightness_range=[0.5, 3.5])
datagen.fit(X_train)

# X_train/y_train and X_valid/y_valid are the preprocessed image arrays and
# one-hot labels from the data-splitting step; class_weight is the per-class
# weight dictionary described above.
model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size),
                    epochs=epochs,
                    validation_data=(X_valid, y_valid),
                    steps_per_epoch=X_train.shape[0] / batch_size,
                    shuffle=True,
                    class_weight=class_weight)
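After training, a quick sanity check might look like the following sketch (the file names are placeholders; it reuses the imports from the library section and the `model` trained above):

# Save the trained model so the back-end app can load it later.
model.save("sushinet.h5")

# Try a single held-out photo and print the predicted class index.
im = Image.open("test_sashimi.jpg").convert("RGB")
im = im.resize((224, 224), Image.BILINEAR)
x = np.expand_dims(np.asarray(im, dtype="float32"), axis=0)
probs = model.predict(x)
print("predicted class:", np.argmax(probs, axis=1)[0])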
