[PYTHON] Classification of guitar images by machine learning Part 1

Introduction

This article is a record of a summer-vacation independent study project by an engineer around forty. I tried classifying guitar images with a CNN. There is not much here that is technically new, but I could not find an existing example that used guitars as the subject, so I am publishing the results anyway.

Environment

I used my home PC.

Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Memory: 32GB
GeForce GTX 1080 (Founders Edition)
Ubuntu 16.04
Python 3.5.3
Keras (backend: TensorFlow)

Dataset preparation

The dataset was scraped from web search results, and the labels were then corrected by hand. There are 13 label classes.

The labels are biased towards Fender / Gibson solid-body models only because labeled images of those happened to be easy to obtain; the bias has no deeper meaning.

For each class I prepared 250 images. Of these, 50 randomly selected images are used as validation samples and the remaining 200 as training samples.
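The 200 / 50 split described above can be sketched as follows. This is only a minimal illustration, not the script actually used for the article: the function name `split_train_val`, the fixed seed, and the file names are all hypothetical.

```python
import random

def split_train_val(paths, n_val=50, seed=0):
    """Randomly hold out n_val items for validation; the rest are for training."""
    if n_val > len(paths):
        raise ValueError("n_val exceeds the number of images")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical file names standing in for the 250 images of one class.
images = ["jazzmaster_{:03d}.jpg".format(i) for i in range(250)]
train, val = split_train_val(images)
print(len(train), len(val))  # 200 50
```

Running the same split once per class keeps every class at exactly 200 training and 50 validation images.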

Model building

Keras ships a 50-layer ResNet as a ready-made model, so I decided to start with that. As provided, it is a 1000-class classifier, so I cut off its fully connected layer (include_top=False) and grafted on a fully connected layer for my own classes. The grafted part is minimal: a single fully connected layer with softmax.

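from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Flatten
from keras.models import Model
# ResNet-50 body without its 1000-class head, plus a new softmax head
resnet = ResNet50(include_top=False, input_shape=(224, 224, 3), weights="imagenet")
h = Flatten()(resnet.output)
model_output = Dense(len(classes), activation="softmax")(h)
model = Model(resnet.input, model_output)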
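To make the role of the softmax at the end of the grafted layer concrete, here is a small pure-Python sketch. It is not Keras code; the three raw scores are made up for illustration, whereas the real model produces one score per guitar class.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

The class with the largest raw score also gets the largest probability, so the predicted label is simply the position of the maximum.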

Here, if you set weights="imagenet", the weights learned on ImageNet are used as the initial values. Starting training from this state amounts to fine-tuning, that is, transfer learning. Note that this time I do not freeze the pretrained layers; the weights of all layers are updated during training.

With weights=None, the weights are initialized with random numbers. In other words, the model is trained from scratch, without transfer.

This time I experimented both with and without transfer learning.

Training

Since the number of sample images is relatively small, the training data needs to be augmented. This time I implemented data augmentation with Keras's ImageDataGenerator.

train_gen = ImageDataGenerator(
    rotation_range=45.,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True)

train_flow = train_gen.flow_from_directory(
    directory="./image/train",
    batch_size=32,
    target_size=(224, 224),
    classes=[_["key"] for _ in classes]
)

Augmenting the training data with affine transformations is very easy. ~~Moreover, image loading plus augmentation conveniently runs in parallel and asynchronously with training. Personally, I thought this alone made Keras worth using.~~

**Addition / correction:** In Keras, running data preprocessing in parallel and asynchronously with training is [a feature of the Keras training engine (through the use of OrderedEnqueuer / GeneratorEnqueuer in fit_generator)](https://github.com/fchollet/keras/blob/135efd66d00166ca6df32f218d086f98cc300f1e/keras/engine/training.py#L1834-L2096), not a feature provided by ImageDataGenerator. My wording was misleading because of my own misunderstanding, so I am correcting it.
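To give a feel for what the augmentation settings above mean, the following sketch draws one random set of transform parameters within the same ranges and applies them to a single pixel coordinate. It is a simplified illustration and not how ImageDataGenerator is implemented: the function names are hypothetical, a single zoom factor is used for both axes, and the order in which the transforms are combined is my own assumption.

```python
import math
import random

def sample_augmentation(rng, rotation_range=45.0, shift_range=0.2, zoom_range=0.2):
    """Draw one random set of transform parameters within the configured ranges."""
    return {
        "angle_deg": rng.uniform(-rotation_range, rotation_range),
        "shift_x": rng.uniform(-shift_range, shift_range),  # fraction of width
        "shift_y": rng.uniform(-shift_range, shift_range),  # fraction of height
        "zoom": rng.uniform(1.0 - zoom_range, 1.0 + zoom_range),
        "flip_h": rng.random() < 0.5,
        "flip_v": rng.random() < 0.5,
    }

def transform_point(x, y, p, size=224):
    """Apply flip, rotation, zoom and shift to one coordinate, about the image center."""
    c = (size - 1) / 2.0
    x, y = x - c, y - c
    if p["flip_h"]:
        x = -x
    if p["flip_v"]:
        y = -y
    t = math.radians(p["angle_deg"])
    x, y = x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t)
    x, y = x * p["zoom"], y * p["zoom"]
    return x + c + p["shift_x"] * size, y + c + p["shift_y"] * size

rng = random.Random(0)
params = sample_augmentation(rng)
print(params)
print(transform_point(0, 0, params))
```

Because new parameters are drawn for every image in every mini-batch, the network rarely sees exactly the same picture twice, even with only 200 source images per class.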

For the optimizer, I followed the original ResNet paper and used SGD with momentum.

optimizer = SGD(decay=1e-6, momentum=0.9, nesterov=True)

I also tried Adam and others, but as is commonly said, SGD with momentum worked best for ResNet.

This time, I treat 1000 mini-batches of 32 samples each as one step and check the validation accuracy after every step. Training stops once the validation accuracy has converged (early stopping).

es_cb = EarlyStopping(patience=20)
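The idea behind the patience setting can be sketched in plain Python. This is a simplified illustration of the general technique rather than Keras's actual implementation; the function name and the loss values are hypothetical. As far as I know, EarlyStopping watches val_loss unless the monitor argument is changed, so the sketch works on validation loss.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return the index of the epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1  # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: improvement stops after the third epoch.
losses = [0.9, 0.5, 0.3, 0.31, 0.35, 0.32, 0.4]
print(early_stopping_epoch(losses, patience=3))  # 5
```

A large patience such as 20 tolerates a noisy validation curve, at the cost of running many extra steps after the best one.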

Training results

Let's see the result.

First, the case with transfer learning. Accuracy progressed as follows. trans.PNG

Blue is the training curve and orange is the validation curve.

The validation accuracy fluctuates, but at step 36 the training accuracy is 99.9% and the validation accuracy is 100%. The accuracy goes up and down after that, but for now I use the snapshot from step 36 as the final model. Incidentally, completing 36 steps took about 5 hours.
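For a sense of scale, the numbers given so far can be combined with some quick arithmetic. The variable names are mine, and since the text only says "about 5 hours", the time per step is approximate.

```python
num_classes = 13
train_per_class = 200
batch_size = 32
batches_per_step = 1000
steps = 36
hours = 5  # "about 5 hours", so the timing below is approximate

train_images = num_classes * train_per_class        # source images for training
samples_per_step = batch_size * batches_per_step    # augmented samples seen per step
passes_per_step = samples_per_step / train_images   # passes over the training set
minutes_per_step = hours * 60 / steps

print(train_images, samples_per_step)
print(round(passes_per_step, 1), round(minutes_per_step, 1))
```

So one step shows the network each training image roughly a dozen times, each time with a different random augmentation.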

Next, the case without transfer learning. non_trans.PNG

Compared with transfer learning, training progresses slowly and the accuracy is worse. The best result is 99% training accuracy and 84% validation accuracy. There is a large gap between training and validation accuracy, which shows that the model generalizes poorly.

The effect of transfer learning is enormous.

Trying inference on various images

I took pictures of my own guitars and ran inference on them, using the model trained with transfer learning.

Jazzmaster jm.jpg

LesPaul lp.jpg

**Acoustic guitar** ac.jpg

It seems to be working properly.
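Reading a prediction from the model comes down to ranking the softmax outputs. The sketch below shows that step in plain Python; the helper name `top_k` is hypothetical, and the label subset and probabilities are made up for illustration, not actual outputs of the model.

```python
def top_k(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical subset of labels and made-up probabilities, for illustration only.
labels = ["Jazzmaster", "Jaguar", "Mustang", "LesPaul"]
probs = [0.90, 0.06, 0.03, 0.01]
for label, p in top_k(probs, labels):
    print("{}: {:.2f}".format(label, p))
```

Looking at the second and third candidates as well as the top one gives a hint of which models the network considers similar.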

What about a guitar that is not in the training data?

Duo Sonic ds.jpg

This is a convincing result, since the Duo-Sonic is a student model just like the Mustang.

**Mysterious guitar with a built-in speaker (made by Crews, model unknown)** xx.jpg

It does not look like a Jaguar to me. It is hard to say what it does look like, but I feel a Les Paul or Telecaster would be closer. The model seems to pick up on features somewhat differently from humans.

Finally, let's play a little prank.

If I scribble a little on the Jazzmaster photo with a paint tool... jm2.jpg

For some reason it becomes a Flying V. Hmm... I wonder which part of the image it looked at to reach that conclusion.

Impressions

I tried image recognition with deep learning for the first time, and the accuracy was higher than I expected. Product tags and categories are often wrong on musical-instrument e-commerce and auction sites, so I think a classifier like this could be used to check them.

I also confirmed that transfer learning from a general image classification model is effective for image classification in a specific domain.

On the other hand, I was reminded of a known problem with deep learning: misclassifications are hard to correct because the basis and criteria behind an output are unknown. This model classifies ideal (noise-free) input very accurately, yet it is unstable in that even a small amount of noise can change its judgment drastically. Intentionally adding noise to the input during training should produce a more robust model, so I would like to try that if I have time. (⇒ I tried it.) There also seems to be a technique called Grad-CAM that estimates which regions the model pays attention to, so I would like to try it as well and see how things change.
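One simple way to add noise to the input during training is to perturb the pixel values with Gaussian noise. The sketch below shows the idea on a flat list of pixel values in plain Python. It is not necessarily the method used in the follow-up experiment, and the function name and the sigma value are assumptions.

```python
import random

def add_gaussian_noise(pixels, sigma=10.0, rng=None):
    """Return a copy of pixels (values in 0..255) with Gaussian noise added and clipped."""
    rng = rng or random.Random()
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]

# Hypothetical row of pixel values.
row = [0.0, 64.0, 128.0, 255.0]
noisy = add_gaussian_noise(row, sigma=10.0, rng=random.Random(0))
print(noisy)
```

A NumPy version of the same idea could presumably be passed to ImageDataGenerator through its preprocessing_function argument, so that fresh noise is drawn for every training image.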

I used ResNet-50 as the model this time, but I have a (vague) feeling that a classification task like this could be handled by a lighter model, so I would also like to try a shallower Network in Network, or shrinking the model by distillation.
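For reference, the core of distillation is training a small "student" model to match the softened outputs of a large "teacher" model. The sketch below computes that matching loss in plain Python. It is only an illustration of the general technique under my own assumptions: the function names, the temperature of 4, and the example scores are all hypothetical.

```python
import math

def softmax_t(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give flatter distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Hypothetical raw scores for three classes.
teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, [3.0, 1.0, 0.2]))  # student agrees with the teacher
print(distillation_loss(teacher, [0.2, 1.0, 3.0]))  # student disagrees: larger loss
```

The softened teacher outputs carry information about which classes resemble each other, which is what lets a smaller model learn from them.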
