[PYTHON] I tried image recognition of "Moon and Soft-shelled Turtle" with PyTorch (using torchvision.datasets.ImageFolder, which corresponds to Keras's flow_from_directory)

Moon and soft-shelled turtle

The moon and a soft-shelled turtle are both round, yet so different that they cannot be compared: a proverb for two things that look alike on the surface but are actually worlds apart. https://dictionary.goo.ne.jp/word/%E6%9C%88%E3%81%A8%E9%BC%88/

The difference is supposedly huge, so let's see whether deep learning can tell them apart!

I'll also explain a bit of PyTorch along the way. (If I've gotten something wrong, please point it out. Thank you.)

The code is here. https://github.com/kyasby/Tuki-Suppon.git

Keywords of this article

"Moon and soft-shelled turtle"

Similar in appearance, completely different in substance.

PyTorch's torchvision.datasets.ImageFolder

I wrote this because there weren't many articles using torchvision.datasets.ImageFolder, PyTorch's counterpart to Keras's flow_from_directory. If you put images into class folders, they are labeled automatically. Convenient.

PyTorch's torch.utils.data.random_split

Thanks to this, there is no need to separate the photos into train and test folders.

Data to use

From Google Images: ・67 images of soft-shelled turtles. I collected images where the shell is seen from above. For example, an image like this. (Pii-san's soft-shelled turtle) http://photozou.jp/photo/show/235691/190390795

・70 images of the moon. I collected images of round moons and cropped them by hand so that a large circle fills the frame. For example, an image like this.

Dataset creation

.
├── main.ipynb
└── pics
    ├── tuki
    │   ├── tuki1.png
    │   └── tuki2.png
    └── kame
        ├── kame1.png
        └── kame2.png

Since the images are already divided into one directory per class, torchvision.datasets.ImageFolder can label each directory automatically.

Module import

import matplotlib.pyplot as plt
import numpy as np
import copy
import time
import os
from tqdm import tqdm

import torchvision.transforms as transforms
import torchvision.models as models
import torchvision

import torch.nn as nn
import torch

Preprocessing

transform_dict = {
        'train': transforms.Compose(
            [transforms.Resize((256,256)),
             transforms.RandomHorizontalFlip(),
             transforms.ToTensor(),
             transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                  std=[0.229, 0.224, 0.225]),
             ]),
        'test': transforms.Compose(
            [transforms.Resize((256,256)),
             transforms.ToTensor(),
             transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                  std=[0.229, 0.224, 0.225]),
             ])}

Create a preprocessing dictionary for train and test. transforms.Compose builds a preprocessing pipeline; the transforms are applied in the order they are passed in.
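For example, applying the whole pipeline to one image (a minimal sketch; the file name is taken from the directory tree above and is illustrative):

from PIL import Image

img = Image.open("pics/tuki/tuki1.png").convert("RGB")
x = transform_dict["train"](img)  # Resize -> Flip -> ToTensor -> Normalize, in that order
print(x.shape)                    # => torch.Size([3, 256, 256])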

This time: transforms.Resize((256, 256)) → resizes the image to 256x256.

transforms.RandomHorizontalFlip() → flips the image horizontally at random (with probability 0.5 by default).

transforms.ToTensor() → converts a PIL image or numpy.ndarray (height x width x channel, values 0-255) into a Tensor (channel x height x width, values 0.0-1.0).

In PIL and numpy the image axes are ordered (height x width x channel), but note that in PyTorch they are (channel x height x width). Apparently this order is easier to work with.
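To see the ordering concretely, here is a minimal sketch with a dummy image (nothing from this article's dataset is assumed):

from PIL import Image
import numpy as np
import torchvision.transforms as transforms

pil_img = Image.new("RGB", (320, 240))  # PIL sizes are (width, height)
print(np.asarray(pil_img).shape)        # => (240, 320, 3)    H x W x C, 0-255
t = transforms.ToTensor()(pil_img)
print(t.shape)                          # => torch.Size([3, 240, 320])    C x H x W, 0.0-1.0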

transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) → normalizes each RGB channel with the given mean and standard deviation.
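Normalize computes output = (input - mean) / std per channel. As a minimal sketch, a pixel exactly equal to the per-channel mean normalizes to zero:

import torch
import torchvision.transforms as transforms

norm = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
x = torch.tensor([0.485, 0.456, 0.406]).reshape(3, 1, 1)  # one pixel, equal to the mean
print(norm(x))  # => zeros of shape (3, 1, 1)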

Documentation: https://pytorch.org/docs/stable/torchvision/transforms.html

Dataset

# ex.
# data_folder = "./pics"
# phase       = "train"

data = torchvision.datasets.ImageFolder(root=data_folder, transform=transform_dict[phase])

Create a dataset from the above directory.
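ImageFolder assigns one integer label per subdirectory, in alphabetical order of the folder names. A quick check (a minimal sketch; the shapes assume the transforms above):

print(data.class_to_idx)  # => {'kame': 0, 'tuki': 1}
img, label = data[0]      # (transformed image, label) pair
print(img.shape, label)   # => torch.Size([3, 256, 256]) 0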

Separation of train and test

# ex.
# train_ratio = 0.8

train_size = int(train_ratio * len(data))  # int() truncates to an integer.
val_size   = len(data) - train_size
data_size  = {"train": train_size, "val": val_size}
#          => {"train": 112, "val": 28}
data_train, data_val = torch.utils.data.random_split(data, [train_size, val_size])

torch.utils.data.random_split(dataset, lengths) splits the dataset **randomly** and **without overlap**. Pass the dataset as dataset and a list of split sizes as lengths.

I also stored the train and val sizes in a dictionary.

# ex.
# data_train => Subset(data, [4,5,1,7])
# data_val  => Subset(data, [3,8,2,6])

random_split returns as many values as there are entries in lengths. Each return value is a Subset holding the original dataset together with a list of indices.

(A Subset is a lightweight view of a dataset restricted to a list of indices; indexing the Subset fetches the corresponding item from the parent dataset.)
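A minimal sketch of how a Subset behaves (the indices here are illustrative):

subset = torch.utils.data.Subset(data, [4, 5, 1, 7])
print(len(subset))      # => 4
img, label = subset[0]  # internally fetches data[4]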

Data loader

train_loader = torch.utils.data.DataLoader(data_train, batch_size=batch_size, shuffle=True)
val_loader   = torch.utils.data.DataLoader(data_val,   batch_size=batch_size, shuffle=False)
dataloaders  = {"train":train_loader, "val":val_loader}

Create the data loaders. In PyTorch you load data through a DataLoader like this. I put these into a dictionary as well.
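Each iteration over a loader yields one batch. A quick check (a minimal sketch, assuming batch_size = 8):

images, labels = next(iter(train_loader))
print(images.shape)  # => torch.Size([8, 3, 256, 256])
print(labels.shape)  # => torch.Size([8])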

Check the image

def imshow(img):
    img = img / 2 + 0.5     # roughly undo the normalization for display
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # C x H x W -> H x W x C
    plt.show()

# Take one batch of training data
dataiter = iter(dataloaders["train"])
images, labels = next(dataiter)

# Display the images
imshow(torchvision.utils.make_grid(images))
# Display the labels
print(' '.join('%5s' % labels[j] for j in range(len(labels))))

The code above displays something like this (screenshot of the resulting image grid). I borrowed it from here: https://qiita.com/kuto/items/0ff3ccb4e089d213871d

Modeling

model = models.resnet18(pretrained=True)
for param in model.parameters():
    print(param)
# => Parameter containing:
#tensor([[[[-1.0419e-02, -6.1356e-03, -1.8098e-03,  ...,  5.6615e-02,
#            1.7083e-02, -1.2694e-02],
#          ...
#           -7.1195e-02, -6.6788e-02]]]], requires_grad=True)

The model is ResNet18. Passing pretrained=True gives you the pretrained weights. For transfer learning we keep the existing parameters fixed; only weights with requires_grad=True get updated. To freeze them, set it as shown below (after the model summary).

model
# => ResNet(
#   (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
#   (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#   (relu): ReLU(inplace=True)
#   (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
#   (layer1): Sequential(
#     (0): BasicBlock(
#       (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
#       (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#       (relu): ReLU(inplace=True)
#       (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
#       (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     )
#   ...
#   (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
#   (fc): Linear(in_features=512, out_features=1000, bias=True)
# )

The summary shows that the final layer is (fc), so:

for p in model.parameters():
    p.requires_grad=False
model.fc = nn.Linear(512, 2)

Iterate over all parameters with model.parameters(), set requires_grad=False on each, and then overwrite the final layer (the freshly created nn.Linear has requires_grad=True by default).
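As a sanity check (a minimal sketch, nothing beyond standard PyTorch assumed), you can list which parameters remain trainable after freezing:

print([name for name, p in model.named_parameters() if p.requires_grad])
# => ['fc.weight', 'fc.bias']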

Learning settings


model = model.cuda()  # Not needed if you don't have a GPU.
lr = 1e-4
epoch = 40
optim = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss().cuda()  # Without a GPU, .cuda() is not needed.

If you want to use the GPU, you have to move the model to the GPU.
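If you want the same script to run with or without a GPU, a common device-agnostic idiom is the following (a minimal sketch, not the exact code used in this article):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss().to(device)
# Inside the training loop, move each batch the same way:
# inputs, labels = inputs.to(device), labels.to(device)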

The code stays close to the tutorial. https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

Training function


def train_model(model, criterion, optimizer, scheduler=None, num_epochs=25):
    # torch.cuda.is_available() returns a bool.
    use_gpu = torch.cuda.is_available()
    # Start time
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    # Dictionaries of lists for recording the progress.
    loss_dict = {"train": [], "val": []}
    acc_dict  = {"train": [], "val": []}

    for epoch in tqdm(range(num_epochs)):
        if (epoch+1) % 5 == 0:  # Print the epoch once every five epochs.
            print('Epoch {}/{}'.format(epoch, num_epochs - 1))
            print('-' * 10)

        # In each epoch, run train and then val.
        # The dictionaries built earlier pay off here: one loop covers both.
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Training mode: dropout etc. active.
            else:
                model.eval()   # Inference mode: no dropout.
            
            running_loss = 0.0
            running_corrects = 0

            for data in dataloaders[phase]:
                inputs, labels = data  # A dataset built with ImageFolder
                                       # yields (image, label) pairs.

                # Not required if you don't use a GPU
                if use_gpu:
                    inputs = inputs.cuda()
                    labels = labels.cuda()

                # ~~~~~~~~~~~~~~ forward ~~~~~~~~~~~~~~~
                outputs = model(inputs)

                _, preds = torch.max(outputs.data, 1)
                # torch.max returns (max values, their indices).
                # The second argument is the dimension to reduce over:
                # dim=1 takes the max across the class scores of each row.
                loss = criterion(outputs, labels)

                if phase == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)
                # (preds == labels) is a tensor like [True, True, False];
                # True and False count as 1 and 0, so sum() counts the hits.

            # Divide by the number of samples to get the average.
            # Storing the sample counts in data_size pays off here.
            epoch_loss = running_loss / data_size[phase]
            epoch_acc = running_corrects.item() / data_size[phase]
            # tensor.item() extracts the Python number from a one-element tensor:
            # print(tensorA)        => tensor(112, device='cuda:0')
            # print(tensorA.item()) => 112

            # Record the progress (after the epoch averages are computed,
            # not inside the batch loop).
            loss_dict[phase].append(epoch_loss)
            acc_dict[phase].append(epoch_acc)

            # {:.4f} in format() prints four digits after the decimal point,
            # just like printf in C.
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            # Save the weights whenever the val accuracy improves.
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            # Without deepcopy, the saved weights would keep changing along with
            # model.state_dict() as training continues.
            # The difference between copy and deepcopy is explained well here:
            # https://www.headboost.jp/python-copy-deepcopy/

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val acc: {:.4f}'.format(best_acc))
    
    # Load the best weights back into the model and return it.
    model.load_state_dict(best_model_wts)
    return model, loss_dict, acc_dict

Learning

model_ft, loss, acc = train_model(model, criterion, optim, num_epochs=epoch)

Visualize learning


# Extract loss and acc.
loss_train = loss["train"]
loss_val   = loss["val"]

acc_train = acc["train"]
acc_val   = acc["val"]


# Writing it like this creates an nrows x ncols grid of graphs.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,5))

#0th graph
axes[0].plot(range(epoch), loss_train, label = "train")
axes[0].plot(range(epoch), loss_val,    label =  "val")
axes[0].set_title("Loss")
axes[0].legend()#Display label of each graph

#1st graph
axes[1].plot(range(epoch), acc_train, label = "train")
axes[1].plot(range(epoch), acc_val,    label =  "val")
axes[1].set_title("Accuracy")
axes[1].legend()

#Adjust so that the 0th and 1st graphs do not overlap
fig.tight_layout()

It looks like overfitting sets in around epoch 11 or 12. (loss and accuracy plots)

Bonus

Google Colaboratory is an easy way to use a GPU. https://colab.research.google.com/notebooks/welcome.ipynb?hl=ja When using images in Colab, it is convenient to zip them and upload the archive (uploading files one by one is painful; mounting Google Drive also works). You can then unzip it as follows.

# Change /content/pics.zip to your own path.
!unzip /content/pics.zip -d /content/data > /dev/null 2>&1 &

Also, the "Copy path" item that appears when you right-click a file is convenient.

matplotlib.pyplot: this time I produced a 1-row x 2-column figure, but for 2 rows x 2 columns, for example, you can build the figure as follows. You can also draw several plots on the same axes; below I plot two series per graph.

loss_train = loss["train"]
loss_val    = loss["val"]

acc_train = acc["train"]
acc_val    = acc["val"]

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10,5))
axes[0,0].plot(range(epoch), loss_train, label = "train")
axes[0,0].plot(range(epoch), loss_val,    label =  "val")
axes[0,0].set_title("Loss")
axes[0,0].legend()

axes[0,1].plot(range(epoch), acc_train, c="red",  label = "train")
axes[0,1].plot(range(epoch), acc_val,    c="pink", label =  "val")
axes[0,1].set_title("Accuracy")
axes[0,1].legend()

x = np.random.rand(100)
xx = np.random.rand(200)
axes[1,0].hist(xx, bins=25, label="xx")
axes[1,0].hist(x, bins=50,   label="x")
axes[1,0].set_title("Histogram")

y = np.random.randn(100)
z = np.random.randn(100)
axes[1,1].scatter(y, z, alpha=0.8, label="y,z") 
axes[1,1].scatter(z, y, alpha=0.8, label="z,y")
axes[1,1].set_title("Scatter")
axes[1,1].legend()

fig.tight_layout()

