[PYTHON] Create an application that recognizes numbers written on the screen on Android (PyTorch Mobile) [CNN network creation]

App to be created this time

Create an image recognition application that recognizes the numbers written on the screen, using PyTorch Mobile and Kotlin. **Everything, from the image recognition model to the Android app, is created from scratch.** The work is divided into two parts, **CNN network creation (Python)** and **Android implementation (Kotlin)**; this article covers the first part.

If you are an Android engineer without a Python environment, or if creating the model is too much trouble, go to [Create an image recognition application that recognizes the numbers written on the screen with Android (PyTorch Mobile) [Android implementation]](https://qiita.com/YS-BETA/items/15a4a2c64360f91f8b3a) and download the trained model linked in the implementation article to proceed.

The Python code for this article is available on GitHub: https://github.com/SY-BETA/CNN_PyTorch

The finished app looks like this: (demo animation)

Creation flow

  1. Download MNIST (note: the number of color channels must be changed to 3)
  2. Create a simple CNN model in Python (PyTorch)
  3. Train the model
  4. Save the model
  5. Implement a function to draw pictures on Android
  6. Load the model on Android and run forward propagation

What to do at this time

This article covers steps 1 to 4, through saving the model with Python. The library used is PyTorch, and the execution environment is Jupyter Notebook. We download the MNIST dataset, then create and train a simple CNN model.

MNIST download

Download MNIST, the handwritten digit dataset that everyone loves, using torchvision.

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
        transforms.ToTensor()])
train = torchvision.datasets.MNIST(
    root="data/train", train=True, transform=transform, target_transform=None, download=True)
test = torchvision.datasets.MNIST(
    root="data/test", train=False, transform=transform, target_transform=None, download=True)

Take a look at MNIST

Let's see what kind of dataset it is.

from matplotlib import pyplot as plt
import numpy as np

print(train.data.size())
print(test.data.size())
img = train.data[0].numpy()
plt.imshow(img, cmap='gray')
print('Label:', train.targets[0])

Execution result (screenshot)

Change from grayscale to RGB

Change the number of MNIST color channels from 1 to 3.

**Why bother increasing the amount of computation like this?** -> When handling images on Android they come in bitmap format, and when PyTorch Mobile converts a bitmap to a tensor, **it can only convert it to a 3-channel tensor**. (Maybe grayscale conversion will be added in the future, or maybe that is simply the specification...) So let's convert the data to RGB and train the model on that.

**This is not limited to this example: any model used with PyTorch Mobile needs to take 3 color channels.**

train_data_resized = train.data.numpy()  #From torch tensor to numpy array
test_data_resized = test.data.numpy()

train_data_resized = torch.FloatTensor(np.stack((train_data_resized,)*3, axis=1))  #Stack the grayscale channel 3 times to make RGB
test_data_resized = torch.FloatTensor(np.stack((test_data_resized,)*3, axis=1))
print(train_data_resized.size())

The size of the dataset has now changed from torch.Size([60000, 28, 28]) to torch.Size([60000, 3, 28, 28]).
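
As a quick aside (a minimal sketch of my own, not part of the original article), you can see what np.stack((x,)*3, axis=1) does by running it on a tiny dummy array:

import numpy as np

# A dummy batch of two 2x2 "grayscale images"
dummy = np.arange(8).reshape(2, 2, 2)
print(dummy.shape)                               # (2, 2, 2)

# Stacking the same array 3 times along axis=1 adds a channel dimension
rgb_like = np.stack((dummy,)*3, axis=1)
print(rgb_like.shape)                            # (2, 3, 2, 2)
print((rgb_like[:, 0] == rgb_like[:, 1]).all())  # True: the 3 channels are identical copies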

Create your own dataset

Create a custom dataset class

This time, because of the channel count, the MNIST dataset class cannot be used as-is, so we create a custom dataset by inheriting PyTorch's Dataset. A standardization class for image preprocessing is also created here.

import torch.utils.data as data

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

#Image preprocessing
class ImgTransform():
    def __init__(self):
        self.transform = transforms.Compose([
            transforms.ToTensor(),  #Tensor conversion
            transforms.Normalize(mean, std)  #Standardization
        ])

    def __call__(self, img):
        return self.transform(img)

#Inherit Dataset class
class _3ChannelMnistDataset(data.Dataset):
    def __init__(self, img_data, target, transform):
        #Reshape to [number of samples, height, width, channels]
        self.data = img_data.numpy().transpose((0, 2, 3, 1)) / 255
        self.target = target
        self.img_transform = transform #Instance of image preprocessing class

    def __len__(self):
        #Returns the number of images
        return len(self.data)

    def __getitem__(self, index):
        #Apply the image preprocessing (standardization) and return the data
        img_transformed = self.img_transform(self.data[index])
        return img_transformed, self.target[index]

Note that mean and std are the usual values commonly used for standardization, for example with VGG16. They are also the values that are always applied when converting to a tensor on Android. If you are unsure of the values, you can check the `ImageUtils` of PyTorch Mobile in Android Studio. (screenshot)
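
To make this concrete, here is a minimal sketch (my own addition, not from the original article) of what the transforms.Normalize(mean, std) step does to a single pixel; the Android side has to apply exactly the same formula for the model to see the same inputs:

# For a pixel value p in [0, 255] and channel c:
#   scaled     = p / 255                      (done in the dataset class / ToTensor)
#   normalized = (scaled - mean[c]) / std[c]  (done by transforms.Normalize)
p = 255  # a pure white pixel
for c in range(3):
    print((p / 255 - mean[c]) / std[c])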

Create a dataset using the class created above

train_dataset = _3ChannelMnistDataset(train_data_resized, train.targets, transform=ImgTransform())
test_dataset = _3ChannelMnistDataset(test_data_resized, test.targets, transform=ImgTransform())

#Try testing the dataset
index = 0
print(train_dataset.__getitem__(index)[0].size())
print(train_dataset.__getitem__(index)[1])
print(train_dataset.__getitem__(index)[0][1]) #You can see that it is standardized properly

Data loader creation

Create data loaders from the datasets created above. The batch size is 100.

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100, shuffle=False)
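
As a quick check (a small sketch added here, not in the original code), you can pull one mini-batch from the loader and confirm that its shape is [100, 3, 28, 28]:

# Grab one mini-batch and check the shapes
imgs, labels = next(iter(train_loader))
print(imgs.size())    # torch.Size([100, 3, 28, 28])
print(labels.size())  # torch.Size([100])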

Create a CNN network

Create a simple network with 1 convolution layer and 3 fully connected layers. (I don't want training to take too long.)

from torch import nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(3)
        self.conv = nn.Conv2d(3, 10, kernel_size=4)
        self.fc1 = nn.Linear(640, 300)
        self.fc2 = nn.Linear(300, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        x = self.pool(x)
        x = x.view(x.size()[0], -1) #Flatten the feature map to (batch size, 640) for the fully connected layers
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

model = Model()
print(model)

The network looks like this: (screenshot of the print(model) output)
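
As a side note (my own addition, not in the original article), the 640 input features of fc1 come from tracing the shapes through the convolution and pooling layers; a quick way to check this is:

# Trace the shape of a dummy input through the conv/pool part of the network
x = torch.rand(1, 3, 28, 28)
x = model.relu(model.conv(x))  # Conv2d(3, 10, kernel_size=4): 28 -> 25, shape (1, 10, 25, 25)
x = model.pool(x)              # MaxPool2d(3): floor(25/3) = 8, shape (1, 10, 8, 8)
print(x.size())                # torch.Size([1, 10, 8, 8]) -> 10 * 8 * 8 = 640 features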

Train the network

Create functions for training mode and inference mode

import tqdm
from torch import optim

#Inference (evaluation) mode
def eval_net(net, data_loader, device="cpu"): #Pass device="cuda" if a GPU is available
    #Switch the network to evaluation mode
    net.eval()
    ys = []     #True label storage variable
    ypreds = [] #Predicted label storage variable
    for x, y in data_loader:
        #Transfer to the device with the to method
        x = x.to(device)
        y = y.to(device)
        #Forward propagation; predict the class with the highest score
        with torch.no_grad():
            _, y_pred = net(x).max(1)
        ys.append(y)
        ypreds.append(y_pred)
    #Concatenate the labels and predictions for each mini-batch into one tensor
    ys = torch.cat(ys)
    ypreds = torch.cat(ypreds)
    #Accuracy = number of correct predictions / number of samples
    acc = (ys == ypreds).float().sum() / len(ys)
    return acc.item()


#Training mode
def train_net(net, train_loader, test_loader, optimizer_cls=optim.Adam,
              loss_fn=nn.CrossEntropyLoss(), n_iter=3, device="cpu"):
    train_losses = []
    train_acc = []
    eval_acc = []
    optimizer = optimizer_cls(net.parameters())
    for epoch in range(n_iter):  #Loop over the epochs (3 by default)
        running_loss = 0.0
        #Switch to training mode
        net.train()
        n = 0
        n_acc = 0

        for i, (xx, yy) in tqdm.tqdm(enumerate(train_loader),
                                     total=len(train_loader)):
            xx = xx.to(device)
            yy = yy.to(device)
            output = net(xx)

            loss = loss_fn(output, yy)
            optimizer.zero_grad()   #Reset the accumulated gradients
            loss.backward()   #Backpropagate from the loss function (cross entropy error)
            optimizer.step()

            running_loss += loss.item()
            n += len(xx)
            _, y_pred = output.max(1)
            n_acc += (yy == y_pred).float().sum().item()

        train_losses.append(running_loss / (i + 1))  #Average loss per mini-batch
        #Prediction accuracy on the training data
        train_acc.append(n_acc / n)
        #Prediction accuracy on the validation data
        eval_acc.append(eval_net(net, test_loader, device))

        #Show the results for this epoch
        print("epoch:", epoch, "train_loss:", train_losses[-1], "train_acc:", train_acc[-1],
              "eval_acc:", eval_acc[-1], flush=True)

First, try inference without any training

eval_net(model, test_loader)

Since the seed for the network's randomly initialized parameters is not fixed, the result is not reproducible and varies from run to run, but in my environment the score before training was 0.0799999982 (roughly the 1-in-10 you would expect from random guessing over 10 classes).

Run the training

Train the model using the function created earlier

train_net(model, train_loader, test_loader)
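
As an aside (a sketch of my own, not part of the original run, which uses the CPU): if a GPU happens to be available, the same function can be pointed at it through the device argument.

# Use a GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
train_net(model, train_loader, test_loader, device=device)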

In the end, the prediction accuracy was about 0.98000001907. Honestly, the accuracy seems almost too high; it makes me a little uneasy when it is this good...

Let's actually run inference on one sample

Feed a single sample into the trained model and try to predict its label.

data = train_dataset.__getitem__(0)[0].reshape(1, 3, 28, 28) #Add a batch dimension (the model expects the same shape as the data loader output)
print("label",train_dataset.__getitem__(0)[1].data)
model.eval()
output = model(data)
print(output.size())
output

Execution result (screenshot): you can see that the score at index 5 is the highest, so the label is predicted correctly.
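
The raw outputs are unnormalized scores; as a small addition (not in the original article), you can turn them into probabilities and an explicit predicted label like this:

# Convert the raw scores to probabilities and take the most likely class
probs = torch.softmax(output, dim=1)
pred_label = probs.argmax(dim=1)
print("Predicted label:", pred_label.item())
print("Probability:", probs.max().item())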

With this, model creation and training are complete!!

Save the model

Save the model for use on android

#Save model
model.eval()
#Sample input size
example = torch.rand(1, 3, 28, 28)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("./CNNModel.pt")
print(model)
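
As a quick sanity check before moving to Android (my own addition, not in the original article), you can load the saved TorchScript file back in Python and confirm that it produces the same output as the original model:

# Load the traced model back and compare its output with the original model
loaded = torch.jit.load("./CNNModel.pt")
with torch.no_grad():
    original_out = model(example)
    loaded_out = loaded(example)
print(torch.allclose(original_out, loaded_out))  # should print True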

end

For the time being, this is the end of the [Network Creation] part!! Next, we will implement the created model on Android. Because PyTorch Mobile converts bitmaps to RGB tensors and I could not get a grayscale tensor, I had to go to the trouble of converting MNIST to RGB, which added quite a bit of processing. As a result, I could not use the MNIST dataset as-is and had to write my own dataset and data loader. So this approach probably cannot be used for grayscale input or at a commercial level. Also, even though the CNN was put together rather casually, I was surprised that the accuracy turned out unexpectedly high; that's CNNs for you. The GitHub links are below.

This code Github: https://github.com/SY-BETA/CNN_PyTorch

Trained model created this time (.pt): https://github.com/SY-BETA/CNN_PyTorch/blob/master/CNNModel.pt

Next, let's move on to the Android implementation: Create an image recognition application that recognizes the numbers written on the screen with Android (PyTorch Mobile) [Android implementation]
