[PYTHON] Results improved when a neural network's mini-batch training data was made a hybrid of fixed and random ordering.

Introduction

While working on the previous article, I happened to stumble on a good result, so I will introduce it here.

The training result with randomly shuffled mini-batch data came out better than last time. When I investigated the cause, it turned out that what I had intended as a comparison of "fixed" vs. "random" was actually a comparison of "fixed" vs. "fixed + random". (The previous article compares fixed vs. random.)

Environment

Learning content

We will compare the general case, where the mini-batch training data is randomly shuffled every epoch, with a hybrid of fixed order and random shuffling. As usual, the training target is the sin function.

[training data]
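To make the comparison concrete, here is a minimal sketch (my own illustration, not the article's full training loop) of how the mini-batch indices differ between the two orderings; the hybrid run simply uses the fixed ordering for the first half of the epochs and the shuffled ordering for the second half:

import numpy as np

N, batchsize, n_epoch = 1000, 10, 500
for epoch in range(n_epoch):
    perm = np.random.permutation(N)               # fresh shuffle every epoch
    for i in range(0, N, batchsize):
        fixed_idx = np.arange(i, i + batchsize)   # "fixed": same slice order every epoch
        random_idx = perm[i:i + batchsize]        # "random": order changes every epoch
        # the hybrid run uses fixed_idx for the first n_epoch // 2 epochs
        # and random_idx for the remaining epochs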

Implementation

Create model twice

This discovery came about because I forgot to reset the model's trained parameters before the random-only run. I didn't know how to reset them, and changing the model name and so on seemed like a hassle, so I simply wrote the same model-creation code twice. (If anyone knows how to reset a model, please tell me... a possible workaround is sketched after the code below.)



# Create the model
model = MyChain(n_units)
optimizer = optimizers.Adam()
optimizer.setup(model)

'''
(omitted: the general random-only training loop, same as in the full code below)
'''

# Create the model again (I don't know how to reset the trained parameters, so I rebuild it)
model = MyChain(n_units)
optimizer = optimizers.Adam()
optimizer.setup(model)
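As an aside, since I could not find a reset method, one workaround I can think of (just an untested sketch, reusing MyChain, n_units and optimizers from the code in this article) is to keep a deep copy of the freshly initialized model and restore it instead of writing the creation code twice; the optimizer still has to be set up again:

import copy

model = MyChain(n_units)
initial_model = copy.deepcopy(model)   # snapshot of the untrained parameters

# ... first training run ...

model = copy.deepcopy(initial_model)   # "reset" by restoring the untrained snapshot
optimizer = optimizers.Adam()
optimizer.setup(model)                 # optimizer state is rebuilt for the new model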

Learning parameters

All parameters are chosen fairly arbitrarily.

In the hybrid run, the epochs are split into 250 fixed and 250 random.

Whole code

The code is a messy scribble, but it works, so that's good enough for now.



# -*- coding: utf-8 -*-

# Import everything we might need, just in case
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
import time
from matplotlib import pyplot as plt

# Training data: N points of sin(x) on [0, 2π]
def get_dataset(N):
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x)
    return x, y

# Neural network: 3-layer MLP with ReLU activations
class MyChain(Chain):
    def __init__(self, n_units=10):
        super(MyChain, self).__init__(
             l1=L.Linear(1, n_units),
             l2=L.Linear(n_units, n_units),
             l3=L.Linear(n_units, 1))

    def __call__(self, x_data, y_data):
        x = Variable(x_data.astype(np.float32).reshape(len(x_data),1)) #Convert to Variable object
        y = Variable(y_data.astype(np.float32).reshape(len(y_data),1)) #Convert to Variable object
        return F.mean_squared_error(self.predict(x), y)

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        h3 = self.l3(h2)
        return h3

    def get_predata(self, x):
        return self.predict(Variable(x.astype(np.float32).reshape(len(x),1))).data

# main
if __name__ == "__main__":

    #Training data
    N = 1000
    x_train, y_train = get_dataset(N)

    #Learning parameters
    batchsize = 10
    n_epoch = 500
    n_units = 100

    # Create the model
    model = MyChain(n_units)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Training loop (general method: random shuffle every epoch)
    print("start...")
    normal_losses =[]
    start_time = time.time()
    for epoch in range(1, n_epoch + 1):

        # training
        perm = np.random.permutation(N)
        sum_loss = 0
        for i in range(0, N, batchsize):
            x_batch = x_train[perm[i:i + batchsize]]
            y_batch = y_train[perm[i:i + batchsize]]
            model.zerograds()
            loss = model(x_batch,y_batch)
            sum_loss += loss.data * batchsize
            loss.backward()
            optimizer.update()

        average_loss = sum_loss / N
        normal_losses.append(average_loss)

        # Print training progress
        if epoch % 10 == 0:
            print("(normal) epoch: {}/{} normal loss: {}".format(epoch, n_epoch, average_loss))

    interval = int(time.time() - start_time)
    print "Execution time(normal): {}sec".format(interval)

    # Create the model again (I don't know how to reset the trained parameters, so I rebuild it)
    model = MyChain(n_units)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Training loop (hybrid)
    # Mini-batch order: fixed for the first half of the epochs, random for the second half
    hybrid_losses =[]
    for order in ["fixed", "random"]:
        start_time = time.time()
        for epoch in range(1, n_epoch // 2 + 1):  # 250 epochs per phase

            # training
            perm = np.random.permutation(N)  # used only in the "random" phase
            sum_loss = 0
            for i in range(0, N, batchsize):
                if order == "fixed": #The order of learning is fixed
                    x_batch = x_train[i:i + batchsize]
                    y_batch = y_train[i:i + batchsize]
                elif order == "random": #Random order of learning
                    x_batch = x_train[perm[i:i + batchsize]]
                    y_batch = y_train[perm[i:i + batchsize]]

                model.zerograds()
                loss = model(x_batch,y_batch)
                sum_loss += loss.data * batchsize
                loss.backward()
                optimizer.update()

            average_loss = sum_loss / N
            hybrid_losses.append(average_loss)

            # Print training progress
            if epoch % 10 == 0:
                print("(hybrid) epoch: {}/{} {} loss: {}".format(epoch, n_epoch // 2, order, average_loss))

        interval = int(time.time() - start_time)
        print "Execution time(hybrid {}): {}sec".format(order, interval)

    print "end"

    # Plot the loss curves
    plt.plot(normal_losses, label = "normal_loss")
    plt.plot(hybrid_losses, label = "hybrid_loss")
    plt.yscale('log')
    plt.legend()
    plt.grid(True)
    plt.title("loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()

Execution result

Error

Compared to the general method (normal), the fixed + random hybrid method (hybrid) achieves an error roughly an order of magnitude smaller. The sharp drop in hybrid_loss near the middle of the horizontal axis is where the ordering switches from fixed to random.

For the same total number of epochs, it seems better to use fewer fixed epochs and more random ones.

[Figure: sin_hybrid_order.png — loss curves (log scale) for normal vs. hybrid]
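If you want to try other splits, here is a hypothetical sketch of how the schedule could be parameterized (these numbers are not something I have tested):

n_epoch = 500
fixed_epochs = 100                      # hypothetical: fewer fixed epochs than random ones
schedule = ["fixed"] * fixed_epochs + ["random"] * (n_epoch - fixed_epochs)
for epoch, order in enumerate(schedule, 1):
    pass  # run one epoch picking mini-batch indices according to `order`, as in the hybrid loop above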

Summary

I don't know the theoretical reason, but for this learning target (the sin function from 0 to 2π), making the mini-batch training data a hybrid of fixed and random gave an error roughly an order of magnitude better than the usual random-only method.

I wondered whether this was the notorious overfitting at work, so I checked, but the result was the same as during training (no sign of overfitting).
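For reference, this is the kind of check I mean, as a minimal sketch (my own reconstruction, reusing the trained model and get_predata from the code above): evaluate on points that are not on the training grid and compare the error with the training loss:

import numpy as np

x_test = np.random.uniform(0, 2 * np.pi, 1000)   # points off the evenly spaced training grid
y_test = np.sin(x_test)
y_pred = model.get_predata(x_test).reshape(-1)   # trained model from the code above
print("test MSE: {}".format(np.mean((y_pred - y_test) ** 2)))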

I feel that accumulating small ideas like this one is what leads to building a highly accurate neural network.
