[PYTHON] Checking the effect of random shuffling during mini-batch training with chainer

Introduction

In deep learning, it is apparently common to randomly shuffle the training data into mini-batches for each epoch, so I checked what difference it actually makes.

Environment

What I checked

For each epoch, I compare the case where the order of the mini-batch training data is fixed with the case where it is shuffled at random. The training target is the same sin function as in I tried learning the sin function with chainer.

[training data]
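
For reference, the training data is 1000 points of y = sin(x) evenly spaced on [0, 2π], generated by the get_dataset function in the full listing below; a minimal sketch to reproduce the plot:

import numpy as np
from matplotlib import pyplot as plt

N = 1000
x = np.linspace(0, 2 * np.pi, N)  #N evenly spaced points on [0, 2*pi]
y = np.sin(x)                     #target values

plt.plot(x, y)
plt.title("training data")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.show()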

Implementation

Fixed or random

The order of the mini-batch training data is switched between fixed and random. The fixed-order code uses the same slicing technique that is often used at test time.

Fixed or random


perm = np.random.permutation(N)
sum_loss = 0
for i in range(0, N, batchsize):
    if order == "fixed": #The order of learning is fixed
        x_batch = x_train[i:i + batchsize]
        y_batch = y_train[i:i + batchsize]
    elif order == "random": #Random order of learning
        x_batch = x_train[perm[i:i + batchsize]]
        y_batch = y_train[perm[i:i + batchsize]]
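
The snippet above is an excerpt (order, N, batchsize and the arrays are defined in the full listing below), but the difference between the two kinds of indexing can be checked on a toy array; a minimal sketch:

import numpy as np

N = 10
batchsize = 5
x_train = np.arange(N) * 10        #toy data: 0, 10, 20, ...
perm = np.random.permutation(N)    #shuffled indices for this "epoch"

for i in range(0, N, batchsize):
    fixed_batch = x_train[i:i + batchsize]         #fixed: the same slice every time
    random_batch = x_train[perm[i:i + batchsize]]  #random: fancy indexing with the shuffled indices
    print(fixed_batch)
    print(random_batch)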

Learning parameters

All parameters are the same rough values as in the earlier example.
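
For reference, the values used in the full listing below are:

batchsize = 10   #mini-batch size
n_epoch = 500    #number of epochs
n_units = 100    #hidden units in each of the two hidden layers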

Whole code

Whole code


# -*- coding: utf-8 -*-

#Import everything for now
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
import time
from matplotlib import pyplot as plt

#data
def get_dataset(N):
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x)
    return x, y

#neural network
class MyChain(Chain):
    def __init__(self, n_units=10):
        super(MyChain, self).__init__(
             l1=L.Linear(1, n_units),
             l2=L.Linear(n_units, n_units),
             l3=L.Linear(n_units, 1))

    def __call__(self, x_data, y_data):
        x = Variable(x_data.astype(np.float32).reshape(len(x_data),1)) #Convert to Variable object
        y = Variable(y_data.astype(np.float32).reshape(len(y_data),1)) #Convert to Variable object
        return F.mean_squared_error(self.predict(x), y)

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        h3 = self.l3(h2)
        return h3

    def get_predata(self, x):
        return self.predict(Variable(x.astype(np.float32).reshape(len(x),1))).data

# main
if __name__ == "__main__":

    #Training data
    N = 1000
    x_train, y_train = get_dataset(N)

    #Learning parameters
    batchsize = 10
    n_epoch = 500
    n_units = 100

    #Learning loop
    fixed_losses = []
    random_losses = []
    print "start..."
    for order in ["fixed", "random"]:
        #Modeling
        model = MyChain(n_units)
        optimizer = optimizers.Adam()
        optimizer.setup(model)

        start_time = time.time()
        for epoch in range(1, n_epoch + 1):

            # training
            perm = np.random.permutation(N)
            sum_loss = 0
            for i in range(0, N, batchsize):
                if order == "fixed": #The order of learning is fixed
                    x_batch = x_train[i:i + batchsize]
                    y_batch = y_train[i:i + batchsize]
                elif order == "random": #Random order of learning
                    x_batch = x_train[perm[i:i + batchsize]]
                    y_batch = y_train[perm[i:i + batchsize]]

                model.zerograds()               #Reset accumulated gradients
                loss = model(x_batch, y_batch)  #Forward pass and mean squared error
                sum_loss += loss.data * batchsize
                loss.backward()                 #Backpropagation
                optimizer.update()              #Parameter update with Adam

            average_loss = sum_loss / N
            if order == "fixed":
                fixed_losses.append(average_loss)
            elif order == "random":
                random_losses.append(average_loss)

            #Output learning process
            if epoch % 10 == 0:
                print "({}) epoch: {}/{} loss: {}".format(order, epoch, n_epoch, average_loss)

        interval = int(time.time() - start_time)
        print "Execution time({}): {}sec".format(order, interval)

    print "end"

    #Graphing the error
    plt.plot(fixed_losses, label = "fixed_loss")
    plt.plot(random_losses, label = "random_loss")
    plt.yscale('log')
    plt.legend()
    plt.grid(True)
    plt.title("loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()
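
Note that np.random.permutation(N) is drawn every epoch in both branches; in the fixed branch it is simply never used, so the two runs differ only in the order in which the batches are taken. A separate model and optimizer are created for each order, so the two runs do not influence each other.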

Execution result

Error

The error differed by roughly a factor of 10 between the fixed and random cases. As is commonly said, shuffling at random is the better choice.

sin_diff_order.png

If the entire data set is randomized

I also checked what happens when the entire training data set is generated at random. In that case, the result with the fixed order was comparable to the result with the random order; if anything, the variation from epoch to epoch was smaller with the fixed order.

Random entire data


def get_dataset(N):
    x = 2 * np.pi * np.random.random(N)
    y = np.sin(x)
    return x, y
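
Since np.random.random(N) draws the x values uniformly at random, every fixed slice x_train[i:i + batchsize] already contains points scattered over the whole [0, 2π) range, which is presumably why the fixed mini-batch order does just as well here.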

sin_diff_order2.png

Summary

When training on the sin function, randomly shuffling the mini-batch training data each epoch gave roughly 10 times lower error than keeping the order fixed. However, when the entire training data set was generated at random, both orders gave good results.

This may be limited to specific conditions such as the sin function, but the approach of randomizing the entire data set up front and keeping the mini-batch order fixed is also worth trying.
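
A minimal sketch of that combination, reusing the pieces shown above: the whole data set is generated in random order once, and the mini-batch slices are then kept fixed.

import numpy as np

def get_dataset(N):
    x = 2 * np.pi * np.random.random(N)  #whole data set generated in random order
    y = np.sin(x)
    return x, y

N = 1000
batchsize = 10
x_train, y_train = get_dataset(N)

for i in range(0, N, batchsize):
    x_batch = x_train[i:i + batchsize]  #fixed slice, no per-epoch permutation
    y_batch = y_train[i:i + batchsize]
    #...feed x_batch, y_batch to the model as in the full listing above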

References

Neural network starting with Chainer

Basics and Practice of Deep Learning Implementation
