[Python] Keras from scratch, part 5

Click here for the Keras-from-scratch series: http://qiita.com/Ishotihadus/items/6ecf5684c2cbaaa6a5ef

From this installment onward, the Keras version is 2.0.5, but that shouldn't cause any problems.

Review of the previous installments

So far, we have used Keras for two-class classification.

MNIST

A dataset that anyone who has ever touched machine learning has heard of. The name is an abbreviation of Modified National Institute of Standards and Technology database.

Each monochrome image contains a handwritten digit from 0 to 9, and the goal is to recognize which digit is in the image.

Each image is 28 × 28 pixels, 8-bit monochrome, and of course labeled. There are 60,000 training images and 10,000 test images.

The official website keeps a record of error rates for each method. Looking at it, neural networks don't seem particularly amazing.

Keras ships with MNIST by default. Convenient.

Recognizing MNIST

For now, let's train an ordinary fully connected network.

Loading the data

Keras's MNIST data stores each pixel as an integer between 0 and 255. That is awkward to work with, so we divide by 255.0 to scale everything into the range 0 to 1.

Also, each image is two-dimensional data (three-dimensional if you include the sample axis), so we use reshape to flatten each one to one dimension (two dimensions including the sample axis). reshape changes only the shape of an array while keeping the order and total number of elements.
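As a quick illustration of that behavior (toy values, not the MNIST data):

import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]
b = a.reshape(6)                # [0 1 2 3 4 5] -- same elements, same order, new shape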

The labels are an array of integers from 0 to 9, which we convert to one-hot vectors. For example, 7 becomes [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]. Note that y_test does not need this conversion, so it is left as-is.
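For instance, this is what to_categorical does to a single label (just a toy check):

from keras.utils import np_utils

np_utils.to_categorical([7], num_classes=10)
# -> [[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]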

The data may be downloaded the first time you call mnist.load_data().

from keras.datasets import mnist
from keras.utils import np_utils

# Downloads the data on first call
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Flatten each 28x28 image to a 784-dimensional vector, scale to [0, 1]
x_train = x_train.reshape(60000, 784) / 255.0
x_test = x_test.reshape(10000, 784) / 255.0
# One-hot encode the training labels (y_test stays as integer labels)
y_train = np_utils.to_categorical(y_train, num_classes=10)

Model settings

Build the network model. Here we create a two-layer neural network of 784-1300-10. There are various ways to count layers, but in Keras (and not only in Keras), everything up to "receiving data from the previous layer and performing some processing on it (such as applying an activation function)" counts as one layer. The input layer is therefore not counted.

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(1300, input_dim=784, activation='relu'),  # hidden layer
    Dropout(0.4),                                   # randomly zero 40% of the hidden outputs
    Dense(10, activation='softmax')                 # output layer: one probability per digit
])
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
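If you want to check what was built, model.summary() prints each layer with its output shape and parameter count. The counts in the comments below are simple arithmetic, not output copied from Keras:

model.summary()
# Dense(1300): 784 * 1300 + 1300 = 1,020,500 parameters
# Dropout:     0 parameters
# Dense(10):   1300 * 10 + 10 = 13,010 parameters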

Dropout is a dropout layer. Keras randomly selects the specified fraction of nodes (40% here) from the previous layer (the Dense layer with 1300 nodes here) and sets their output to 0. This helps prevent overfitting.
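Roughly speaking, during training the layer behaves like the numpy sketch below (the function name is hypothetical). Keras actually uses "inverted dropout": the surviving outputs are scaled by 1/(1 - rate) so their expected value is unchanged, and the layer does nothing at test time.

import numpy as np

def dropout_train(x, rate=0.4):
    # Zero out roughly `rate` of the values at random,
    # scale the survivors so the expected output stays the same
    mask = (np.random.rand(*x.shape) >= rate) / (1.0 - rate)
    return x * mask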

Learning

from keras.callbacks import EarlyStopping

# Stop training automatically when validation accuracy stops improving
es = EarlyStopping(monitor='val_acc')
model.fit(x_train, y_train, batch_size=100, validation_split=0.2, callbacks=[es])

EarlyStopping

Here we use something new: EarlyStopping. It stops training automatically at an appropriate point, which helps prevent overfitting.

monitor specifies the quantity to watch. I chose val_acc (validation accuracy), but val_loss (validation loss) seems to be more common.
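As a variant (the parameter values here are just illustrative, not from this article), you can monitor val_loss and give training a few epochs of grace before stopping:

from keras.callbacks import EarlyStopping

# Stop only after val_loss has failed to improve for 3 consecutive epochs
es = EarlyStopping(monitor='val_loss', patience=3)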

Batch size

This is the batch_size argument passed to fit above.

The batch size is the number of samples processed in one computation (one parameter update). It can be as large as the entire training set, i.e. 48,000 in this example (the validation split takes 20%). If you shrink it to, say, 100, each computation handles only 100 samples, so it takes 480 computations to go through all the data. One such full pass is called an epoch. Learning with a batch size much smaller than the dataset is called mini-batch learning. Incidentally, a batch size of 1 is called stochastic (a term that seems less familiar in Japan).

Reducing the batch size reduces memory usage. Also, since the number of parameter updates per epoch increases, convergence tends to be faster. However, if the batch size is too small, the updates fluctuate wildly and training becomes unstable. Choose the batch size with that trade-off in mind.

By the way, if you don't specify batch_size, fit falls back to its default value (32), not the full dataset.
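Spelling out the arithmetic for this example (variable names are just for illustration):

n_train = 60000
n_fit = int(n_train * 0.8)               # 48000 samples left after the 20% validation split
batch_size = 100
updates_per_epoch = n_fit // batch_size  # 480 parameter updates per epoch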

Test

# Predicted digit for each test image
predict = model.predict_classes(x_test)
# Fraction of correct predictions (y_test still holds integer labels)
print(sum(predict == y_test) / 10000.0)

The accuracy should be around 98%.

predict_classes returns the predicted class index for each sample (predict, by contrast, returns the per-class probabilities).
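In fact, for a multi-class model it is equivalent to taking the argmax of the probabilities from predict, as this sanity check (assuming the model above) shows:

import numpy as np

probs = model.predict(x_test)    # shape (10000, 10): probability of each digit
classes = probs.argmax(axis=-1)  # same result as model.predict_classes(x_test)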

Summary of this program

Try playing around with parameters such as the dropout rate and the batch size.

from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping

# Load and preprocess the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784) / 255.0
x_test = x_test.reshape(10000, 784) / 255.0
y_train = np_utils.to_categorical(y_train, num_classes=10)

# Define and compile the model
model = Sequential([
    Dense(1300, input_dim=784, activation='relu'),
    Dropout(0.4),
    Dense(10, activation='softmax')
])
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])

# Train with early stopping on validation accuracy
es = EarlyStopping(monitor='val_acc')
model.fit(x_train, y_train, batch_size=100, validation_split=0.2, callbacks=[es])

# Evaluate on the test set
predict = model.predict_classes(x_test)
print(sum(predict == y_test) / 10000.0)

Actually, I had planned to cover CNNs this time and compare their performance with the network above, but the article got long, so that will have to wait until next time.
