[PYTHON] "Deep Learning from scratch" self-study memo (No. 15) TensorFlow beginner tutorial

While reading "Deep Learning from scratch" (written by Yasuki Saito, published by O'Reilly Japan), I will make a note of the sites I referred to. Part 14 ← → Part 16

Since Google Colab can now be used without problems, I will try out the contents of this book with TensorFlow.

TensorFlow beginner tutorial

I ran the beginner tutorial "Your First Neural Network" from the TensorFlow site https://www.tensorflow.org/?hl=ja exactly as it is.

Really, if you just copy and paste it into Colab it works, so there is no need to take any notes. However, since the tutorial is only meant to confirm that things work, it contains no explanation of the contents. The necessary explanations seem to be scattered around the site, so beginners are likely to finish the tutorial and then get lost, not knowing what they have just done or what to do next.

Or rather, perhaps this site isn't aimed at beginners at all, but at people who already have some knowledge of Python and neural networks and want to get TensorFlow running on Google Colab. In fact, I can't find any explanation of neural networks themselves.

In that respect, I think a book that builds up its explanations in order, like "Deep Learning from scratch", is still necessary.

So, let's compare the scripts we ran in this tutorial with the contents of "Deep Learning from scratch" to see what Keras and TensorFlow are doing.

Model building

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

Here we are building a Keras Sequential model.

type(model)

tensorflow.python.keras.engine.sequential.Sequential

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
dense (Dense)                (None, 128)               100480
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________

The (, 784) output from the input layer is multiplied (dot product) by the (784, 128) weight matrix in the next dense layer, the bias is added, and a (, 128) output is produced. In the next dense_1 layer it is multiplied by a (128, 10) weight matrix and a (, 10) output is produced. This also explains the parameter counts above: 784 x 128 weights + 128 biases = 100,480 for dense, and 128 x 10 weights + 10 biases = 1,290 for dense_1.
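To convince myself of the shapes and parameter counts, here is a small NumPy sketch of my own (the variable names are mine, not from the tutorial):

import numpy as np

x = np.random.rand(1, 28, 28).reshape(1, 784)    # Flatten: (1, 784)
W1 = np.random.rand(784, 128); b1 = np.zeros(128)
h = np.maximum(0, x.dot(W1) + b1)                # dense (ReLU): (1, 128)
W2 = np.random.rand(128, 10); b2 = np.zeros(10)
y = h.dot(W2) + b2                               # dense_1 (before softmax): (1, 10)

print(h.shape, y.shape)     # (1, 128) (1, 10)
print(W1.size + b1.size)    # 100480 parameters in dense
print(W2.size + b2.size)    # 1290 parameters in dense_1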

model.layers

[<tensorflow.python.keras.layers.core.Flatten at 0x7faf1e9efac8>,
 <tensorflow.python.keras.layers.core.Dense at 0x7faeea73b438>,
 <tensorflow.python.keras.layers.core.Dense at 0x7faeea73b710>]

The model is built with the Sequential class defined in the module tensorflow.python.keras.engine.sequential.py. The first layer, Flatten, is the input layer; it seems to simply "convert the 2D array (28 x 28 pixels) into a 1D array of 28 x 28 = 784 pixels".

After the pixels are flattened into one dimension, the network has two tf.keras.layers.Dense layers. These are densely connected, or fully connected, layers of neurons. The first Dense layer has 128 nodes (or neurons). The second and last layer is a 10-node softmax layer, which returns an array of 10 probabilities that sum to 1. Each node outputs the probability that the current image belongs to the corresponding one of the 10 classes.

So it seems to correspond to the two-layer neural network class TwoLayerNet explained on p. 113 of "Deep Learning from scratch". However, while the activation function of the TwoLayerNet class was the sigmoid function, in this Keras Sequential model the ReLU function is specified.

Dense: it seems to mean a fully connected layer in the dictionary sense of "dense, closely packed, concentrated".

"Deep Learning from scratch", p. 205: All the neurons in adjacent layers were connected. We called this fully-connected, and we implemented the fully connected layer under the name Affine layer.

So the Dense layer seems to correspond to this Affine layer.

Dense parameters

units                                 number of dimensions of the output
activation=None                       activation function to use (if not specified, no activation is applied)
use_bias=True                         whether to use a bias
kernel_initializer='glorot_uniform'   initial values of the weights
bias_initializer='zeros'              initial values of the bias
kernel_regularizer=None               regularization function applied to the weight matrix
bias_regularizer=None                 regularization function applied to the bias
activity_regularizer=None             regularization function applied to the layer output (activation)
kernel_constraint=None                constraint function applied to the weights
bias_constraint=None                  constraint function applied to the bias
**kwargs
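As an illustration (my own sketch, not code from the tutorial), a Dense layer with several of these parameters written out explicitly would look like this; the l2 regularizer is just an example value:

from tensorflow import keras

# My own sketch: a Dense layer with the main parameters spelled out
# instead of relying on the defaults.
layer = keras.layers.Dense(
    128,                                  # units: dimension of the output
    activation='relu',                    # activation function
    use_bias=True,                        # add a bias vector (default)
    kernel_initializer='glorot_uniform',  # initial weights (default)
    bias_initializer='zeros',             # initial bias (default)
    kernel_regularizer=keras.regularizers.l2(0.01)  # example weight regularization
)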

Activation function

softmax elu selu softplus softsign relu tanh sigmoid hard_sigmoid linear

About half of these were explained in "Deep Learning from scratch"; linear is the identity function from p. 66.

Initial value of weight

glorot_uniform (default): returns an initializer with the Glorot uniform distribution (Xavier uniform distribution).

From the Keras documentation: this is a uniform distribution over [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out)). Here fan_in is the number of input units and fan_out is the number of output units.

glorot_normal: returns an initializer with the Glorot normal distribution (Xavier normal distribution). "Deep Learning from scratch", p. 182, "initial value of Xavier": a Gaussian distribution whose standard deviation is $\frac{1}{\sqrt{n}}$, where n is the number of nodes in the previous layer.

he_normal: returns an initializer with the He normal distribution. "Deep Learning from scratch", p. 184, "initial value of He": a Gaussian distribution whose standard deviation is $\sqrt{\frac{2}{n}}$, where n is the number of nodes in the previous layer.

random_normal: initializes the weights according to a normal distribution. If specified as a string, it is initialized with mean=0.0, stddev=0.05, seed=None (a Gaussian distribution with standard deviation 0.05). If you want to set these parameters yourself, specify the initializer with the function as follows:

keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)

"Deep Learning from scratch" P184 Since the sigmoid function and tanh function are symmetrical and can be regarded as a linear function near the center, "Initial value of Xavier" is suitable. On the other hand, when using ReLU, specialize in ReLU. It is recommended to use the initial value. It is an early stage recommended by Kaiming He et al. Value-The name is also "Initial value of He".

Parameters that can be specified by the model.compile method

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

When compiling the model, adam is specified for the optimizer and sparse_categorical_crossentropy is specified for the loss function.

optimizer='rmsprop'       optimizer (default is rmsprop)
loss=None                 loss function
metrics=None              evaluation function(s)
loss_weights=None
weighted_metrics=None
run_eagerly=None
**kwargs

Optimizer (optimizer)

SGD RMSprop Adagrad Adadelta Adam Adamax Nadam TFOptimizer

Loss function (loss)

mean_squared_error mean_absolute_error mean_absolute_percentage_error mean_squared_logarithmic_error squared_hinge hinge categorical_hinge logcosh categorical_crossentropy sparse_categorical_crossentropy binary_crossentropy kullback_leibler_divergence poisson cosine_proximity

The mean_~ losses are used for regression problems, while the ~_crossentropy losses are used for classification problems and are explained on p. 89 of "Deep Learning from scratch". It seems that categorical_crossentropy is used when the correct labels are one-hot encoded, and sparse_categorical_crossentropy is used when the targets are integers.
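For example (my own toy data, not from the tutorial), the two label formats look like this:

import numpy as np
from tensorflow import keras

# Integer labels: use sparse_categorical_crossentropy (as in this tutorial)
labels_int = np.array([9, 2, 1])

# One-hot labels: use categorical_crossentropy
labels_onehot = keras.utils.to_categorical(labels_int, num_classes=10)
print(labels_onehot[0])   # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]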

Evaluation function (metrics)

According to the comments in the compile method,

Normally, use metrics=['accuracy'].

and also,

If you pass the string "accuracy" or "acc", it is converted to one of tf.keras.metrics.BinaryAccuracy, tf.keras.metrics.CategoricalAccuracy, or tf.keras.metrics.SparseCategoricalAccuracy, based on the loss function used and the output shape of the model.

You can also specify metrics=['mae'], but that seems to be for regression problems, where it computes the Mean Absolute Error. It also seems that loss functions can be specified as metrics, so be careful not to get confused.

So, as the phrase "converts the string 'accuracy' ... to one of ... tf.keras.metrics.SparseCategoricalAccuracy" suggests, what is actually being specified is a function (class); if you specify it with a string identifier, it is converted to the corresponding function. The same is true for the optimizer and the loss function. Specifying by string is easier to understand and less error-prone, but the parameters are left at their default values.

For example, if you specify 'adam' as the optimizer, the class Adam in the module tensorflow.python.keras.optimizer_v2.adam.py is called, and the following parameters are set by default: learning_rate=0.001, beta_1=0.9, beta_2=0.999.

"Deep Learning from scratch" P175 Adam sets three hyperparameters. One is the learning coefficient so far (appeared as α in the paper). The latter two are the coefficient β 1 for the primary moment and the coefficient β 2 for the secondary moment. According to the paper, the standard settings are 0.9 for β1 and 0.999 for β2, and that setting seems to work in most cases.

So it seems the default values are fine, but if you want to change them, you have to specify the function (class) directly:

model.compile(optimizer=keras.optimizers.Adam(0.001, 0.9, 0.999),  # learning_rate, beta_1, beta_2
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])

Model training

model.fit(train_images, train_labels, epochs=5)

Training (learning) is done with the model.fit method. We pass the images and labels of the training data, and specify epochs=5, so the training is repeated for 5 epochs. When you actually run it, one line of results is displayed per epoch, 5 lines in total, as shown below.

Epoch 1/5
1875/1875 [==============================] - 2s 1ms/step - loss: 1.5450 - accuracy: 0.6806
Epoch 2/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.7987 - accuracy: 0.8338
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.5804 - accuracy: 0.8666
Epoch 4/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.4854 - accuracy: 0.8804
Epoch 5/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.4319 - accuracy: 0.8893
<tensorflow.python.keras.callbacks.History at 0x7fd89467f550>

The parameters of the fit method are as follows:

x=None                          training data (images)
y=None                          training data (labels)
batch_size=None                 batch size; number of samples processed per batch (default is 32)
epochs=1                        number of training epochs
verbose=1                       log output (0: none, 1, 2: output)
callbacks=None
validation_split=0.
validation_data=None
shuffle=True
class_weight=None
sample_weight=None
initial_epoch=0
steps_per_epoch=None
validation_steps=None
validation_batch_size=None
validation_freq=1
max_queue_size=10
workers=1
use_multiprocessing=False
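As a sketch (my own, not from the tutorial), the same training call with the defaults from this list written out explicitly would look like this; model.fit also returns the History object that appears on the last line of the output above:

# My own sketch: fit with the default batch size and logging level written out.
history = model.fit(train_images, train_labels,
                    batch_size=32,   # default batch size
                    epochs=5,        # number of epochs
                    verbose=1)       # 0: silent, 1: progress bar, 2: one line per epoch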

Since there are 60,000 training samples and the default batch size is 32, the number of iterations per epoch (iter_per_epoch) is 60,000 ÷ 32 = 1,875. After processing (learning from) 1,875 batches in one epoch, the model has seen all of the training data once; this is repeated 5 times.

Evaluation of correct answer rate

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

313/313 - 0s - loss: 0.3390 - accuracy: 0.8781

Test accuracy: 0.8780999779701233

Parameters of model.evaluate method

x=None                     test data (images)
y=None                     test data (labels)
batch_size=None            batch size (default is 32)
verbose=1                  log output (0: none, 1, 2: output)
sample_weight=None
steps=None
callbacks=None
max_queue_size=10
workers=1
use_multiprocessing=False
return_dict=False

It returns the loss value and the accuracy (correct answer rate) of the trained model on the test data.
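As a sketch (my own, using the return_dict parameter from the list above, which is only available in newer TensorFlow versions), the same values can also be returned as a dictionary:

# My own sketch: evaluate with return_dict=True returns a dict instead of a list.
results = model.evaluate(test_images, test_labels, verbose=2, return_dict=True)
print(results)   # e.g. {'loss': 0.34, 'accuracy': 0.88}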

I couldn't work out what the parameters are from the TensorFlow site alone, so I referred to the Keras documentation. The explanations there are not sufficient either, but you can at least see what can be specified. → Keras Documentation / Keras API reference

Part 14 ← → Part 16   Click here for the table of contents of this memo   Glossary of terms I can't read
