[PYTHON] Deep Learning 2: Tuning of deep learning

Aidemy 2020/10/1

Introduction

Hello, this is Yope! I am a liberal arts student, but I was interested in the possibilities of AI, so I attended the AI-specialized school "Aidemy" to study. I would like to share the knowledge I gained there, so I am summarizing it on Qiita. I am very happy that many people read my previous summary article. Thank you! This is the second post in the Deep Learning series. Nice to meet you.

What to learn this time
・The hyperparameters of deep learning

Deep learning hyperparameters

Types of hyperparameters for deep learning

・Hyperparameters set in add
　・Dropout rate: __Dropout(rate=)__
　・Number of hidden layer units: __Dense()__
　・Activation function: __Activation()__
・Hyperparameters set in compile
　・Loss function: __loss__
　・Optimization function: __optimizer__
　・Learning rate: __optimizers.SGD(lr=)__
・Hyperparameters set in fit
　・Batch size: __batch_size__
　・Number of epochs: __epochs__
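Putting these together, here is a minimal sketch of where each hyperparameter appears in a Keras model (the layer sizes, the input dimension of 784, and all concrete values are placeholders of mine, not from the course material):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras import optimizers

model = Sequential()
model.add(Dense(128, input_dim=784))  # number of hidden layer units: Dense()
model.add(Activation("relu"))         # activation function: Activation()
model.add(Dropout(rate=0.5))          # dropout rate: Dropout(rate=)
model.add(Dense(10))
model.add(Activation("softmax"))

model.compile(
    loss="categorical_crossentropy",    # loss function: loss
    optimizer=optimizers.SGD(lr=0.01),  # optimizer / learning rate (newer Keras uses learning_rate=)
    metrics=["accuracy"],
)

# batch size and number of epochs are passed to fit():
# model.fit(X_train, y_train, batch_size=32, epochs=10)
```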

Number of hidden layers and prediction accuracy (dropout)

・Although the number of hidden layers and the number of units can be chosen freely, care is needed: if they are too large, training slows down and overfitting occurs more easily.
・If __Dropout(rate=ratio of units to drop)__ mentioned above is specified as a hyperparameter, training proceeds while a fixed ratio of units (neurons) is randomly dropped. Dropout makes learning independent of specific neurons, which prevents overfitting and improves model accuracy (see the sketch below).
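As a small demonstration of what Dropout actually does during training (the all-ones input is just a toy example of mine):

```python
import numpy as np
import tensorflow as tf

# Dropout with rate=0.5 zeroes out roughly half of the units on each
# training step (survivors are scaled by 1/(1-rate) so the expected sum
# stays the same); at inference time it does nothing.
layer = tf.keras.layers.Dropout(rate=0.5)
x = np.ones((1, 10), dtype="float32")
print(layer(x, training=True))   # about half the entries are 0, the rest are 2.0
print(layer(x, training=False))  # all entries stay 1.0
```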

Activation function

・Activation is the function applied to the __output of a fully connected layer (the firing of a neuron)__. If no activation function is specified, the model can only draw a straight-line boundary, so data that is not linearly separable cannot be classified.
・Conversely, if an activation function is specified, even data that cannot be separated by a straight line can be classified, provided the model is trained properly.

・Typical activation functions include the __sigmoid function (sigmoid)__, which maps the input value into the range 0 to 1, and the __ReLU function (relu)__, which outputs 0 when the input value is less than 0 and outputs the input value unchanged when it is 0 or more.
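Both functions are easy to write out in NumPy (the sample inputs are my own):

```python
import numpy as np

def sigmoid(x):
    # squashes any input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):
    # 0 for negative inputs, the input itself for inputs >= 0
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # approx. [0.119 0.378 0.5 0.622 0.881]
print(relu(x))     # [0. 0. 0. 0.5 2.]
```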

Loss function

・The function that expresses the difference between the output data and the teacher data is called the __loss function (loss)__.
・Loss functions used in machine learning include the __(mean) squared error__ and the __cross-entropy error__; details are described below.
・(Review) In deep learning, the weights of each layer are updated so as to minimize this loss function (error backpropagation).

Mean squared error

・A loss function that squares the difference between each output value and its teacher value and averages the results.
・Since the mean squared error is suited to evaluating continuous values, it is __mainly applied to regression models__.
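A minimal NumPy version (the example values are mine):

```python
import numpy as np

def mean_squared_error(y_pred, y_true):
    # average of the squared differences between output and teacher data
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])
print(mean_squared_error(y_pred, y_true))  # (0.25 + 0.25 + 0) / 3 -> ~0.1667
```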

Cross entropy error (categorical_crossentropy)

・A loss function that expresses the error between the teacher data (correct labels) and the output data, using the fact that the output for the correct label is a probability between 0 and 1 that should ideally be 1 (the closer the loss is to 0, the smaller the error).
・Cross-entropy error is __mainly applied to classification models__ (in Keras, binary_crossentropy is used for two-class problems and categorical_crossentropy for multi-class, one-hot labels).
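With one-hot teacher data, the cross-entropy reduces to minus the log of the predicted probability of the correct class; a NumPy sketch (the example probabilities are mine):

```python
import numpy as np

def categorical_crossentropy(y_pred, y_true, eps=1e-7):
    # y_true is one-hot, so only the predicted probability of the
    # correct label contributes: -log(p_correct)
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0, 1, 0])  # correct label is class 1
print(categorical_crossentropy(np.array([0.1, 0.8, 0.1]), y_true))  # ~0.22 (small error)
print(categorical_crossentropy(np.array([0.6, 0.2, 0.2]), y_true))  # ~1.61 (large error)
```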

Optimization function

・As mentioned above, the weights are updated during training so that the loss function is minimized. It is the __optimizer__ that determines how the weights are updated at that point, i.e. how strongly the learning rate, the number of epochs, and past weight updates are reflected. What is set in optimizer is an optimizers object, which also sets the learning rate described next.
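For example, the optimizer is set in compile like this (a sketch of mine with a placeholder one-layer model):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers

model = Sequential([Dense(10, activation="softmax", input_dim=64)])  # 64 input features (placeholder)

# Plain SGD follows the gradient directly; an alternative such as
# optimizers.Adam() also uses running averages of past updates.
model.compile(loss="categorical_crossentropy",
              optimizer=optimizers.SGD(lr=0.01),
              metrics=["accuracy"])
```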

Learning rate

・The learning rate (lr) is a hyperparameter that determines how much the weights of each layer are changed in a single update.
・If the learning rate is too low, updates hardly make progress; if it is too high, the values scatter back and forth and updates are wasted, and in some cases training does not converge at all. It therefore needs to be set to an appropriate value.
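This behavior can be seen even on a toy problem: minimizing f(w) = w² by plain gradient descent (entirely my own illustration, not from the course material):

```python
# Minimize f(w) = w^2 with gradient descent (gradient: 2w),
# starting from w = 1, to see how the learning rate behaves.
def descend(lr, w=1.0, steps=20):
    for _ in range(steps):
        w -= lr * 2 * w  # weight update: w <- w - lr * gradient
    return w

print(descend(lr=0.001))  # ~0.96  : too low, the update hardly proceeds
print(descend(lr=0.1))    # ~0.012 : appropriate, converges toward the minimum at 0
print(descend(lr=1.1))    # ~38    : too high, the value scatters and diverges
```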

Batch size

・The batch size is the number of data samples input to the model __at one time__. When multiple samples are passed at once, the weights are updated using the average loss and average gradient over those samples, so __the influence of biased data is reduced__; and since the per-sample computations can run in parallel, __computation time is shortened__.
・On the other hand, when many samples are passed at once, large weight updates become rare, and the optimization may get stuck in a __local solution__ that fits only part of the data.
・For this reason, when the data contains many irregular samples, the batch size is often increased to reduce their influence; when there are few, the batch size is often reduced to avoid local solutions.
・Setting the batch size to 1, i.e. feeding data one sample at a time, is called __online learning__; setting it to the total number of samples is called __batch learning__; and setting it somewhere in between is called __mini-batch learning__ (see the sketch below).
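In Keras, the three settings differ only in the batch_size passed to fit (the data here is dummy data of mine):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.random.rand(100, 8)             # 100 dummy samples with 8 features
y = np.random.randint(0, 2, (100, 1))  # dummy binary labels

model = Sequential([Dense(1, activation="sigmoid", input_dim=8)])
model.compile(loss="binary_crossentropy", optimizer="sgd")

model.fit(X, y, batch_size=1, epochs=1)       # online learning: one sample per update
model.fit(X, y, batch_size=32, epochs=1)      # mini-batch learning: common middle ground
model.fit(X, y, batch_size=len(X), epochs=1)  # batch learning: all data per update
```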

Iterative learning (number of epochs)

・Generally, in deep learning, training is repeated several times over the same training data to improve accuracy. The number of passes is called the number of __epochs__. Accuracy stops improving after a certain number of epochs, and training longer than necessary causes overfitting, so the number of epochs must also be set to an appropriate value.
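One common practical way to choose the number of epochs, not covered in the course material above, is Keras's EarlyStopping callback, which stops training once the validation loss stops improving; a sketch with dummy data:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

X = np.random.rand(100, 8)             # dummy data (placeholders)
y = np.random.randint(0, 2, (100, 1))
model = Sequential([Dense(1, activation="sigmoid", input_dim=8)])
model.compile(loss="binary_crossentropy", optimizer="sgd")

# Set epochs generously and stop once validation loss stops improving,
# i.e. just before overfitting sets in.
model.fit(X, y, epochs=100, batch_size=32, validation_split=0.2,
          callbacks=[EarlyStopping(monitor="val_loss", patience=3)])
```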

Summary

・The deep learning hyperparameters set in add are __Dropout(rate=)__ for the dropout ratio, __Dense()__ for the number of hidden layer units, and __Activation()__ for the activation function.
・The dropout rate and the number of hidden-layer units are related to the occurrence of overfitting, and without an activation function the data cannot be classified at all, so appropriate values must be set.
・The hyperparameters set in compile are loss for the loss function and optimizer for the optimization function.
・For the loss function, the __"cross-entropy error (categorical_crossentropy)"__ is used in classification models. The optimization function governs how the weights are updated, and __optimizers.SGD(lr=)__ sets the learning rate.
・The learning rate is how much the weights are changed in a single update; if it is not set appropriately, training is wasted or progress slows down.
・The hyperparameters set in fit are batch_size for the batch size and epochs for the number of epochs. The batch size is the number of samples input to the model at one time, and the number of epochs is the number of training passes. Appropriate values vary from model to model.

That's all for this time. Thank you for reading to the end.
