[Python] I made an RNN learn a sine wave and predict it: hyperparameter tuning

0. Roughly speaking

1. Configuration

It uses essentially the same network configuration as the previous article, and the hyperparameters are based on the values shown in its section "3.3. Hyperparameters".

| Variable name | Meaning | Value |
| --- | --- | --- |
| num_of_input_nodes | Number of nodes in the input layer | 1 node |
| num_of_hidden_nodes | Number of nodes in the hidden layer | 2 nodes |
| num_of_output_nodes | Number of nodes in the output layer | 1 node |
| length_of_sequences | RNN sequence length | 50 steps |
| num_of_training_epochs | Number of training iterations | 2,000 |
| num_of_prediction_epochs | Number of prediction iterations | 100 |
| size_of_mini_batch | Number of samples per mini-batch | 100 |
| learning_rate | Learning rate | 0.1 |
| forget_bias | (I'm not sure) | 1.0 (default) |
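The training data itself is generated in a notebook in the repository; as a minimal sketch of the idea (the names `steps_per_cycle` and `num_cycles` are illustrative assumptions, not the repository's exact code), a sine wave with 50 steps per cycle is cut into 50-step input windows, each paired with the value that immediately follows it:

```python
import numpy as np

# Sine wave with 50 steps per cycle (assumed setup, not the exact notebook).
steps_per_cycle = 50
length_of_sequences = 50
num_cycles = 100

t = np.arange(steps_per_cycle * num_cycles)
wave = np.sin(2.0 * np.pi * t / steps_per_cycle)

# Each sample: a window of 50 consecutive values as input, and the value
# immediately after the window as the prediction target.
inputs = np.array([wave[i:i + length_of_sequences]
                   for i in range(len(wave) - length_of_sequences)])
targets = wave[length_of_sequences:]

print(inputs.shape, targets.shape)  # (4950, 50) (4950,)
```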

2. Source code, notebook

The source code used for training and prediction, the notebook that generated the training data, and the notebook used to chart the results are available on GitHub. See there for the exact code and values.

https://github.com/nayutaya/tensorflow-rnn-sin/tree/20160517/ex2

3. Hyperparameter adjustment

3.1. num_of_hidden_nodes: Number of hidden layer nodes

The charts below show the loss function and prediction results when the number of hidden-layer nodes is varied from 1 to 4. With a single hidden node, the network clearly cannot predict at all. On the other hand, a larger number of hidden nodes does not always give better results. Looking at the loss chart, though, the more hidden nodes there are, the lower the final loss.

| No | Number of hidden nodes | Training / prediction time |
| --- | --- | --- |
| 1 | 1 | 3m53.845s |
| 2 | 2 | 3m30.844s |
| 3 | 3 | 4m36.324s |
| 4 | 4 | 5m30.537s |

[Figure: output_hidden.png (prediction results)]

[Figure: loss_hidden.png (loss)]

3.2. length_of_sequences: RNN sequence length

The charts below show the prediction results and loss function when the RNN sequence length is changed to 30, 40, 50, 60, and 70. The training data is a sine wave with 50 steps per cycle, yet even sequences shorter than one full cycle turn out to be enough for good predictions.
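One way to see why a window shorter than one cycle can suffice (a side note of mine, for the noise-free case): a pure sinusoid satisfies a two-term linear recurrence, so even two consecutive samples determine the next one exactly:

```python
import numpy as np

steps_per_cycle = 50
omega = 2.0 * np.pi / steps_per_cycle
t = np.arange(200)
wave = np.sin(omega * t)

# sin(w*(n+1)) + sin(w*(n-1)) = 2*cos(w)*sin(w*n), hence:
# x[n+1] = 2*cos(omega)*x[n] - x[n-1]
pred = 2.0 * np.cos(omega) * wave[1:-1] - wave[:-2]
print(np.max(np.abs(pred - wave[2:])))  # ~0, up to float rounding
```

With noise added, longer windows help the network average the noise out, so the useful sequence length in practice sits between this theoretical minimum and a full cycle or more.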

| No | RNN sequence length | Training / prediction time |
| --- | --- | --- |
| 1 | 30 | 2m29.589s |
| 2 | 40 | 2m58.636s |
| 3 | 50 | 3m30.844s |
| 4 | 60 | 4m25.459s |
| 5 | 70 | 5m38.550s |

[Figure: output_seq.png (prediction results)]

[Figure: loss_seq.png (loss)]

3.3. num_of_training_epochs: Number of training iterations

The charts below show the prediction results and loss function for 1,000, 2,000, and 3,000 training epochs. With 3,000 epochs, the loss starts to oscillate at around epoch 1,600, and the prediction results are also poor.
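One way to avoid wasting the epochs after the loss plateaus would be early stopping. This is my addition, not something in the repository's code; a minimal sketch:

```python
import math

def early_stop_epoch(losses, patience=100):
    """Return the epoch at which training would stop: either the last
    epoch, or the first epoch with no improvement for `patience` epochs."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # plateaued or oscillating: stop here
    return len(losses) - 1

# Synthetic loss curve mimicking the 3,000-epoch run: decreasing until
# around epoch 1,600, then oscillating around a floor.
decreasing = [1.0 / (e + 1) for e in range(1600)]
floor = decreasing[-1]
oscillating = [floor + 0.05 * abs(math.sin(e)) for e in range(1, 1401)]
losses = decreasing + oscillating
print(early_stop_epoch(losses))  # 1699: stops ~1,300 epochs early
```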

| No | Number of training epochs | Training / prediction time |
| --- | --- | --- |
| 1 | 1,000 | 2m10.783s |
| 2 | 2,000 | 3m30.844s |
| 3 | 3,000 | 6m17.675s |

[Figure: output_epoch.png (prediction results)]

[Figure: loss_epoch.png (loss)]

3.4. size_of_mini_batch: Number of samples per mini-batch

The charts below show the prediction results and loss function when the number of samples per mini-batch is changed to 50, 100, and 200. There is no striking difference, but on the whole, larger mini-batches seem to give slightly better results.
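For reference, the mini-batch mechanism can be sketched as follows (dummy data stands in for the generated sine-wave samples; this is an illustration, not the repository's sampler):

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, length_of_sequences = 4950, 50
X = rng.normal(size=(num_samples, length_of_sequences, 1))  # dummy inputs
y = rng.normal(size=(num_samples, 1))                       # dummy targets

def next_batch(size_of_mini_batch=100):
    # Each training step draws size_of_mini_batch windows at random.
    idx = rng.integers(0, num_samples, size=size_of_mini_batch)
    return X[idx], y[idx]

batch_x, batch_y = next_batch()
print(batch_x.shape, batch_y.shape)  # (100, 50, 1) (100, 1)
```

A larger batch gives a less noisy gradient estimate per step, which is consistent with the slightly smoother results seen here.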

| No | Samples per mini-batch | Training / prediction time |
| --- | --- | --- |
| 1 | 50 | 4m25.032s |
| 2 | 100 | 3m30.844s |
| 3 | 200 | 4m59.550s |

[Figure: output_batch.png (prediction results)]

[Figure: loss_batch.png (loss)]

3.5. learning_rate: Learning rate

The charts below show the prediction results and loss function when the learning rate passed to the optimizer is set to 0.02, 0.1, and 0.5. With learning rates of 0.02 and 0.5 the network fails to predict properly, and with 0.5 the loss starts oscillating almost immediately after training begins.
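The three regimes have a classic explanation, which a toy example makes concrete (plain gradient descent on f(w) = 2·w², not the article's network): each step multiplies w by (1 − 4·lr), so 0.02 shrinks too slowly, 0.1 converges quickly, and 0.5 flips the sign every step and makes no progress at all:

```python
def final_w(lr, steps=20, w=1.0):
    # Gradient descent on f(w) = 2*w**2; gradient is 4*w, so each step
    # computes w <- w * (1 - 4*lr).
    for _ in range(steps):
        w -= lr * 4.0 * w
    return w

print(abs(final_w(0.02)))  # ~0.19: still far from the minimum
print(abs(final_w(0.1)))   # ~4e-5: essentially converged
print(abs(final_w(0.5)))   # 1.0: pure oscillation, no progress
```

The actual network's loss surface is more complicated, but the qualitative picture (too small = slow, too large = oscillation/divergence) matches the charts.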

| No | Learning rate | Training / prediction time |
| --- | --- | --- |
| 1 | 0.02 | 3m46.852s |
| 2 | 0.1 | 3m30.844s |
| 3 | 0.5 | 4m39.136s |

[Figure: output_learning.png (prediction results)]

[Figure: loss_learning.png (loss)]

3.6. forget_bias

The charts below show the prediction results and loss function when BasicLSTMCell's forget_bias parameter, whose exact role I don't fully understand, is set to 0.25, 0.5, and 1.0 (the default). With 0.25, the network cannot predict properly.
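For what it's worth: in TensorFlow's BasicLSTMCell, forget_bias is a constant added to the forget gate's pre-activation. Early in training, when the pre-activation is near zero, the gate value is roughly sigmoid(forget_bias), so the default 1.0 makes the cell start out keeping about 73% of its state instead of half:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Initial forget-gate value for each tested forget_bias, assuming the
# learned pre-activation starts near 0.
for forget_bias in (0.25, 0.5, 1.0):
    gate = sigmoid(0.0 + forget_bias)
    print(forget_bias, round(gate, 3))  # 0.25 0.562 / 0.5 0.622 / 1.0 0.731
```

A smaller forget_bias makes the untrained cell discard more of its state each step, which is consistent with 0.25 failing to learn the 50-step periodicity here.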

| No | forget_bias | Training / prediction time |
| --- | --- | --- |
| 1 | 0.25 | 4m27.725s |
| 2 | 0.5 | 4m27.089s |
| 3 | 1.0 | 3m30.844s |

[Figure: output_forget.png (prediction results)]

[Figure: loss_forget.png (loss)]

3.7. Optimizer

The charts below show the prediction results and loss function when the optimizer is switched from GradientDescentOptimizer to AdamOptimizer. With AdamOptimizer the loss drops faster and reaches a lower final value, but it oscillates sharply, which makes it hard to decide when to stop training.
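The difference comes down to the update rules, sketched here side by side on a toy problem (f(w) = w², gradient 2·w; an illustration of the optimizers themselves, not the article's network; the b1/b2/eps defaults match tf.train.AdamOptimizer):

```python
import numpy as np

def sgd(w, lr=0.1, steps=200):
    # Plain gradient descent: step is proportional to the gradient.
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

def adam(w, lr=0.1, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2.0 * w
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat = m / (1 - b1 ** t)        # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

print(abs(sgd(5.0)), abs(adam(5.0)))
```

Because Adam divides by the gradient's running magnitude, its step size stays roughly lr regardless of how small the gradient gets, which explains both the fast early progress and the persistent jitter near the minimum seen in the loss chart.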

| No | Optimizer | Training / prediction time |
| --- | --- | --- |
| 1 | GradientDescentOptimizer | 3m30.844s |
| 2 | AdamOptimizer | 4m46.116s |

[Figure: output_optimizer.png (prediction results)]

[Figure: loss_optimizer.png (loss)]

3.8. RNN cell

The charts below show the prediction results and loss function when the RNN cell is switched from BasicLSTMCell to GRUCell. There was not much difference between the two.
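As a reference for what changes under the hood, here is a single GRU step in NumPy (biases omitted for brevity; one common gate convention — TensorFlow's GRUCell differs in detail). The GRU has three weight blocks versus the LSTM's four and no separate cell state, so it is a slightly cheaper drop-in:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W_z, W_r, W_h):
    xh = np.concatenate([x, h])
    z = sigmoid(W_z @ xh)                                # update gate
    r = sigmoid(W_r @ xh)                                # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h]))  # candidate state
    return (1.0 - z) * h + z * h_tilde                   # blend old and new

rng = np.random.default_rng(0)
num_of_hidden_nodes, num_of_input_nodes = 2, 1
W = [0.1 * rng.normal(size=(num_of_hidden_nodes,
                            num_of_hidden_nodes + num_of_input_nodes))
     for _ in range(3)]
h = np.zeros(num_of_hidden_nodes)
for x in np.sin(2.0 * np.pi * np.arange(5) / 50.0):  # a few sine-wave steps
    h = gru_step(np.array([x]), h, *W)
print(h.shape)  # (2,)
```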

| No | RNN cell | Training / prediction time |
| --- | --- | --- |
| 1 | BasicLSTMCell | 3m30.844s |
| 2 | GRUCell | 4m53.831s |

[Figure: output_cell.png (prediction results)]

[Figure: loss_cell.png (loss)]

4. Future plans

Next, I would like to see what happens when training on and predicting more realistic data (stock prices, exchange rates, etc.).
