[Python] Deep Learning from Scratch ❷: Notes on what an amateur stumbled over, Chapter 5

Introduction

These are notes on what I stumbled over in Chapter 5 of "Deep Learning from Scratch ❷ --- Natural Language Processing", which I suddenly started studying.

The execution environment is macOS Catalina + Anaconda 2019.10, and the Python version is 3.7.4. For details, refer to Chapter 1 of this memo.

Chapter 5 Recurrent Neural Networks (RNN)

This chapter describes recurrent neural networks.

5.1 Probability and language model

This section explains language models and the problems that arise when trying to use CBOW as a language model. I think the reason Equation 5.8 is only an approximation is that CBOW ignores the order of words.
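For reference, my understanding of why it is only an approximation (this is my own reconstruction, so the exact window size may differ from the book's Equation 5.8) is that the full left context is replaced by a fixed window of the two preceding words:

$$ P(w_t \mid w_1, w_2, \dots, w_{t-1}) \approx P(w_t \mid w_{t-2}, w_{t-1}) $$

and, on top of that, CBOW treats the window as a bag of words, so even the order of $w_{t-2}$ and $w_{t-1}$ is not distinguished.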

Since word2vec ignores word order, the RNN studied in this chapter would seem better suited for producing distributed representations. In fact, though, the flow was the other way around, which I find interesting: RNN-based models came first, and word2vec was proposed later in order to handle larger vocabularies and improve quality.

5.2 What is RNN?

This section explains RNNs. The tanh function (hyperbolic tangent) appears as the activation function, but for some reason this book does not explain it, so google "tanh" if you want the details.

What bothered me a bit more is the handling in mini-batch learning that wraps around to the beginning once the data has been used up to the end. In that case the end of the corpus gets connected to its beginning. However, this book treats the PTB corpus as "one big piece of time-series data" in the first place and does not even consider sentence boundaries (see the scorpion-mark note in the middle of p. 87), so worrying about the end and the beginning being joined may be pointless.
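A minimal sketch of the wrap-around I mean, modeled on my understanding of the mini-batch generation in the book's training loop (the function and variable names are my own, and details such as how `jump` is computed may differ from the book):

```python
import numpy as np

def get_batch(xs, ts, batch_size, time_size, time_idx):
    """Cut out one mini-batch, wrapping around to the start of the corpus."""
    data_size = len(xs)
    jump = data_size // batch_size          # spacing between the start positions of the batch rows
    offsets = [i * jump for i in range(batch_size)]

    batch_x = np.empty((batch_size, time_size), dtype='i')
    batch_t = np.empty((batch_size, time_size), dtype='i')
    for t in range(time_size):
        for i, offset in enumerate(offsets):
            # the modulo is what joins the end of the corpus back to its beginning
            batch_x[i, t] = xs[(offset + time_idx + t) % data_size]
            batch_t[i, t] = ts[(offset + time_idx + t) % data_size]
    return batch_x, batch_t
```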

5.3 Implementation of RNN

One slightly tricky point in the implementation is that Figure 5-19 and Figure 5-20 omit the Repeat node after the bias $b$. Forward propagation can be written exactly as in the figures thanks to broadcasting, but in back propagation you have to consciously add the corresponding sum when computing $db$ (see the sketch after the next paragraph). There was a Q&A on exactly this point on teratail (teratail: Why is db summed with axis=0 in the backpropagation of an RNN?).

Also, the tanh function that appears here is implemented without explanation, but forward propagation can simply be computed with `numpy.tanh()`, as in the book's code. In back propagation, the line `dt = dh_next * (1 - h_next ** 2)` is the derivative of tanh, which is explained in detail in "Appendix A: Differentiation of the sigmoid and tanh functions" at the end of the book.
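Putting the two points above together, here is a minimal sketch of a single RNN step, along the lines of the book's RNN class in common/time_layers.py but written from memory, so details may differ. Note the broadcast of $b$ in forward and the corresponding sum over the batch axis in backward:

```python
import numpy as np

class RNN:
    def __init__(self, Wx, Wh, b):
        self.params = [Wx, Wh, b]
        self.grads = [np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(b)]
        self.cache = None

    def forward(self, x, h_prev):
        Wx, Wh, b = self.params
        t = np.dot(h_prev, Wh) + np.dot(x, Wx) + b   # b (H,) is broadcast over the batch (N, H)
        h_next = np.tanh(t)
        self.cache = (x, h_prev, h_next)
        return h_next

    def backward(self, dh_next):
        Wx, Wh, b = self.params
        x, h_prev, h_next = self.cache
        dt = dh_next * (1 - h_next ** 2)   # derivative of tanh: 1 - tanh(t)^2
        db = np.sum(dt, axis=0)            # the omitted Repeat node: sum over the batch axis
        dWh = np.dot(h_prev.T, dt)
        dh_prev = np.dot(dt, Wh.T)
        dWx = np.dot(x.T, dt)
        dx = np.dot(dt, Wx.T)
        self.grads[0][...] = dWx
        self.grads[1][...] = dWh
        self.grads[2][...] = db
        return dx, dh_prev
```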

Also, the "..." (three-dot Ellipsis) comes up again on page 205, just like the "three-dot leader" that appeared on page 34. As I wrote in [Chapter 1 of this memo](https://qiita.com/segavvy/items/91be1d4fc66f7e322f25#13-%E3%83%8B%E3%83%A5%E3%83%BC%E3%83%A9%E3%83%AB%E3%83%8D%E3%83%83%E3%83%88%E3%83%AF%E3%83%BC%E3%82%AF%E3%81%AE%E5%AD%A6%E7%BF%92), rather than just memorizing that "assigning with the three dots overwrites the contents", it is better to understand the relationship between ndarray slices and views.
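A minimal sketch of the difference (my own example, not from the book): assigning through `[...]` writes into the existing array in place, while a plain assignment only rebinds the name, so anyone else holding a reference to the old array never sees the update:

```python
import numpy as np

grads = [np.zeros(3)]
g = grads[0]                       # g refers to the same ndarray as grads[0]

g[...] = np.array([1., 2., 3.])    # in-place: writes through the view into the original memory
print(grads[0])                    # [1. 2. 3.] -- the list sees the change

g = np.array([4., 5., 6.])         # rebinding: g now points to a brand-new array
print(grads[0])                    # still [1. 2. 3.] -- the list is unaffected
```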

5.4 Implementation of layers that handle time series data

The explanation of the code is omitted, but it is simple and easy to follow.

The Time Embedding layer (the TimeEmbedding class in common/time_layers.py) simply loops over $T$ Embedding layers.
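A rough sketch of that loop, condensed from my understanding of the class (the real one also keeps the per-step layers around for backward):

```python
import numpy as np

def time_embedding_forward(W, xs):
    """xs: (N, T) word ids, W: (V, D) embedding matrix -> output of shape (N, T, D)."""
    N, T = xs.shape
    V, D = W.shape
    out = np.empty((N, T, D), dtype=W.dtype)
    for t in range(T):
        out[:, t, :] = W[xs[:, t]]   # one Embedding lookup per time step
    return out
```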

In the Time Affine layer (the TimeAffine class in common/time_layers.py), instead of looping $T$ times, the input with batch size $N$ and $T$ time steps is reshaped so that everything can be computed in a single matrix product, and the result is then reshaped back into its original form. This makes it more efficient.
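A sketch of that reshape trick, in my own condensed form and assuming an input of shape $(N, T, D)$, weights $(D, V)$ and bias $(V,)$:

```python
import numpy as np

def time_affine_forward(x, W, b):
    """x: (N, T, D) -> scores of shape (N, T, V), computed with one big matrix product."""
    N, T, D = x.shape
    rx = x.reshape(N * T, D)        # fold the time axis into the batch axis
    out = np.dot(rx, W) + b         # a single (N*T, D) x (D, V) product
    return out.reshape(N, T, -1)    # unfold back to (N, T, V)
```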

The Time Softmax With Loss layer (the TimeSoftmaxWithLoss class in common/time_layers.py) is as explained in the book, but the mask using `ignore_label` caught my attention. When the correct label is -1, both the loss and the gradient are set to 0, and that position is excluded from the denominator $T$ when computing $L$; however, I don't think any processing that sets correct labels to -1 has appeared so far. It may be used in a later chapter, so I will leave it alone for now.
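A minimal sketch of how I understand the masking (my own simplified version; the actual class also handles the gradient and the reshaping back):

```python
import numpy as np

def masked_time_loss(ys, ts, ignore_label=-1):
    """ys: (N, T, V) softmax probabilities, ts: (N, T) correct word ids."""
    N, T, V = ys.shape
    ys = ys.reshape(N * T, V)
    ts = ts.reshape(N * T)

    mask = (ts != ignore_label)                       # positions whose label is -1 are ignored
    safe_ts = np.where(mask, ts, 0)                   # dummy index so fancy indexing stays valid
    log_p = np.log(ys[np.arange(N * T), safe_ts] + 1e-7)
    log_p *= mask                                     # zero out the loss at ignored positions
    return -np.sum(log_p) / mask.sum()                # average only over the valid positions
```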

5.5 RNNLM learning and evaluation

Unfortunately, this implementation does not give good results when the whole PTB dataset is used, so this time I also skipped the experiments with the word-segmented Aozora Bunko text that I had been playing with in the previous chapters. The next chapter is supposed to improve on this, so I will try it there.

As an aside, when I saw the code `rn = np.random.randn` in `SimpleRnnlm.__init__()`, I was reminded how convenient it is that Python can simply put a function into a variable and use it. In C, putting a function into a variable (storing a function's entry point in a pointer) involves a tangle of `*` and `()`, and using it is just as awkward; I was never really good at it in my working days. :sweat:
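For example, something like the following (my own toy example in the spirit of that initializer, not the book's actual code):

```python
import numpy as np

rn = np.random.randn                              # the function itself is just an object bound to a name

V, D, H = 10000, 100, 100
embed_W = (rn(V, D) / 100).astype('f')            # call it through the new, shorter name
rnn_Wx = (rn(D, H) / np.sqrt(D)).astype('f')      # keeps the weight-initialization lines compact
```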

5.6 Summary

I have somehow managed to get to the point of handling time-series data.

That's all for this chapter. If you find any mistakes, I would be grateful if you could point them out.
