This is a personal memo written while reading Introduction to Natural Language Processing Applications in 15 Steps. This entry covers Chapter 3, Step 13, and records the points I found important.
- Personal Mac: macOS Mojave version 10.14.6
- Docker: version 19.03.2 for both Client and Server
In the previous chapter, we built a convolutional neural network (CNN) whose input is a sequence of distributed word representations (word embeddings) arranged in the order of the sentence. In this chapter, we build a recurrent neural network (RNN) that takes the same kind of input. A detailed explanation of the mechanism is omitted.
13.1 Recurrent layer
Feed the leftmost column of the feature vector sequence (the first word vector) into one layer (a fully connected layer) of a multi-layer perceptron. Next, shift one column to the right and feed that column into a fully connected layer in the same way, but **the weights of the fully connected layer used here are the same as those used in the previous step**. At the same time, **the output neurons of the previous step are also connected** through another fully connected layer.
- CNN: the series of outputs is fed into a max pooling layer to obtain one vector, so that vector includes information from every column of the feature vectors.
- RNN: because each output is connected to the next one, the single vector obtained at the end contains information from the entire sequence of feature vectors (note, however, that the influence of features near the beginning ends up smaller).
The same layer can also be described as a fully connected layer with a connection that feeds its output back into itself, to which the vectors are input in order. This loop was my original mental image; unrolling the loop gives the picture described earlier.
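The recurrent layer described in this section can be sketched in plain Python. This is a minimal illustration, not the book's implementation: the `tanh` activation, the helper names `matvec` and `rnn_forward`, and the toy weights are all assumptions made for the example.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(columns, w_in, w_rec, bias):
    """Run a simple recurrent layer over a sequence of word vectors.

    columns: input vectors, one per word, in left-to-right order
    w_in:    weights applied to the current input (shared by every step)
    w_rec:   weights applied to the previous output (shared by every step)
    """
    hidden = [0.0] * len(bias)  # there is no previous output before the first word
    for x in columns:
        from_input = matvec(w_in, x)
        from_prev = matvec(w_rec, hidden)
        # tanh is an assumed activation; the source does not name one
        hidden = [math.tanh(a + b + c)
                  for a, b, c in zip(from_input, from_prev, bias)]
    return hidden  # the last output summarizes the whole sequence


# Hypothetical toy values: 2-dimensional word vectors and 2 hidden units.
W_IN = [[0.5, -0.3], [0.2, 0.8]]
W_REC = [[0.1, 0.4], [-0.2, 0.3]]
BIAS = [0.0, 0.0]

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
final_state = rnn_forward(sentence, W_IN, W_REC, BIAS)
print(final_state)
```

The `for` loop is the "connection that returns its output to itself"; writing out each iteration separately gives the unrolled view, with `W_IN` and `W_REC` reused at every step.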
13.2 LSTM
LSTM stands for long short-term memory. A simple RNN has the problem that the influence of features near the beginning fades, and LSTM is an improvement that can retain old information. (I would like to summarize LSTM in more detail in the future.)
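As a rough illustration of how an LSTM can keep old information, here is one step of a single-unit LSTM in the commonly used formulation (forget, input, and output gates plus a candidate value). The book omits these details, so the function name `lstm_step`, the scalar simplification, and the parameter values in `PARAMS` are assumptions chosen only for demonstration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM with scalar input and state.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre("forget"))   # how much of the old cell state to keep
    write = sigmoid(pre("input"))     # how much new information to write
    expose = sigmoid(pre("output"))   # how much of the cell state to output
    candidate = math.tanh(pre("candidate"))

    c_next = forget * c_prev + write * candidate
    h_next = expose * math.tanh(c_next)
    return h_next, c_next


# Hypothetical parameters: forget gate close to 1, input gate close to 0.
PARAMS = {
    "forget": (0.0, 0.0, 5.0),
    "input": (0.0, 0.0, -5.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}

h, c = 0.0, 1.0  # pretend an early word stored 1.0 in the cell state
for _ in range(20):
    h, c = lstm_step(0.5, h, c, PARAMS)
print(round(c, 3))
```

With the forget gate near 1 and the input gate near 0, the value stored in the cell state survives 20 further steps largely intact, which is the sense in which LSTM retains old information.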
Additions / changes from the previous chapter (Step 12)
- Neural network structure: CNN -> RNN
- Handling of 0 in the sequences: no special treatment -> treated specially as the zero-padding value (`mask_zero=True`)
- The layers that follow the embedding layer must also support this masking (LSTM supports it, the CNN layers do not)
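The effect of treating 0 as padding can be sketched without Keras. The toy code below is only an illustration: `pad_with_zeros` and `masked_recurrence` are hypothetical helpers, right-padding is an assumed convention, and the update rule is a stand-in for a real LSTM step. It relies on the assumption that real words use indices starting at 1, so that 0 is free to mean "padding".

```python
PAD_ID = 0  # index reserved for padding; real words are assumed to start at 1


def pad_with_zeros(word_ids, max_len):
    """Right-pad (or truncate) a list of word indices to max_len."""
    clipped = word_ids[:max_len]
    return clipped + [PAD_ID] * (max_len - len(clipped))


def masked_recurrence(padded_ids, decay=0.5):
    """Toy recurrent update that skips padded steps, as a mask would.

    Whenever the input is PAD_ID, the previous state is carried over unchanged.
    """
    state = 0.0
    for word_id in padded_ids:
        if word_id == PAD_ID:
            continue  # masked step: keep the previous state
        state = decay * state + word_id
    return state


sentence = [3, 7, 2]
padded = pad_with_zeros(sentence, max_len=6)
print(padded)
print(masked_recurrence(padded) == masked_recurrence(sentence))
```

Because padded steps are skipped, the final state is the same whether or not the sentence was padded; without the mask, the trailing zeros would keep changing the state after the last real word.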
rnn_sample.py
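    model = Sequential()
    # Embedding layer built from the word vectors in we_model.wv.
    # mask_zero=True treats index 0 as padding; trainable=False freezes the weights.
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  mask_zero=True,
                                  trainable=False))
    # Recurrent layer: by default only the output of the last time step is returned.
    model.add(LSTM(units=256))
    model.add(Dense(units=128, activation='relu'))
    # One softmax output per class.
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])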
Execution result
# CNN
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python cnn_sample.py
Epoch 50/50
917/917 [==============================] - 0s 303us/step - loss: 0.0357 - acc: 0.9924
0.6808510638297872
Epoch 100/100
917/917 [==============================] - 0s 360us/step - loss: 0.0220 - acc: 0.9902
0.6808510638297872
# LSTM
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python rnn_sample.py
Epoch 50/50
917/917 [==============================] - 4s 4ms/step - loss: 0.2530 - acc: 0.9378
0.6063829787234043
Epoch 100/100
917/917 [==============================] - 4s 4ms/step - loss: 0.0815 - acc: 0.9793
0.5851063829787234
# Bi-directional RNN
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python bid_rnn_sample.py
Epoch 50/50
917/917 [==============================] - 2s 2ms/step - loss: 0.2107 - acc: 0.9487
0.5851063829787234
Epoch 100/100
917/917 [==============================] - 2s 2ms/step - loss: 0.0394 - acc: 0.9858
0.5851063829787234
# GRU
Epoch 50/50
917/917 [==============================] - 1s 1ms/step - loss: 0.2947 - acc: 0.9368
0.4787234042553192
Epoch 100/100
917/917 [==============================] - 1s 1ms/step - loss: 0.0323 - acc: 0.9869
0.5531914893617021
I first compared the models at 50 epochs. For every model other than the CNN, the loss had not finished decreasing at 50 epochs, so I also ran each model for 100 epochs.
| Type of NN | Accuracy (execution result) | Execution speed |
|---|---|---|
| CNN | Epoch 50: 68.1%, Epoch 100: 68.1% | Average 300 us/step -> about 0.27 s/epoch |
| LSTM | Epoch 50: 60.6%, Epoch 100: 58.5% | Average 4 ms/step -> about 3.6 s/epoch |
| Bi-directional RNN | Epoch 50: 58.5%, Epoch 100: 58.5% | Average 2 ms/step -> about 1.8 s/epoch |
| GRU | Epoch 50: 47.9%, Epoch 100: 55.3% | Average 1 ms/step -> about 0.9 s/epoch |
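The per-epoch times in the table follow from multiplying the per-step time in the log by the number of training samples. The short sketch below redoes that arithmetic with the 917 samples shown in the progress bar; the table's figures look as if the sample count was rounded to roughly 900, which is my reading rather than something stated in the source. The variable names are my own.

```python
N_TRAIN = 917  # training samples per epoch, taken from the Keras progress bar

# Approximate per-step times read from the logs, in seconds.
per_step = {
    "CNN": 300e-6,
    "LSTM": 4e-3,
    "Bi-directional RNN": 2e-3,
    "GRU": 1e-3,
}

# Time for one epoch = time per step x number of training samples.
per_epoch = {name: t * N_TRAIN for name, t in per_step.items()}

for name, seconds in per_epoch.items():
    ratio = seconds / per_epoch["CNN"]
    print(f"{name}: {seconds:.2f} s/epoch, {ratio:.1f}x the CNN")
```

On these numbers the LSTM takes roughly 13 times as long per epoch as the CNN, which matches the impression that the CNN is the fastest of the four.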
Neural network tuning, such as the hyperparameter search covered in the following chapters, is still needed, but the CNN is fast and its classification accuracy is quite good.
- Plain implementation (Step 01): 37.2%
- Preprocessing added (Step 02): 43.6%
- Preprocessing + feature extraction change (Step 04): 58.5%
- Preprocessing + feature extraction change + classifier change to RandomForest (Step 06): 61.7%
- Preprocessing + feature extraction change + classifier change to NN (Step 09): 66.0%
- Preprocessing + feature extraction change (Step 11): 40.4%
- Preprocessing + feature extraction change + classifier change to CNN (Step 12): 68.1%
- Preprocessing + feature extraction change + classifier change to RNN (Step 13): 60.6%
A simple RNN that just reuses a fully connected layer like the one in a multi-layer perceptron does not work well, which is why LSTM was introduced.
The content of Chapter 3 of this book is introductory and focuses on how to use neural networks in practical applications. To understand them more deeply, it would be better to build a solid grounding in neural network theory first. Trying a Kaggle competition might also be a good idea.