[PYTHON] Code reading: what Theano's LSTM Tutorial is doing

http://deeplearning.net/tutorial/lstm.html

Notes from a quick read through what this sample does internally. To be honest, the implementation part of the LSTM itself is close to unreadable.

I have read it just far enough that I can tweak things and verify the behaviour reasonably freely.

Classification of training data

Taken as a whole, the code takes three kinds of data as input: train, valid, and test.

Code reading

imdb.py

This file prepares the data and so on. When you want to handle your own data, this is the easiest part to extend.

prepare_data()

- Takes multiple training samples and returns an array of transposed matrices, the values passed in as labels, and the masks.
- Samples with more elements than the number given by `maxlen` are excluded (not truncated).
- Even if you bring your own data, you shouldn't need to touch this function.
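Roughly what it does, as a sketch (this is my own reimplementation of the idea, not the tutorial's code):

```python
import numpy

def prepare_data_sketch(seqs, labels, maxlen=None):
    # Samples longer than maxlen are dropped entirely, not truncated.
    if maxlen is not None:
        kept = [(s, l) for s, l in zip(seqs, labels) if len(s) < maxlen]
        seqs = [s for s, _ in kept]
        labels = [l for _, l in kept]
    lengths = [len(s) for s in seqs]
    n_samples = len(seqs)
    # Note the transposed layout: rows are time steps, columns are samples.
    x = numpy.zeros((max(lengths), n_samples), dtype="int64")
    x_mask = numpy.zeros((max(lengths), n_samples), dtype="float32")
    for i, s in enumerate(seqs):
        x[:lengths[i], i] = s
        x_mask[:lengths[i], i] = 1.0  # 1 marks real data, 0 marks padding
    return x, x_mask, labels
```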

load_data()

- Prepares the train, valid, and test data from the raw data.
- Arguments:
  - `path`: seems to act like a cache; the data is downloaded when it isn't there yet.
  - The original data is already split into train and test; the train portion is further split into train and valid at the ratio given by `valid_portion`.
- Other options:
  - `n_words`: upper limit on the vocabulary size.
  - `sort_by_len`: sort the samples by sequence length. Presumably this makes things faster?
  - `maxlen`: same as in prepare_data; samples exceeding it are skipped.
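A minimal usage sketch (the argument names follow the descriptions above; check your copy of imdb.py for the exact defaults):

```python
import imdb

# n_words, valid_portion and maxlen as described above; values are examples
train, valid, test = imdb.load_data(n_words=10000,
                                    valid_portion=0.05,
                                    maxlen=100)
train_x, train_y = train  # each split is a (samples, labels) pair
```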

imdb_preprocess.py Script for data preparation

- It converts words to ids.
- Tokenization is done with a Perl script.
- It also seems to strip some HTML tags.
- Perl seems to be used purely for historical reasons (honestly, this looks like processing that could be done in Python too).
- It can apparently handle multiple languages, though of course not Japanese, which can't be split into words without morphological analysis.
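As a rough Python sketch of the word-to-id conversion (my own illustration of the idea; the real script orders the dictionary by word frequency and keeps the low ids reserved):

```python
from collections import Counter

def build_dict(tokenized_texts):
    counts = Counter(w for text in tokenized_texts for w in text.split())
    # Most frequent word gets the smallest id; ids 0 and 1 stay reserved
    # (padding and unknown-word markers in the tutorial's data format).
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common())}

def words_to_ids(text, dictionary):
    return [dictionary.get(w, 1) for w in text.split()]  # 1 = unknown word

texts = ["this movie is great", "this movie is terrible"]
d = build_dict(texts)
print(words_to_ids("this movie is awful", d))
```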

lstm.py Learning execution script

You can run it with:

$ python lstm.py

Main function

train_lstm() The real entry point. Its arguments are the parameters you can tweak, and there are quite a lot of them. Listed below are the ones most likely to be worth touching.

- Learning related
  - `dim_proj`: number of hidden units. Trial runs take a long time with the default of 128.
  - `validFreq`: controls how often the error rate is checked against the validation set.
  - `patience`: controls the timing of early stopping. Roughly speaking, if the validation result stays the same for `patience` consecutive `validFreq` checks, training stops early.
  - `max_epochs`: maximum number of epochs to run.
  - `use_dropout`: whether to include the dropout layer. Default True.
  - `optimizer`: the optimization function. Choose from SGD, AdaDelta, and RMSProp; the default is AdaDelta. That said, "be careful because SGD is difficult to handle", and going by an article I referred to, the default AdaDelta seems to be sufficiently accurate.
  - `decay_c`: weight decay. I haven't used it properly yet.
- Training data related
  - `n_words`: maximum vocabulary size. Default 10000.
  - `maxlen`: upper limit on the number of elements per training sample. This is passed through to the imdb data loading.
- Other
  - `saveto`: where the final model file is written.
  - `reload_model`: start training with a previously saved model as the initial values. This may be buggy: it loads a hardcoded file called lstm_model.npz.
  - `dispFreq`: log display frequency. The default is 10, but setting it to 1 makes watching the execution speed more fun.
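A sketch of overriding a few of these (the parameter names follow train_lstm's signature as described above; the values are just examples):

```python
from lstm import train_lstm  # assumes lstm.py is on the import path

train_lstm(
    dim_proj=64,       # fewer hidden units, for quicker trial runs
    max_epochs=10,     # cap the number of epochs
    dispFreq=1,        # log every update, to watch the execution speed
    use_dropout=True,  # keep the dropout layer enabled
)
```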

Other functions worth knowing a little about

build_model() The part that builds the LSTM model. It is also used to reconstruct a model previously trained with train_lstm.

init_tparams() Converts the parameters into a form Theano can use. Any parameters handed to build_model must first go through this function.
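Roughly, it wraps each numpy parameter array in a Theano shared variable (a sketch of the idea, not the exact tutorial code):

```python
from collections import OrderedDict
import theano

def init_tparams_sketch(params):
    # Wrap each numpy array in a Theano shared variable, keeping its name,
    # so the computation graph can reference and update it.
    tparams = OrderedDict()
    for name, value in params.items():
        tparams[name] = theano.shared(value, name=name)
    return tparams
```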

init_params() Initialization of the global, non-LSTM parameters.

pred_error(), pred_probs() Functions that run the model. pred_error is used to compute the error rate; pred_probs outputs the prediction results and is not used during training. The two differ in whether they use f_pred or f_pred_prob.

f_pred_prob gives the probability of each class, while f_pred returns the index of the largest element, i.e. which class the sample was assigned to.
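In other words, f_pred is essentially the argmax of f_pred_prob. A small numpy illustration (not the tutorial's Theano graph):

```python
import numpy

probs = numpy.array([[0.3, 0.7],   # hypothetical f_pred_prob output
                     [0.9, 0.1]])
preds = probs.argmax(axis=1)       # what f_pred would give: [1, 0]
print(preds)
```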

sgd(), adadelta(), rmsprop() The optimization functions. These are what the optimizer argument chooses between.

param_init_lstm(), lstm_layer() The implementation part of the LSTM itself. Beyond this point the internals are still a mystery to me, but if you compare what lstm_layer is doing against the formulas, you can see that it does almost the same thing.
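For reference, the formulas in question: as far as I can tell, lstm_layer computes the standard (peephole-free) LSTM update,

```math
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

where σ is the sigmoid and ⊙ is element-wise multiplication.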

dropout_layer() The implementation part of dropout.
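A numpy sketch of its behaviour (the tutorial builds this as a Theano graph with a binomial mask; 0.5 is the probability it uses):

```python
import numpy

def dropout_sketch(x, rng, use_noise):
    if use_noise:                                      # training time
        return x * rng.binomial(1, 0.5, size=x.shape)  # randomly zero units
    return x * 0.5                                     # test time: scale instead

x = numpy.ones((2, 4))
print(dropout_sketch(x, numpy.random.default_rng(0), use_noise=True))
```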

What's inside the trained model

import numpy

model = numpy.load("lstm_model.npz")

The model is saved in numpy's .npz format, so you can load it with numpy.load.

- The last error rates obtained are stored in train_err, valid_err, and test_err respectively.
- history_errs seems relatively convenient: the error rate recorded at each validFreq check is stored as a [valid_err, test_err] pair.
- The rest are the LSTM parameters resulting from training.
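For example, to poke around in the file (key names as listed above; a sketch, assuming the run saved normally):

```python
import numpy

model = numpy.load("lstm_model.npz")
print(model.files)                      # everything stored in the file
print(model["train_err"], model["valid_err"], model["test_err"])
print(model["history_errs"][-5:])       # last few [valid_err, test_err] rows
```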
