- For those who want to try out various ideas for deep learning models but don't know how to implement them
- Using Keras's functional API as a framework that is relatively flexible yet reasonably abstracted
- We implement seq2seq, which is hard to do with the Sequential API, as simply as possible
You may know that deep learning can be implemented with Keras, but what code should you actually write? This article answers that question.
When using Keras, it is convenient to work with the `Model` class: https://keras.io/ja/models/model/
The `Model` class is responsible for defining how a model is trained, executing the training, and running inference with the trained model. To create a `Model` instance, you first need to build the computational graph of the machine learning model.
There are two options for this: the Sequential API and the functional API.
The Sequential API is very simple and works well when the output of each layer feeds directly into the next layer. In exchange, it sacrifices flexibility and cannot express more complex architectures such as multi-input or multi-output models. The functional API requires you to define the connections between layers yourself, but in return you can write models much more flexibly. This time, we will build the model with the functional API.
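As a quick illustration of the difference, here is a minimal sketch (a toy classifier with made-up sizes, not part of the seq2seq model built below) of the same network written with both APIs:

from keras.models import Sequential, Model
from keras.layers import Dense, Input

# Sequential API: each layer feeds straight into the next one.
seq_model = Sequential()
seq_model.add(Dense(64, activation='relu', input_shape=(100,)))
seq_model.add(Dense(10, activation='softmax'))

# Functional API: you wire the layers together yourself,
# which also allows branching, multiple inputs and multiple outputs.
inputs = Input(shape=(100,))
hidden = Dense(64, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(hidden)
func_model = Model(inputs, outputs)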
Once you have built the computational graph and created a `Model` instance, the rest is easy: define the training setup with the `compile` method of the `Model` instance (optimizer, loss function, and so on), and then execute training with the `fit` method.
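For example, with the toy `func_model` sketched above (dummy data only, just to show the calling pattern; the real calls for the seq2seq model come later):

import numpy as np
from keras.utils import np_utils

# Random inputs and one-hot class labels, purely to illustrate compile / fit.
x = np.random.random((32, 100))
y = np_utils.to_categorical(np.random.randint(0, 10, size=32), num_classes=10)

func_model.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
func_model.fit(x, y, batch_size=8, epochs=1)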
We will build the model shown in the following figure.
The encoder has two jobs: embed the input and feed it to an LSTM. An implementation example is as follows.
from keras.layers import Input, LSTM, Dense, Embedding
# Define an input sequence and process it.
encoder_inputs = Input(shape=(max_length_inp,),name='encoder_input')
encoder_inputs_embedding = Embedding(input_dim=vocab_inp_size, output_dim=embedding_dim)(encoder_inputs)
encoder = LSTM(units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
Input to the model always goes through an `Input` layer. Here, the `Input` layer receives the whole input sequence at once as a vector whose length is the maximum input length `max_length_inp`. RNN-based algorithms process the input sequence one step at a time and pass their state on to the next step, but as shown here the input can still be written in this compact, all-at-once form; the per-step processing happens inside the LSTM layer.
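Concretely, each input sentence becomes one row of word IDs padded to length `max_length_inp`. A minimal sketch of how such a tensor could be built (the actual `encoder_input_tensor` used later comes from the preprocessing in the tutorial linked at the end; the IDs here are made up):

from keras.preprocessing.sequence import pad_sequences

# Word-ID sequences of different lengths...
sequences = [[12, 5, 48], [7, 3, 91, 2]]
# ...become a single (num_samples, max_length_inp) integer matrix, padded with 0.
padded = pad_sequences(sequences, maxlen=max_length_inp, padding='post')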
The line

encoder_inputs_embedding = Embedding(input_dim=vocab_inp_size, output_dim=embedding_dim)(encoder_inputs)

means two things:

- define an `Embedding` layer with `input_dim=vocab_inp_size, output_dim=embedding_dim`
- add a node to the computational graph so that the result of applying this `Embedding` layer to `encoder_inputs` is `encoder_inputs_embedding`
encoder = LSTM(units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs_embedding)
As in these two lines, you can also define the layer first and then add it to the computational graph on a separate line.
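Keeping the layer object in a variable of its own is also what lets you apply the same layer (and therefore the same weights) to more than one input, something the functional API makes easy. A small sketch, unrelated to the seq2seq model itself:

from keras.layers import Input, Dense

shared_dense = Dense(32, activation='relu')

input_a = Input(shape=(16,))
input_b = Input(shape=(16,))

# Both branches go through the *same* layer instance, so they share weights.
processed_a = shared_dense(input_a)
processed_b = shared_dense(input_b)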
The decoder has three jobs: an embedding of the decoder input (used for teacher forcing), an LSTM, and a Dense output layer. An implementation example is as follows.
from keras.layers import Input, LSTM, Dense, Embedding
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_length_targ-1,),name='decoder_input')
decoder_inputs_embedding = Embedding(input_dim=vocab_tar_size, output_dim=embedding_dim)(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs_embedding,
                                     initial_state=encoder_states)
decoder_dense = Dense(vocab_tar_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
What differs from the encoder:

- The `shape` of `Input` is one smaller (`max_length_targ-1`); this comes from teacher forcing, as sketched below.
- The `LSTM` has the `return_sequences=True` option so that we get the LSTM output at every step.
- The `LSTM` receives the encoder's hidden states `encoder_states`, obtained above, as its initial state.
- A `Dense` layer follows; since it is the output layer, its `activation` is `softmax`.
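The decoder input is one step shorter because of teacher forcing: during training the decoder is fed the correct previous token and asked to predict the next one. A minimal sketch of how the decoder tensors could be derived from a padded target tensor (the actual preprocessing in the linked repository may differ):

# target_tensor: (num_samples, max_length_targ) integer matrix,
# e.g. <start> w1 w2 ... <end> plus padding.
decoder_input_tensor = target_tensor[:, :-1]    # <start> w1 w2 ...  (what the decoder sees)
decoder_target_tensor = target_tensor[:, 1:]    # w1 w2 ... <end>    (what it must predict)

Once you have come this far, the rest is straightforward: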
from keras.models import Model
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
from keras.utils.vis_utils import model_to_dot
from IPython.display import SVG
SVG(model_to_dot(model).create(prog='dot', format='svg'))
With this, you can visualize the computational graph that represents the model.
The layers are laid out differently from the figure shown at the beginning, but you can see that it is the same network.
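If you are not working in a notebook, you can also write the same diagram to an image file (the exact import path may vary between Keras versions):

from keras.utils import plot_model

# show_shapes adds the input/output tensor shapes to each box.
plot_model(model, to_file='seq2seq_model.png', show_shapes=True)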
Also, with `model.summary()` you can check the output shape and number of parameters of each layer:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
encoder_input (InputLayer) (None, 18) 0
__________________________________________________________________________________________________
decoder_input (InputLayer) (None, 17) 0
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, 18, 256) 1699328 encoder_input[0][0]
__________________________________________________________________________________________________
embedding_2 (Embedding) (None, 17, 256) 2247168 decoder_input[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, 1024), (None 5246976 embedding_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) [(None, 17, 1024), ( 5246976 embedding_2[0][0]
lstm_1[0][1]
lstm_1[0][2]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 17, 8778) 8997450 lstm_2[0][0]
==================================================================================================
Total params: 23,437,898
Trainable params: 23,437,898
Non-trainable params: 0
__________________________________________________________________________________________________
During actual modeling, it is worth visualizing the model like this as you go, because it makes debugging much faster.
We will use cross entropy as the loss function and optimize it with Adam, and we will monitor the (word-level) accuracy at each epoch. We also want to save the model every 5 epochs. An implementation example is as follows.
import numpy as np
from keras.utils import np_utils

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# define save condition
dir_path = 'saved_models/LSTM/'
save_every = 5
train_schedule = [save_every for i in range(divmod(epochs, save_every)[0])]
if divmod(epochs, save_every)[1] != 0:
    train_schedule += [divmod(epochs, save_every)[1]]

# run training
total_epochs = 0
for epoch in train_schedule:
    history = model.fit([encoder_input_tensor, decoder_input_tensor],
                        np.apply_along_axis(lambda x: np_utils.to_categorical(x, num_classes=vocab_tar_size), 1, decoder_target_tensor),
                        batch_size=batch_size,
                        epochs=epoch,
                        validation_split=0.2)
    total_epochs += epoch
    filename = str(total_epochs) + 'epochs_LSTM.h5'
    model.save(dir_path + filename)
This does a number of things, but the essential calls are just `model.compile` and `model.fit`; at the bare minimum, those two are enough. Pass the optimizer, the loss function, and the evaluation metrics to `model.compile` as options, and then training is executed with `model.fit`.
The most important arguments given to `model.fit` are the input data and the target data. The target data is `np.apply_along_axis(lambda x: np_utils.to_categorical(x, num_classes=vocab_tar_size), 1, decoder_target_tensor)` because we want to convert each element of `decoder_target_tensor` into a one-hot encoded representation.
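To see what this conversion does, here is a small sketch (shapes taken from the summary above, assuming `decoder_target_tensor` is an integer matrix of shape `(num_samples, max_length_targ-1)`):

import numpy as np
from keras.utils import np_utils

print(decoder_target_tensor.shape)   # (num_samples, 17): one integer ID per step
one_hot_targets = np.apply_along_axis(
    lambda x: np_utils.to_categorical(x, num_classes=vocab_tar_size),
    1, decoder_target_tensor)
print(one_hot_targets.shape)         # (num_samples, 17, vocab_tar_size): one-hot per step

Depending on your Keras version, using `loss='sparse_categorical_crossentropy'` with the integer targets directly could avoid materializing this large one-hot tensor, but the code above keeps the approach used in this article.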
Bugs can be found quickly by visualizing the model at appropriate points to check that the dimensions are consistent, or by plugging in concrete values. Since each layer (and the model itself) can be treated like a function, you can obtain concrete outputs by feeding in concrete values.
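For example, a throwaway `Model` built from tensors that already live in the graph lets you look at the encoder states for a concrete batch (a debugging sketch; the all-zero dummy batch is only there to produce some output):

import numpy as np
from keras.models import Model

# Map the encoder input directly to its final states.
probe = Model(encoder_inputs, encoder_states)
dummy_batch = np.zeros((2, max_length_inp), dtype='int32')
state_h_value, state_c_value = probe.predict(dummy_batch)
print(state_h_value.shape)   # (2, units), i.e. (2, 1024) with the settings above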
The preprocessing part is based on: Neural machine translation with attention https://www.tensorflow.org/tutorials/text/nmt_with_attention
The code base for the training / inference part: Sequence to sequence example in Keras (character-level) https://keras.io/examples/lstm_seq2seq/
The data used for training: https://github.com/odashi/small_parallel_enja
Repository containing the code for this article: https://github.com/nagiton/simple_NMT