This series is for people who want to try out deep learning models but don't know how to implement them. Using Keras's functional API, a framework that is relatively flexible and reasonably abstracted, we implement seq2seq, which is hard to do with the Sequential model, as simply as possible.
Training itself worked, but the data flow at training time and at inference time differ slightly. This article answers the question: how do I run inference using the parameters obtained from training?
First, load the model that was trained and saved earlier. Since the data flow differs at inference time, we need to define a model whose computation graph is different from the one used for training.
Also, to realize the process of predicting the next word from the previous word, we call the defined model like a function inside a loop and infer one step at a time.
A model saved as an h5 file (or similar) can be loaded as follows.
```python
model = keras.models.load_model(filepath)
```
Incidentally, saving models with pickle seems to be deprecated.
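After loading, it is worth checking the layer names, since the code below refers to layers by name ('lstm_1', 'embedding_2', 'dense_1'); the exact names depend on how the training model was built, so treat them as assumptions to verify. A minimal sketch:

```python
import keras

# Load the trained model and list its layers; the layer names referenced
# later ('lstm_1', 'embedding_2', 'dense_1') can be confirmed here.
model = keras.models.load_model(filepath)
model.summary()
print([layer.name for layer in model.layers])
```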
We will build the model shown in the figure below.
For the encoder, the same one as at training time can be used as is.
```python
from keras.models import Model

# define encoder
encoder_model = Model(inputs=model.input[0],                          # encoder input
                      outputs=model.get_layer('lstm_1').output[1:])   # encoder LSTM hidden states
```
When a layer can be reused as is, you can extract an intermediate output of the `Model` in this way.
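As a quick check, the encoder sub-model should return the two LSTM state tensors. A minimal sketch, assuming the encoder input is a padded batch of word IDs whose length `max_length_inp` comes from the preprocessing step:

```python
import numpy as np

# Dummy batch of one all-zero (padded) sentence, just to confirm that the
# encoder returns two state tensors of shape (batch_size, lstm_units)
dummy_input = np.zeros((1, max_length_inp))
state_h, state_c = encoder_model.predict(dummy_input)
print(state_h.shape, state_c.shape)
```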
The decoder takes a bit more code.
The decoder does three things: Embedding of the previous word (teacher forcing at training time), LSTM, and Dense.
Embedding, LSTM, and Dense each have weights learned during training, so we reuse those values.
Also, the hidden-state memory fed to the LSTM is not always the encoder output; it is the memory left after inferring the previous word, so this part has to change from the training setup.
An implementation example is as follows.
```python
from keras.layers import Input, LSTM, Dense, Embedding

# define decoder
embedding_dim = 256
units = 1024
vocab_tar_size = model.get_layer('dense_1').weights[1].shape.as_list()[0]

decoder_word_input = Input(shape=(1,), name='decoder_input')
decoder_input_embedding = Embedding(input_dim=vocab_tar_size,
                                    output_dim=embedding_dim,
                                    weights=model.get_layer('embedding_2').get_weights())(decoder_word_input)

decoder_state_input_h = Input(shape=(units,), name='decoder_input_h')
decoder_state_input_c = Input(shape=(units,), name='decoder_input_c')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_lstm = LSTM(units,
                    return_sequences=False,
                    return_state=True,
                    weights=model.get_layer('lstm_2').get_weights())
decoder_output, state_h, state_c = decoder_lstm(decoder_input_embedding,
                                                initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]

decoder_dense = Dense(vocab_tar_size,
                      activation='softmax',
                      weights=model.get_layer('dense_1').get_weights())
decoder_output = decoder_dense(decoder_output)

decoder_model = Model(inputs=[decoder_word_input] + decoder_states_inputs,
                      outputs=[decoder_output] + decoder_states)
```
The differences from training time are:

- The `weights` option is set for each layer. The values to set can be obtained with `model.get_layer(<layer name>).get_weights()`.
- The `shape` of the `Input` is 1, since only the previous word is fed in.
- The `return_sequences=True` option of the LSTM, which was used to get the LSTM output at every step, is no longer set.
- The hidden-state memory fed to the LSTM is newly added as `Input` layers.
- An instance of the `Model` class, `decoder_model`, is defined from these inputs and outputs.

The resulting graph can be drawn with:

```python
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

SVG(model_to_dot(decoder_model).create(prog='dot', format='svg'))
```
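Before writing the inference loop, a single decoder step can be sanity-checked with dummy inputs. This is just a sketch; the zero arrays stand in for a real previous word and real encoder states:

```python
import numpy as np

# One decoder step with placeholder inputs: a dummy previous-word ID and
# zero hidden/cell states of size `units`
dummy_word = np.zeros((1, 1))
dummy_h = np.zeros((1, units))
dummy_c = np.zeros((1, units))
probs, h, c = decoder_model.predict([dummy_word, dummy_h, dummy_c])
print(probs.shape)        # (1, vocab_tar_size): softmax over the target vocabulary
print(h.shape, c.shape)   # (1, units) each: states to feed into the next step
```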
Now let's actually feed in a sequence of word IDs and get back the translated word IDs. An implementation example is as follows. (I wrote it so that it can process batches, but that is not essential.)
```python
import numpy as np

def decode_sequence(input_seq, targ_lang, max_length_targ):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    vocab_tar_size = np.array(list(targ_lang.index_word.keys())).max()
    inp_batch_size = len(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((inp_batch_size, 1))
    # Populate the first character of target sequence with the start character.
    target_seq[:, 0] = targ_lang.word_index['<start>']
    # Sampling loop for a batch of sequences
    decoded_sentence = np.zeros((inp_batch_size, max_length_targ))
    for i in range(max_length_targ):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
        # Sample a token
        sampled_token_index = np.argmax(output_tokens, axis=1)  # array of size [inp_batch_size]
        decoded_sentence[:, i] = sampled_token_index
        # Update the target sequence (of length 1).
        target_seq = np.zeros((inp_batch_size, 1))
        target_seq[:, 0] = sampled_token_index
        # Update states
        states_value = [h, c]
    return decoded_sentence
```
Instances of the `Model` class have a `predict` method. If you pass inputs to `predict`, the computation runs along the defined graph and you get the outputs.
First, `encoder_model.predict` is used to encode the input into the hidden-state memory.
Then, treating `target_seq` (of size `[batch_size, 1]`) as the previous word, `decoder_model.predict` is called together with that hidden-state memory to obtain the next word and the hidden state to feed in at the next step.
The predictions are reduced with `argmax` at each step and stored in `decoded_sentence`, so the output has size `[batch_size, max_length_targ]`.
This loop runs as many times as the maximum length of the output word sequence, and `decoded_sentence` is returned.
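For example, a hypothetical call (the names `input_tensor`, `targ_lang`, and `max_length_targ` are assumed to come from the preprocessing step) would produce an ID matrix like the output example below:

```python
# Translate the first source sentence of a preprocessed batch (names assumed)
decoded_ids = decode_sequence(input_tensor[:1], targ_lang, max_length_targ)
print(decoded_ids)
```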
Output example:

```python
array([[ 15.,  33.,   5.,  27., 121.,   9., 482.,   6.,   8.,
          4.,   3.,   0.,   0.,   0.,   0.,   0.,   0.,   0.]])
```
Since the mapping between word IDs and words was obtained in advance with `keras.preprocessing.text.Tokenizer`, all that remains is to apply that mapping to each element of the `ndarray`.
To apply a Python function to every element of an `ndarray` without writing loops, you can use `np.vectorize`. An implementation example is as follows.
```python
# Convert the word indices in decoded_sentence to words and remove the start/end tokens
def seq2sentence(seq, lang):
    def index2lang(idx, lang):
        try:
            return lang.index_word[idx]
        except KeyError:
            return ''
    langseq2sentence = np.vectorize(lambda x: index2lang(x, lang), otypes=[str])
    sentences = langseq2sentence(seq)
    sentences = [' '.join(list(sentence)) for sentence in sentences]
    sentences = [sentence.lstrip('<start>').strip(' ').strip('<end>') for sentence in sentences]
    return sentences
```
I added the exception handling just in case. Finally, the extra spaces and the start/end tokens are removed to complete the sentence.
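Applying `seq2sentence` to the decoded IDs from earlier then gives the final strings; `decoded_ids` and `targ_lang` are the assumed names from the steps above:

```python
# Turn the decoded ID matrix into a list of translated sentences
sentences = seq2sentence(decoded_ids, targ_lang)
print(sentences)
```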
The preprocessing part follows Neural machine translation with attention: https://www.tensorflow.org/tutorials/text/nmt_with_attention
The code for the training and inference parts is based on Sequence to sequence example in Keras (character-level): https://keras.io/examples/lstm_seq2seq/
The data used for training: https://github.com/odashi/small_parallel_enja
Repository containing the code for this article: https://github.com/nagiton/simple_NMT