[Python] Building a seq2seq model using Keras' Functional API: Inference

What kind of article?

  * For readers who want to try out various ideas in deep learning modeling but are not sure how to implement them
  * Uses Keras' Functional API, a framework that is relatively flexible and reasonably abstracted
  * Tries to implement seq2seq, which is hard to build with the Sequential API, as simply as possible

Table of contents

  1. Overview
  2. Pre-processing
  3. Model construction & learning
  4. Inference (this article)
  5. Model improvement (not yet written)

Motivation for this article

Training itself went fine, but the data flow at inference time is slightly different from the data flow at training time. How do you run inference using the parameters obtained from training? This article answers that question.

What inference requires

First, load the model that was trained and saved earlier. Since the data flow differs at inference time, we need to define models whose computation graph is different from the one used during training.

Also, to realize the process of predicting the next word from the previous word, we call the defined models like functions inside a loop and infer one word at a time, as sketched below.
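As a rough sketch (the concrete definitions of encoder_model and decoder_model come later in this article; input_seq, start_token_id, and max_length_targ are placeholders here), the inference loop looks like this:

import numpy as np

# Rough sketch of the inference loop; the models are defined further below.
states = encoder_model.predict(input_seq)                 # encode the source sentence once
prev_word = np.full((len(input_seq), 1), start_token_id)  # start with the start token
for _ in range(max_length_targ):
    probs, h, c = decoder_model.predict([prev_word] + states)
    prev_word = probs.argmax(axis=-1).reshape(-1, 1)      # feed the prediction back in
    states = [h, c]                                       # carry the state to the next step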

Inference implementation

Model loading

Models saved in h5 files etc. can be loaded as follows.

import keras

model = keras.models.load_model(filepath)  # filepath: path to the saved .h5 file

Note that pickling models appears to be deprecated.
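For reference, a minimal save/load round trip looks like this (the file name model_s2s.h5 is just an example):

# Save the trained model to HDF5 and load it back
model.save('model_s2s.h5')
model = keras.models.load_model('model_s2s.h5')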

Calculation graph definition

We will build the models shown in the following figure.

(Figure: LSTM-Page-1.png)

encoder

The encoder from training can be reused exactly as it is.

# Define the encoder: reuse the trained encoder input and the LSTM's states
encoder_model = Model(inputs=model.input[0],                        # encoder input
                      outputs=model.get_layer('lstm_1').output[1:]) # encoder LSTM hidden states (h, c)

When a trained layer can be reused as is, you can extract an intermediate output of a Model in this way.
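For example, calling the extracted encoder on a batch of padded source-word IDs (input_seq is a hypothetical array of shape [batch_size, max_length_inp]) returns the LSTM's hidden state and cell state:

# input_seq: hypothetical padded batch of source word IDs
state_h, state_c = encoder_model.predict(input_seq)
print(state_h.shape, state_c.shape)  # both (batch_size, units)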

Decoder

The decoder takes a bit more code. It does three things: an Embedding of the previous word (used for teacher forcing during training), an LSTM, and a Dense layer. Embedding, LSTM, and Dense each have weights obtained during training, so we reuse those values. Also, the hidden state fed into the LSTM is no longer always the encoder output; it is the state left after inferring the previous word, so this wiring has to change from the training setup. An implementation example is as follows.

from keras.layers import Input, LSTM, Dense, Embedding

# Define the decoder
embedding_dim = 256
units = 1024
# Target vocabulary size, read off the bias vector of the trained Dense layer
vocab_tar_size = model.get_layer('dense_1').weights[1].shape.as_list()[0]

decoder_word_input = Input(shape=(1,),name='decoder_input')
decoder_input_embedding = Embedding(input_dim=vocab_tar_size, 
                                    output_dim=embedding_dim,
                                    weights=model.get_layer('embedding_2').get_weights())(decoder_word_input)


decoder_state_input_h = Input(shape=(units,), name='decoder_input_h')
decoder_state_input_c = Input(shape=(units,), name='decoder_input_c')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_lstm = LSTM(units, 
                    return_sequences=False, 
                    return_state=True,
                    weights=model.get_layer('lstm_2').get_weights())
decoder_output, state_h, state_c = decoder_lstm(decoder_input_embedding,
                                                initial_state=decoder_states_inputs)

decoder_states = [state_h, state_c]

decoder_dense = Dense(vocab_tar_size, 
                      activation='softmax',
                      weights=model.get_layer('dense_1').get_weights())
decoder_output = decoder_dense(decoder_output)

decoder_model = Model(inputs=[decoder_word_input] + decoder_states_inputs,
                      outputs=[decoder_output] + decoder_states)

What is different from training

Compared with training, the decoder's hidden state and cell state are now fed in as explicit Input layers, and the LSTM is called with return_sequences=False so that it predicts one word per call.

Check the generated model

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

SVG(model_to_dot(decoder_model).create(prog='dot', format='svg'))

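If Graphviz is not available, a plain-text summary of the same graph also works:

# Text-based alternative to the SVG plot
decoder_model.summary()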

Definition of the function to translate

Convert input word ID to output word ID

We feed in a sequence of word IDs and get back the translated word IDs. The steps are:

  1. Encode the input into hidden-state memory with the encoder
  2. Predict the first word from the encoder states and the start token
  3. Predict each next word from the previous word and the previous hidden state
  4. Output the prediction results

An implementation example is as follows. (It is written so that it can process a batch at a time, but that is not essential.)

import numpy as np

def decode_sequence(input_seq, targ_lang, max_length_targ):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    inp_batch_size = len(input_seq)
    # Generate an empty target sequence of length 1.
    target_seq = np.zeros((inp_batch_size, 1))
    # Populate the first position of the target sequence with the start token.
    target_seq[:, 0] = targ_lang.word_index['<start>']

    # Sampling loop for a batch of sequences
    decoded_sentence = np.zeros((inp_batch_size, max_length_targ))

    for i in range(max_length_targ):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token (greedy decoding: take the most probable word ID)
        sampled_token_index = np.argmax(output_tokens, axis=1)  # array of size [inp_batch_size]

        decoded_sentence[:, i] = sampled_token_index

        # Update the target sequence (of length 1).
        target_seq = np.zeros((inp_batch_size, 1))
        target_seq[:, 0] = sampled_token_index

        # Update states
        states_value = [h, c]

    return decoded_sentence

Instances of the Model class have a predict method. Passing an input to predict runs the computation along the defined graph and returns the output.

First, we use encoder_model.predict to encode the input into hidden-state memory.

Treating target_seq, whose shape is [batch_size, 1], as the previous word, we call decoder_model.predict together with the hidden-state memory to obtain the next word and the hidden state to feed in at the next step.

At each step we take the argmax of the resulting word probabilities and store it in decoded_sentence, so that the output has shape [batch_size, max_length_targ].

This loop runs for the maximum length of the output word sequence, and decoded_sentence is returned.
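A call might look like this (input_tensor_val is a hypothetical padded batch of source-word IDs, targ_lang the target-side Tokenizer from preprocessing, and max_length_targ the maximum target length):

# Hypothetical usage: translate the first sentence of a padded validation batch
decoded_sentence = decode_sequence(input_tensor_val[:1], targ_lang, max_length_targ)
print(decoded_sentence)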

Output example

array([[  15.,   33.,    5.,   27.,  121.,    9.,  482.,    6.,    8.,
           4.,    3.,    0.,    0.,    0.,    0.,    0.,    0.,    0.]])

Convert output word ID to word

Since the mapping between word IDs and words was obtained in advance with keras.preprocessing.text.Tokenizer, all that remains is to apply that mapping to every element of the ndarray. To apply a Python function to all elements of an ndarray without writing loops, you can use np.vectorize.

An implementation example is as follows.

# Convert the word indices in decoded_sentence to words and remove the start/end tokens
def seq2sentence(seq, lang):
    def index2lang(idx, lang):
        try:
            return lang.index_word[idx]
        except KeyError:
            return ''
    langseq2sentence = np.vectorize(lambda x: index2lang(x, lang), otypes=[str])
    sentences = langseq2sentence(seq)
    sentences = [' '.join(list(sentence)) for sentence in sentences]
    # str.strip/lstrip treat their argument as a character set, so remove the tokens explicitly
    sentences = [sentence.replace('<start>', '').replace('<end>', '').strip() for sentence in sentences]
    return sentences

The exception handling is there in case an ID has no corresponding word (for example, the padding value 0). Finally, removing the extra spaces and the start/end tokens completes the conversion.
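Putting it all together, the decoded ID matrix from earlier can be turned back into text like this (names as above):

# Convert the decoded word-ID matrix back into plain sentences
sentences = seq2sentence(decoded_sentence, targ_lang)
print(sentences[0])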

References

The preprocessing part is based on: Neural machine translation with attention, https://www.tensorflow.org/tutorials/text/nmt_with_attention

The training and inference code is based on: Sequence to sequence example in Keras (character-level), https://keras.io/examples/lstm_seq2seq/

The data used for training: https://github.com/odashi/small_parallel_enja

Repository with the code for this article: https://github.com/nagiton/simple_NMT
