[TensorFlow 2 / Keras] How to train with CTC Loss in Keras

Introduction

In the previous article, I described how to use CTC (Connectionist Temporal Classification) Loss in TensorFlow 2.x to train a model (RNN) whose inputs and outputs have variable lengths: [TensorFlow 2] Learn RNN with CTC Loss - Qiita

However, one problem was left unsolved: **how to handle CTC Loss cleanly with Keras**. I tried it in the previous article, but the result was unsatisfying, with a lot of dubious hacks and slow processing. This time I found the solution, so I am writing it down.

I think the method described here applies not only to CTC Loss but to any case where you want to define and train with a custom loss function.

Verification environment

compile() is not the only way to define a loss function

I think the reason for last time's defeat was, after all, that I got stuck trying to define CTC Loss through Keras's Model.compile(). In fact, however, there is a way to add a loss function and metrics (accuracy and the like) other than Model.compile().

Train and evaluate with Keras | TensorFlow Core

The overwhelming majority of losses and metrics can be computed from y_true and y_pred, where y_pred is an output of your model. But not all of them. For instance, a regularization loss may only require the activation of a layer (there are no targets in this case), and this activation may not be a model output.

In such cases, you can call self.add_loss(loss_value) from inside the call method of a custom layer. Here's a simple example that adds activity regularization (note that activity regularization is built-in in all Keras layers -- this layer is just for the sake of providing a concrete example): (Omitted) You can do the same for logging metric values: (Omitted)

**If you define your own layer and use `add_loss()`, you can define a loss function that is not bound to the `(y_true, y_pred)` signature!** No, I really should have read the tutorial properly... orz

The API documentation for `add_loss()`, which registers a loss, and `add_metric()`, which registers a metric, can be found on the following page: tf.keras.layers.Layer | TensorFlow Core v2.1.0

As you can see from the sample code in the tutorial, **the loss and the metric are Tensors, assembled through Tensor operations.** The `x1` that appears in the sample code is the Tensor holding a layer's output, and it can be used in the expression for the loss.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='digits')
x1 = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x2 = layers.Dense(64, activation='relu', name='dense_2')(x1)
outputs = layers.Dense(10, name='predictions')(x2)
model = keras.Model(inputs=inputs, outputs=outputs)

model.add_loss(tf.reduce_sum(x1) * 0.1)

model.add_metric(keras.backend.std(x1),
                 name='std_of_activation',
                 aggregation='mean')
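
In the tutorial, this model is then compiled and trained as usual; anything registered with add_loss() and add_metric() is automatically folded into the total loss and the logged metrics. A minimal sketch of that step, assuming MNIST-style arrays x_train and y_train that are not defined here:

# x_train: float array of shape (N, 784); y_train: integer labels 0-9 (assumed)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x_train, y_train, batch_size=64, epochs=1)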

In this case, the label sequence itself and the lengths of the feature and label sequences are also needed to compute CTC Loss, so they have to be available as Tensors. In other words, they too must be given as inputs to the model (as x, not y). That is, **create a model with multiple inputs**.

Sample code that worked fine

The original source code is GitHub - igormq/ctc_tensorflow_example: CTC + Tensorflow Example for ASR.

ctc_tensorflow_example_tf2_keras.py


#  Compatibility imports
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import time

import tensorflow as tf
import scipy.io.wavfile as wav
import numpy as np

from six.moves import xrange as range

try:
    from python_speech_features import mfcc
except ImportError:
    print("Failed to import python_speech_features.\n Try pip install python_speech_features.")
    raise ImportError

from utils import maybe_download as maybe_download
from utils import sparse_tuple_from as sparse_tuple_from

# Constants
SPACE_TOKEN = '<space>'
SPACE_INDEX = 0
FIRST_INDEX = ord('a') - 1  # 0 is reserved to space
FEAT_MASK_VALUE = 1e+10

# Some configs
num_features = 13
num_units = 50 # Number of units in the LSTM cell
# Accounting for the 0th index + space + blank label = 28 characters
num_classes = ord('z') - ord('a') + 1 + 1 + 1

# Hyper-parameters
num_epochs = 400
num_layers = 1
batch_size = 2
initial_learning_rate = 0.005
momentum = 0.9

# Loading the data

audio_filename = maybe_download('LDC93S1.wav', 93638)
target_filename = maybe_download('LDC93S1.txt', 62)

fs, audio = wav.read(audio_filename)

# create a dataset composed of data with variable lengths
inputs = mfcc(audio, samplerate=fs)
inputs = (inputs - np.mean(inputs))/np.std(inputs)
inputs_short = mfcc(audio[fs*8//10:fs*20//10], samplerate=fs)
inputs_short = (inputs_short - np.mean(inputs_short))/np.std(inputs_short)
# Transform in 3D array
train_inputs = tf.ragged.constant([inputs, inputs_short], dtype=np.float32)
train_seq_len = tf.cast(train_inputs.row_lengths(), tf.int32)
train_inputs = train_inputs.to_tensor(default_value=FEAT_MASK_VALUE)

# Reading targets
with open(target_filename, 'r') as f:

    # Only the last line is necessary
    line = f.readlines()[-1]

    # Get only the words between [a-z] and replace period for none
    original = ' '.join(line.strip().lower().split(' ')[2:]).replace('.', '')
    targets = original.replace(' ', '  ')
    targets = targets.split(' ')

# Adding blank label
targets = np.hstack([SPACE_TOKEN if x == '' else list(x) for x in targets])

# Transform char into index
targets = np.asarray([SPACE_INDEX if x == SPACE_TOKEN else ord(x) - FIRST_INDEX
                      for x in targets])
# Creating sparse representation to feed the placeholder
train_targets = tf.ragged.constant([targets, targets[13:32]], dtype=np.int32) 
train_targets_len = tf.cast(train_targets.row_lengths(), tf.int32)
train_targets = train_targets.to_sparse() 

# We don't have a validation dataset :(
val_inputs, val_targets, val_seq_len, val_targets_len = train_inputs, train_targets, \
                                                        train_seq_len, train_targets_len

# THE MAIN CODE!

# add loss and metrics with a custom layer
class CTCLossLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        labels = inputs[0]
        logits = inputs[1]
        label_len = inputs[2]
        logit_len = inputs[3]

        logits_trans = tf.transpose(logits, (1, 0, 2))
        label_len = tf.reshape(label_len, (-1,))
        logit_len = tf.reshape(logit_len, (-1,))
        loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits_trans, label_len, logit_len, blank_index=-1))
        # define loss here instead of compile()
        self.add_loss(loss)

        # decode
        decoded, _ = tf.nn.ctc_greedy_decoder(logits_trans, logit_len)

        # Inaccuracy: label error rate
        ler = tf.reduce_mean(tf.edit_distance(tf.cast(decoded[0], tf.int32),
                                          labels))
        self.add_metric(ler, name="ler", aggregation="mean")

        return logits  # Pass-through layer.

# Defining the cell
# Can also be:
#   tf.keras.layers.SimpleRNNCell
#   tf.keras.layers.GRUCell
cells = []
for _ in range(num_layers):
    cell = tf.keras.layers.LSTMCell(num_units)
    cells.append(cell)
stack = tf.keras.layers.StackedRNNCells(cells)

input_feature     = tf.keras.layers.Input((None, num_features), name="input_feature")
input_label       = tf.keras.layers.Input((None,), dtype=tf.int32, sparse=True, name="input_label")
input_feature_len = tf.keras.layers.Input((1,), dtype=tf.int32, name="input_feature_len")
input_label_len   = tf.keras.layers.Input((1,), dtype=tf.int32, name="input_label_len")

layer_masking = tf.keras.layers.Masking(FEAT_MASK_VALUE)(input_feature)
layer_rnn     = tf.keras.layers.RNN(stack, return_sequences=True)(layer_masking)
layer_output  = tf.keras.layers.Dense(
                   num_classes,
                   kernel_initializer=tf.keras.initializers.TruncatedNormal(0.0, 0.1),
                   bias_initializer="zeros",
                   name="logit")(layer_rnn)
layer_loss = CTCLossLayer()([input_label, layer_output, input_label_len, input_feature_len])

# create models for training and prediction (sharing weights)
model_train = tf.keras.models.Model(
            inputs=[input_feature, input_label, input_feature_len, input_label_len],
            outputs=layer_loss)

model_predict = tf.keras.models.Model(inputs=input_feature, outputs=layer_output)

optimizer = tf.keras.optimizers.SGD(initial_learning_rate, momentum)
# adding no loss: we have already defined with a custom layer
model_train.compile(optimizer=optimizer)

# training: y is dummy!
model_train.fit(x=[train_inputs, train_targets, train_seq_len, train_targets_len], y=None,
                validation_data=([val_inputs, val_targets, val_seq_len, val_targets_len], None),
                epochs=num_epochs)

# Decoding
print('Original:')
print(original)
print(original[13:32])
print('Decoded:')
decoded, _ = tf.nn.ctc_greedy_decoder(tf.transpose(model_predict.predict(train_inputs), (1, 0, 2)), train_seq_len)
d = tf.sparse.to_dense(decoded[0], default_value=-1).numpy()
str_decoded = [''.join([chr(x + FIRST_INDEX) for x in np.asarray(row) if x != -1]) for row in d]
for s in str_decoded:
    # Replacing blank label to none
    s = s.replace(chr(ord('z') + 1), '')
    # Replacing space label to space
    s = s.replace(chr(ord('a') - 1), ' ')
    print(s)

The execution result is as follows.

Train on 2 samples, validate on 2 samples
Epoch 1/400
2/2 [==============================] - 2s 991ms/sample - loss: 546.3565 - ler: 1.0668 - val_loss: 464.2611 - val_ler: 0.8801
Epoch 2/400
2/2 [==============================] - 0s 136ms/sample - loss: 464.2611 - ler: 0.8801 - val_loss: 179.9780 - val_ler: 1.0000
(Omitted)
Epoch 400/400
2/2 [==============================] - 0s 135ms/sample - loss: 1.6670 - ler: 0.0000e+00 - val_loss: 1.6565 - val_ler: 0.0000e+00
Original:
she had your dark suit in greasy wash water all year
dark suit in greasy
Decoded:
she had your dark suit in greasy wash water all year
dark suit in greasy

Both the processing time and the error rate look fine; it finally works properly...! (The displayed time is per sample, so the actual time per epoch is twice the displayed value, but at 300 ms or less for 2 samples it is about the same as last time.)

Commentary

Adding the loss and metrics

As mentioned at the beginning, Layer.add_loss() lets you define the loss function freely from any Tensor related to the model. The code above defines a layer called CTCLossLayer, whose call() carries out almost the same processing as the TensorFlow 2.x version in the previous article. At the end, the input logits are returned unchanged.

Here, call() receives a list of four tensors (besides self, it takes the single argument inputs). With these four pieces of information, both the CTC Loss and the decoding can be computed. When building the model, you likewise have to feed the four layers in, as shown below.

layer_loss = CTCLossLayer()([input_label, layer_output, input_label_len, input_feature_len])
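
The same pattern works for any custom penalty, not just CTC Loss: a pass-through layer computes a scalar from whatever tensors it receives and registers it with add_loss(). A minimal illustrative sketch (the layer and its penalty are invented for demonstration):

class ActivityPenalty(tf.keras.layers.Layer):
    # Pass-through layer: adds an L2 activity penalty to the total loss.
    def call(self, inputs):
        self.add_loss(0.01 * tf.reduce_sum(tf.square(inputs)))
        return inputs  # pass-through, just like CTCLossLayer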

Input layer

Each value passed in that list must be a Tensor. layer_output is the same as in a normal Keras model, while `input_label`, `input_label_len`, and `input_feature_len` are covered by adding dedicated input layers.

input_feature     = tf.keras.layers.Input((None, num_features), name="input_feature")
input_label       = tf.keras.layers.Input((None,), dtype=tf.int32, sparse=True, name="input_label")
input_feature_len = tf.keras.layers.Input((1,), dtype=tf.int32, name="input_feature_len")
input_label_len   = tf.keras.layers.Input((1,), dtype=tf.int32, name="input_label_len")

As you can see, we create each `Input` layer with the appropriate `shape` and `dtype`. The `dtype` of all inputs other than the features should be int32. Given the argument specification of tf.nn.ctc_loss, I really wanted to make the shape of `input_feature_len` and `input_label_len` just `()`, but that caused errors later and I couldn't get it to work. So the shape is declared as `(1,)`, and a `reshape` is performed inside `CTCLossLayer`.
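
As a small sanity check of that reshape (the values here are made up): Keras delivers each length input with shape (batch, 1), while tf.nn.ctc_loss expects rank-1 length tensors of shape (batch,).

label_len = tf.constant([[19], [53]], dtype=tf.int32)  # as delivered by Input((1,))
print(tf.reshape(label_len, (-1,)).shape)              # (2,): what tf.nn.ctc_loss expects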

Another thing I added is `sparse=True` when creating `input_label`. With this specified, the Tensor corresponding to `input_label` becomes a SparseTensor. tf.keras.Input | TensorFlow Core v2.1.0

This `sparse=True` is a workaround for tf.edit_distance, which requires the correct labels to be passed as a SparseTensor when computing the error rate of the decoding result (tf.nn.ctc_loss, used for the CTC Loss computation, can also accept a SparseTensor). Accordingly, the data given via Model.fit() etc. must also be created as a SparseTensor. tf.nn.ctc_loss | TensorFlow Core v2.1.0 tf.edit_distance | TensorFlow Core v2.1.0
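
A toy illustration of that constraint (the label values are made up): tf.edit_distance takes SparseTensor operands, which is why the correct labels flow through the model as a SparseTensor.

truth = tf.ragged.constant([[1, 2, 3]], dtype=tf.int32).to_sparse()
hyp   = tf.ragged.constant([[1, 3]],    dtype=tf.int32).to_sparse()
print(tf.edit_distance(hyp, truth))  # normalized edit distance per sequence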

Similarly, for RaggedTensor inputs there is apparently `ragged=True`.
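
I have not verified it in this article, but judging from the same documentation page, a ragged input layer would presumably be declared like this:

# assumed sketch, untested here: this input would then arrive as a tf.RaggedTensor
input_ragged = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, ragged=True)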

How to make a model

The models for learning and prediction (inference) are created separately as shown below.

model_train = tf.keras.models.Model(
            inputs=[input_feature, input_label, input_feature_len, input_label_len],
            outputs=layer_loss)

model_predict = tf.keras.models.Model(inputs=input_feature, outputs=layer_output)

Training required four inputs, but prediction (decoding) only needs the logits (that is, the output of Dense), so the features alone suffice as input. The prediction model therefore works with a single input. Of course, the loss cannot be computed then, but it is not needed for decoding, so the output is `layer_output`, taken before it passes through `CTCLossLayer`.

Drawn as a diagram, it looks like this. (Figure: the training model and the prediction model sharing their weighted layers.)

Since the layers that hold weights are shared between the two models, you can train with the training model and then run inference with the prediction model as is.
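
One way to convince yourself of the sharing (a small sanity check, not part of the original code): both models reference the very same layer objects, so an identity check on the Dense layer named "logit" passes.

# The Dense layer named "logit" is a single object used by both models,
# so weights trained through model_train are seen by model_predict.
assert model_train.get_layer("logit") is model_predict.get_layer("logit")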

Compiling the model

Since CTC Loss is already defined inside the custom layer, there is no need to define a loss function in compile(). In that case, simply omit the `loss` argument.

model_train.compile(optimizer=optimizer)

Running training

The correct labels and the length information used to compute the loss are fed to input layers, so they must be specified on the x side of the arguments of Model.fit(). There is nothing to specify for y, so write None.

Similarly, for validation_data, write None as the second element of the tuple.

model_train.fit(x=[train_inputs, train_targets, train_seq_len, train_targets_len], y=None,
                validation_data=([val_inputs, val_targets, val_seq_len, val_targets_len], None),
                epochs=num_epochs)

To be honest, I don't know whether this is the officially intended usage, but as long as you don't specify loss in compile(), y=None works fine (when loss is specified, labels to pass to the loss function's y_true argument are required, so naturally an error occurs unless some data is given for y).

Output of decoding result

As mentioned above, use model_predict for inference. It is fine to give only the feature sequence to predict().

decoded, _ = tf.nn.ctc_greedy_decoder(tf.transpose(model_predict.predict(train_inputs), (1, 0, 2)), train_seq_len)

Other things to worry about

- Masking and `input_feature_len` serve a similar purpose, so it feels somewhat redundant...

Summary

After reading the tutorial properly, I was able to train with CTC Loss in Keras. Keras turns out to be surprisingly flexible after all. I take back what I said last time.
