[PYTHON] [Machine learning] Text classification using Transformer model (Attention-based classifier)

Goal

Modify part of the TensorFlow tutorial "Transformer model for language understanding" so that it can be used for text classification tasks.

Notebook

I have uploaded my notebook to GitHub: transformer_classify

Commentary

The main differences from the tutorial are listed below.

The data is the livedoor news corpus

-- The classification task in this article is assumed to be Japanese document classification, as it would be used in business.
-- For that reason, we used the livedoor news corpus, which is often used in machine learning.

Use Juman for word segmentation

-- We use Juman, which has a good reputation for Japanese word segmentation.
-- A Dockerfile that automates downloading and installing Juman is available here: https://github.com/raidenn-gh/dockerfile_tf2_py3_jpt_juman

Decoder removal

-- The Decoder is the mechanism that receives the output of the Encoder and converts it into vectors for another language.
-- Because this is a classification task rather than a conversion into another language, the Decoder is not used.

Transformer modification

-- In place of the removed Decoder, Dense layers are stacked on the output of the Encoder, and the last one serves as the output layer.
-- The output layer uses the Softmax function as its activation function, which converts the input text vector into probabilities that express which class the text is classified into.

transformer_classify.ipynb


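import tensorflow as tf

# Encoder is the class defined in the TensorFlow tutorial.
NUMLABELS = 9  # number of output classes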

class Transformer(tf.keras.Model):
  def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, 
               target_vocab_size, pe_input, pe_target, rate=0.1):
    super(Transformer, self).__init__()

    self.encoder = Encoder(num_layers, d_model, num_heads, dff, 
                           input_vocab_size, pe_input, rate)
    self.dense1 = tf.keras.layers.Dense(d_model, activation='tanh')
    self.dropout1 = tf.keras.layers.Dropout(rate)   
    self.final_layer = tf.keras.layers.Dense(NUMLABELS, activation='softmax')
        
  def call(self, inp, tar, training, enc_padding_mask):
    # tar is not used in the forward pass; there is no Decoder to feed it to.

    enc_output = self.encoder(inp, training, enc_padding_mask)  # (batch_size, inp_seq_len, d_model)
    # Use the vector at the first sequence position to represent the whole text.
    enc_output = self.dense1(enc_output[:,0])  # (batch_size, d_model)
    enc_output = self.dropout1(enc_output, training=training)
    final_output = self.final_layer(enc_output)  # (batch_size, NUMLABELS)

    return final_output
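
As a framework-free illustration of what the Softmax output layer produces, the sketch below turns nine scores into class probabilities and picks the most likely class. The scores are made-up values, not output from the notebook, and the `softmax` helper is written here for illustration rather than taken from TensorFlow.

```python
import math

NUMLABELS = 9  # same number of classes as in the notebook


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for one document, one value per class.
logits = [0.2, -1.0, 3.1, 0.0, 0.5, -0.3, 1.2, 0.1, -2.0]
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted_class = max(range(NUMLABELS), key=lambda i: probs[i])
print(predicted_class, round(sum(probs), 6))
```

Because the model already applies Softmax, its output can be read directly as one probability per class.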

Loss function

-- Because the output layer uses the Softmax function as its activation function, the loss function is multiclass cross entropy.
-- Because the labels are not one-hot encoded, SparseCategoricalCrossentropy() is used.

transformer_classify.ipynb


loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

def loss_function(labels, pred):
  loss_ = loss_object(labels, pred)
  return loss_
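
To show why integer labels and one-hot labels lead to the same loss value, here is a minimal per-sample sketch in plain Python. The two helper functions and the probabilities are illustrative assumptions written for this example; they are not the TensorFlow implementation, which also averages over the batch.

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Loss for one sample when the label is an integer class index."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot, probs):
    """Loss for one sample when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


# Hypothetical softmax output for a 3-class problem.
probs = [0.1, 0.7, 0.2]

label = 1                    # integer label, as used in this article
one_hot = [0.0, 1.0, 0.0]    # the same label in one-hot form

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Both calls give the negative log of the probability assigned to the true class, so the sparse form simply avoids building the one-hot vector.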

Addition of val_step

-- A val_step that uses the validation data is added after train_step.
-- Because it is validation, training is set to False so that the dropout layer is skipped.
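
The effect of the training flag on dropout can be sketched without TensorFlow. The `dropout` function below is a hypothetical stand-in written for this illustration: it uses the common inverted-dropout scheme, in which kept values are scaled by 1 / (1 - rate) during training and the input passes through unchanged otherwise.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout on a list of floats."""
    if not training or rate == 0.0:
        return list(values)  # validation/inference: pass through unchanged
    keep = 1.0 - rate
    # Zero each value with probability `rate`; scale the survivors.
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(x, 0.5, training=True, rng=rng)   # some values zeroed
val_out = dropout(x, 0.5, training=False, rng=rng)    # identical to x
print(train_out, val_out)
```

With training set to False the validation metrics are computed on the full, unperturbed activations, which is why the flag is switched off in val_step.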

Result

I couldn't get very good accuracy.

img.png

Reference URL

tf2_classify
BERT with SentencePiece for Japanese text.
Make and understand Transformer / Attention
Transformer model for language understanding
