[Python] Try the book "Introduction to Natural Language Processing Application Development in 15 Steps" -- Chapter 3, Step 12 Memo "Convolutional Neural Networks"

Contents

This is a personal memo written while reading "Introduction to Natural Language Processing Application Development in 15 Steps". This time I note my own takeaways from Chapter 3, Step 12. I have already studied CNNs themselves, so these notes are rough.

Preparation

- Personal Mac: macOS Mojave 10.14.6
- docker version: 19.03.2 for both Client and Server

Chapter overview

In the previous chapter, introducing word embeddings made it possible to use distributed representations of words as features. However, to turn them into sentence-level features we had to take the sum or average of the word vectors, and the resulting predictions were inferior to the BoW-based ones. In this chapter we build a convolutional neural network (CNN) whose input is the sequence of word vectors arranged in sentence order. Note that while the CNNs familiar from image analysis are two-dimensional, the CNNs used in natural language processing (text classification, etc.) are one-dimensional.
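
As a quick illustration of that difference (not from the book; a minimal sketch assuming TensorFlow 2.x's bundled Keras), a Conv1D layer slides a window along the word axis of a (sequence length, embedding dimension) input, while the Conv2D layer used for images slides along both height and width:

# Hedged sketch: 1D convolution over a word sequence vs. 2D convolution over an image
import numpy as np
from tensorflow.keras.layers import Conv1D, Conv2D

sentence = np.zeros((1, 20, 100), dtype='float32')   # 20 words, 100-dimensional embeddings
image = np.zeros((1, 28, 28, 1), dtype='float32')    # 28x28 pixels, 1 channel

print(Conv1D(filters=8, kernel_size=3)(sentence).shape)  # (1, 18, 8): slides along the words only
print(Conv2D(filters=8, kernel_size=3)(image).shape)     # (1, 26, 26, 8): slides along height and width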

12.1 ~ 12.4

The layers that make up the CNN and their contents:

Convolutional layer
- Input
  - the sequence of distributed representations (word vectors) obtained by word embeddings
  - when stacking CNN layers, the output sequence of the previous layer's Pooling layer
- Align the lengths of the word-vector sequences that make up each sentence (see the padding sketch after this table)
  - truncate anything beyond the fixed length
  - fill the missing part with zero vectors
- For each kernel_size window along the word-sequence direction, multiply the window by the weights and add a bias to produce one output value
- Repeat the same operation at every stride along the word-sequence direction, reusing the same weights each time: weight sharing

Pooling layer
- Input
  - the output sequence of the Convolutional layer
- There are Max pooling and Average pooling; Max pooling, being a non-linear operation, gives higher performance
- The same operation can be repeated at every stride along the word-sequence direction, but there is also a variant that processes the whole sequence at once without setting a stride: global max pooling / global average pooling

Fully-connected layer (densely-connected layer)
- We want to feed the result into a multi-layer perceptron for multi-class classification
- Since the output of the Pooling layer is a two-dimensional array, flatten it into a one-dimensional array that the multi-layer perceptron can take as input
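
As a concrete example of the length alignment described above, Keras ships a pad_sequences helper that truncates sequences that are too long and zero-pads ones that are too short. This is a minimal sketch with made-up word IDs; the book's actual preprocessing may differ.

# Hedged sketch: aligning word-ID sequences to a fixed length
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_SEQUENCE_LENGTH = 6  # illustrative value
sequences = [
    [3, 14, 15],                    # too short -> padded with 0 (maps to the zero-vector row of the Embedding)
    [9, 26, 5, 35, 8, 97, 93, 23],  # too long  -> truncated
]
print(pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH, padding='post', truncating='post'))
# [[ 3 14 15  0  0  0]
#  [ 9 26  5 35  8 97]]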

12.5 Implementation of CNN with Keras

The feature is the average of the word-embedding distributed representations (classifier: SVC)

In the previous chapter the distributed representations were summed, so here I tried averaging them instead.

import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC
from tokenizer import tokenize
from sklearn.pipeline import Pipeline

class DialogueAgent:
    def __init__(self):
        self.model = Word2Vec.load(
            './latest-ja-word2vec-gensim-model/word2vec.gensim.model')  # <1>

    def train(self, texts, labels):
        pipeline = Pipeline([
            ('classifier', SVC()),
        ])
        pipeline.fit(texts, labels)
        self.pipeline = pipeline

    def predict(self, texts):
        return self.pipeline.predict(texts)

    # The content is almost the same as in Step 11
    def calc_text_feature(self, text):
~~
#        return np.sum(word_vectors, axis=0)
        return np.average(word_vectors, axis=0)
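
The body of calc_text_feature is elided above. For reference, here is a minimal sketch of what it might look like based on the Step 11 description (tokenize the text, look up each word's Word2Vec vector, then average); the out-of-vocabulary handling is my own assumption, not the book's code.

    # Hedged sketch of the elided method (assumption: mirrors Step 11)
    def calc_text_feature(self, text):
        words = tokenize(text)                      # tokenizer from the book's tokenizer module
        word_vectors = [self.model.wv[word]         # look up each word's distributed representation
                        for word in words
                        if word in self.model.wv]   # skip out-of-vocabulary words (assumption)
        if not word_vectors:
            return np.zeros(self.model.wv.vector_size)
        return np.average(word_vectors, axis=0)     # average instead of sum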

evaluate_dialogue_agent.py


from os.path import dirname, join, normpath

import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score

from <Implemented module name> import DialogueAgent

if __name__ == '__main__':
    BASE_DIR = normpath(dirname(__file__))

    # Training
    training_data = pd.read_csv(join(BASE_DIR, './training_data.csv'))

    dialogue_agent = DialogueAgent()
    X_train = np.array([dialogue_agent.calc_text_feature(text) for text in training_data['text']])
    y_train = np.array(training_data['label'])
    dialogue_agent.train(X_train, y_train)

    # Evaluation
    test_data = pd.read_csv(join(BASE_DIR, './test_data.csv'))
    X_test = np.array([dialogue_agent.calc_text_feature(text) for text in test_data['text']])
    y_test = np.array(test_data['label'])

    y_pred = dialogue_agent.predict(X_test)

    print(accuracy_score(y_test, y_pred))

The feature is the sum / average of the word-embedding distributed representations (classifier: NN).

Above, the classifier was SVC; here we change it to an NN. Since the distributed representations of the words are averaged, the dimension of the distributed representation, texts.shape[1], is passed as the input_dim of the KerasClassifier.

~~
    def _build_mlp(self, input_dim, hidden_units, output_dim):
        mlp = Sequential()
        mlp.add(Dense(units=hidden_units,
                      input_dim=input_dim,
                      activation='relu'))
        mlp.add(Dense(units=output_dim, activation='softmax'))
        mlp.compile(loss='categorical_crossentropy',
                    optimizer='adam')

        return mlp

    def train(self, texts, labels, hidden_units=32, classifier__epochs=100):
        feature_dim = texts.shape[1]
        print(feature_dim)
        n_labels = max(labels) + 1

        classifier = KerasClassifier(build_fn=self._build_mlp,
                                     input_dim=feature_dim,
                                     hidden_units=hidden_units,
                                     output_dim=n_labels)

        pipeline = Pipeline([
            ('classifier', classifier),
        ])

        pipeline.fit(texts, labels, classifier__epochs=classifier__epochs)

        self.pipeline = pipeline

    def predict(self, texts):
        return self.pipeline.predict(texts)
~~

The word-embedding distributed representation sequence is used as the feature (input layer: Embedding -> Flatten -> Dense).

Since the sequence of word-embedding vectors is a two-dimensional array, insert a Flatten layer before feeding it into the Dense layers.

    # Model building
    model = Sequential()
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  trainable=False))

    model.add(Flatten())
    model.add(Dense(units=256, activation='relu'))
    model.add(Dense(units=128, activation='relu'))
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
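
A hedged sketch of how such a compiled model might be trained and used for prediction: x_train is assumed to be the pad_sequences output (word-ID sequences), the labels are assumed to be one-hot encoded for categorical_crossentropy, and the variable names, epochs, and batch size are illustrative, not the book's.

    # Hedged training/prediction sketch (variable names and hyperparameters are assumptions; np is numpy)
    from tensorflow.keras.utils import to_categorical

    model.fit(x_train, to_categorical(y_train, num_classes=n_classes), epochs=100, batch_size=32)
    y_pred = np.argmax(model.predict(x_test), axis=-1)  # softmax probabilities -> integer class labels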

The word-embedding distributed representation sequence is used as the feature (input layer: Embedding -> CNN (Dense)).

Feed the word-embedding vector sequence into a CNN Convolutional layer, but set kernel_size to the input sequence length x_train.shape[1], so that a single kernel spans the whole sentence and the configuration is effectively the same as the Dense-layer version.

    # Model building
    model = Sequential()
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  trainable=False))  # <6>

    # 1D Convolution
    model.add(Conv1D(filters=256, kernel_size=x_train.shape[1], strides=1, activation='relu'))
    # Global max pooling
    model.add(MaxPooling1D(pool_size=int(model.output.shape[1])))
    model.add(Flatten())
    model.add(Dense(units=128, activation='relu'))
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
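
Setting pool_size to the full output length as above is exactly global max pooling. The same thing can be written with Keras's GlobalMaxPooling1D, which already returns a flattened output, so the Flatten layer becomes unnecessary (an equivalent rewrite, not the book's code):

    # Equivalent rewrite of the pooling + flatten lines above
    from tensorflow.keras.layers import GlobalMaxPooling1D
    model.add(GlobalMaxPooling1D())  # replaces MaxPooling1D(pool_size=...) + Flatten()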

The word-embedding distributed representation sequence is used as the feature (input layer: Embedding -> CNN).

Feed the word-embedding vector sequence into the CNN Convolutional layer. Details are omitted here because the code is as in the book.
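
Since only the idea is described here, the following is a minimal sketch of such a model; the kernel_size of 3 (a window over three consecutive words) and the filter counts are illustrative assumptions, not the book's values.

    # Hedged sketch: Embedding -> Conv1D (small kernel) -> global max pooling -> Dense
    # (get_keras_embedding, we_model, MAX_SEQUENCE_LENGTH and n_classes come from the book's surrounding code)
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, Dense, GlobalMaxPooling1D

    model = Sequential()
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  trainable=False))
    model.add(Conv1D(filters=256, kernel_size=3, strides=1, activation='relu'))  # kernel_size=3 is an assumption
    model.add(GlobalMaxPooling1D())
    model.add(Dense(units=128, activation='relu'))
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])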

Embedding layer


    Embedding(input_dim=word_num + 1,
              output_dim=embedding_dim,
              weights=[weights_with_zero],
              *args, **kwargs)
↓
    # *args, **kwargs actually expand to this
    Embedding(input_dim=word_num + 1,
              output_dim=embedding_dim,
              weights=[weights_with_zero],
              input_shape=(MAX_SEQUENCE_LENGTH, ),
              trainable=False)

- trainable=False: the weights are not updated during training (the Embedding layer uses already-trained weights, so they should not be changed by further training)
- input_shape: when adding layers with add in Keras, this is specified on the first (input) layer
- input_dim / output_dim: the input/output dimensions of the Embedding layer's weights; the output dimension of the Embedding layer equals output_dim
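
Because index 0 is reserved for the zero-padding introduced earlier, input_dim is word_num + 1 and the weight matrix needs an extra all-zero row at index 0. Below is a minimal sketch of how weights_with_zero might be built from the loaded gensim model (attribute names assume gensim 4.x; the book's exact code may differ):

    # Hedged sketch: prepend a zero row for padding index 0
    import numpy as np

    word_num = len(we_model.wv.index_to_key)      # vocabulary size
    embedding_dim = we_model.wv.vector_size       # dimension of each word vector
    weights_with_zero = np.vstack(
        [np.zeros((1, embedding_dim)), we_model.wv.vectors])  # shape: (word_num + 1, embedding_dim)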

Execution result

How the word-embedding distributed representations are used / classifier / accuracy:

- sum, SVC: 0.40425531914893614
- average, SVC: 0.425531914893617
- sum / average, NN: 0.5638297872340425 / 0.5531914893617021
- kept as a sequence, Embedding -> Flatten -> Dense: 0.5319148936170213
- kept as a sequence, Embedding -> CNN (Dense): 0.5
- kept as a sequence, Embedding -> CNN: 0.6808510638297872

For comparison, the accuracies from the previous steps:

- Baseline implementation (Step 01): 37.2%
- With preprocessing added (Step 02): 43.6%
- Preprocessing + changed feature extraction (Step 04): 58.5%
- Preprocessing + changed feature extraction + classifier changed to RandomForest (Step 06): 61.7%
- Preprocessing + changed feature extraction + classifier changed to NN (Step 09): 66.0%
- Preprocessing + changed feature extraction (Step 11): 40.4%
- Preprocessing + changed feature extraction + classifier changed to CNN (Step 12): 68.1%
