This is a memo for myself as I read Introduction to Natural Language Processing Applications in 15 Steps. This time I summarize my own takeaways from Chapter 3, Step 09.
- Personal MacPC: macOS Mojave version 10.14.6
- docker version: 19.03.2 for both Client and Server
Let's implement a multi-class classifier using the multi-layer perceptron discussed in the previous chapter.
- softmax: activation function for multi-class identification (vs. sigmoid for two-class identification)
- categorical_crossentropy: loss function for multi-class identification (vs. binary_crossentropy for two-class identification)
The two-class classifier and the multi-class classifier differ in the number of units in the output layer and in how the teacher labels are given.
- Two-class classifier: the output layer is a single one-dimensional unit; the identification class is output as 0 or 1
  - Representation by class ID
    - 0 → class ID is 0
    - 1 → class ID is 1
    - 2 → class ID is 2
- Multi-class classifier: the output layer has as many units as there are classes; only the unit corresponding to the class ID outputs 1, and all other units output 0
  - One-hot representation
    - [1, 0, 0] → class ID is 0
    - [0, 1, 0] → class ID is 1
    - [0, 0, 1] → class ID is 2
Softmax is often used as the activation function of the output layer for multi-class identification. It has the following properties:
- Each output falls between 0 and 1
- The outputs of the layer it is applied to sum to 1
- **The gap between the larger and smaller output values of the layer's units widens**
Passing the values through softmax squeezes them into the range 0 to 1, while pushing them toward 0 or 1 so that the ratio between the large and small values becomes even more pronounced.
- Useful for multi-class identification, because it tends to leave only one unit with a large value
- The identification result can be treated as a probability
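To see these properties for myself, here is a minimal softmax sketch in numpy (the input values are made up):

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([3.0, 1.0, 0.2])
probs = softmax(logits)
print(probs)                   # ~[0.836 0.113 0.051] -- every value lies between 0 and 1
print(probs.sum())             # 1.0 -- the outputs can be read as probabilities
print(logits / logits.sum())   # ~[0.714 0.238 0.048] -- plain normalization; softmax pushes the largest value closer to 1
```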
Since two-class classification can be done with the 0-or-1 output of a single unit, N-class classification is theoretically possible with only log2(N) units by combining their 0/1 outputs. In that scheme, however, a single low-order unit has to learn to output the same 0 or 1 for several unrelated classes, which is intuitively unnatural, and in practice training does not go well.
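As a toy illustration of that binary-encoding idea (my own example, not from the book), with N = 4 classes and log2(4) = 2 units:

```python
# Binary encoding: 2 units could in principle distinguish 4 classes
for class_id in range(4):
    bits = [(class_id >> 1) & 1, class_id & 1]
    print(class_id, bits)
# 0 [0, 0]
# 1 [0, 1]
# 2 [1, 0]
# 3 [1, 1]
# The low-order unit must output 1 for both class 1 and class 3, even though
# those classes may have nothing in common -- the unnatural sharing mentioned above.
```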
In contrast to binary_crossentropy for two-class identification, categorical_crossentropy is used for multi-class identification.
When classifying into N classes, N units must be prepared in the output layer. The teacher labels must then be given to those N units as 0/1 values rather than as the class ID itself. There are two ways to handle this:
- Convert the labels to a one-hot representation with keras.utils.to_categorical
- Or set the loss function to sparse_categorical_crossentropy, which accepts labels that are not one-hot (i.e., raw class IDs)
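A minimal sketch of both options (the label values here are made up for illustration):

```python
import numpy as np
from keras.utils import to_categorical

labels = np.array([0, 2, 1])                  # teacher labels as class IDs
print(to_categorical(labels, num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]

# Alternatively, keep the class IDs as-is and let the loss function handle them:
# mlp.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```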
| Implementation pattern | Points |
|---|---|
| Basic | Setup: set the model's input dimension and output dimension separately. Training: the teacher labels need to be converted to a one-hot representation. Prediction: the output needs to be converted from the one-hot representation back to a class ID. Execution: at training time, run fit_transform of the vectorizer and fit of the classifier; at prediction time, run transform of the vectorizer and predict of the classifier. |
| Wrap Keras with the scikit-learn API and embed it in sklearn.pipeline.Pipeline | Setup: set the model's input dimension and output dimension separately. Execution: at training time, run fit of the vectorizer and fit of the pipeline; at prediction time, run predict of the pipeline. |
With keras.wrappers.scikit_learn.KerasClassifier, fit() internally performs the equivalent of to_categorical, and predict() performs the equivalent of np.argmax. Using a Pipeline also lets the fit()/predict() of the vectorizer and the classifier be executed together; note, however, that the vectorizer's fit() still has to be run separately beforehand, because the input dimension must be known when the model is configured.
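As a rough sketch of how the pieces fit together (my own reconstruction; it assumes a tokenize function, a build_mlp builder like the one in the next section, and texts/labels training data are defined elsewhere):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from keras.wrappers.scikit_learn import KerasClassifier

vectorizer = TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))

# The vectorizer's fit() runs separately, because the model's input
# dimension has to be known before the classifier is configured
vectorizer.fit(texts)
feature_dim = len(vectorizer.get_feature_names())
n_labels = max(labels) + 1

classifier = KerasClassifier(build_fn=build_mlp,
                             input_dim=feature_dim,
                             hidden_units=32,
                             output_dim=n_labels)

pipeline = Pipeline([('vectorizer', vectorizer), ('classifier', classifier)])

pipeline.fit(texts, labels)             # KerasClassifier.fit() applies the to_categorical equivalent
predictions = pipeline.predict(texts)   # KerasClassifier.predict() applies the np.argmax equivalent
```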
Additions / changes from the previous chapter (Step 06)
def _build_mlp(self, input_dim, hidden_units, output_dim):
    mlp = Sequential()
    mlp.add(Dense(units=hidden_units,
                  input_dim=input_dim,
                  activation='relu'))
    mlp.add(Dense(units=output_dim, activation='softmax'))  # 1: Output layer activation function
    mlp.compile(loss='categorical_crossentropy',  # 2: Loss function
                optimizer='adam')
    return mlp
def train(self, texts, labels):
    ~~
    feature_dim = len(vectorizer.get_feature_names())
    n_labels = max(labels) + 1

    # 3: Classifier
    classifier = KerasClassifier(build_fn=self._build_mlp,
                                 input_dim=feature_dim,
                                 hidden_units=32,
                                 output_dim=n_labels)
    ~~
Execution result
# Modify the name of the module loaded in evaluate_dialogue_agent.py as needed
from dialogue_agent_sklearn_pipeline import DialogueAgent
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python evaluate_dialogue_agent.py
0.65957446
- Normal implementation (Step 01): 37.2%
- Preprocessing added (Step 02): 43.6%
- Preprocessing + feature extraction change (Step 04): 58.5%
- Preprocessing + feature extraction change + classifier change (Step 06): 61.7%
- Preprocessing + feature extraction change + classifier change (Step 09): 66.0%
Added hidden_units and classifier__epochs to the arguments of the train method of the DialogueAgent class.
dialogue_agent_sklearn_pipeline.py
def train(self, texts, labels, hidden_units=32, classifier__epochs=100):
    ~~
    classifier = KerasClassifier(build_fn=self._build_mlp,
                                 input_dim=feature_dim,
                                 hidden_units=hidden_units,
                                 output_dim=n_labels)
    ~~
    pipeline.fit(texts, labels, classifier__epochs=classifier__epochs)
    ~~
Specify hidden_units and classifier__epochs when calling the train method of the DialogueAgent class.
evaluate_dialogue_agent.py
HIDDEN_UNITS = 64
CLASSIFIER_EPOCHS = 50
# Training
training_data = pd.read_csv(join(BASE_DIR, './training_data.csv'))
dialogue_agent = DialogueAgent()
dialogue_agent.train(training_data['text'], training_data['label'], HIDDEN_UNITS, CLASSIFIER_EPOCHS)
Execution result
Epoch 50/50
917/917 [==============================] - 0s 288us/step - loss: 0.0229
### I also took a look at various things
# pprint.pprint(dialogue_agent.pipeline.steps)
[('vectorizer',
TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
lowercase=True, max_df=1.0, max_features=None, min_df=1,
ngram_range=(1, 2), norm='l2', preprocessor=None, smooth_idf=True,
stop_words=None, strip_accents=None, sublinear_tf=False,
token_pattern='(?u)\\b\\w\\w+\\b',
tokenizer=<bound method DialogueAgent._tokenize of <dialogue_agent_sklearn_pipeline.DialogueAgent object at 0x7f7fc81bd128>>,
use_idf=True, vocabulary=None)),
('classifier',
<keras.wrappers.scikit_learn.KerasClassifier object at 0x7f7fa4a6a320>)]
# pprint.pprint(dialogue_agent.pipeline.steps[1][1].get_params())
{'build_fn': <bound method DialogueAgent._build_mlp of <dialogue_agent_sklearn_pipeline.DialogueAgent object at 0x7f7fc81bd128>>,
'hidden_units': 64,
'input_dim': 3219,
'output_dim': 49}
# print([len(v) for v in dialogue_agent.pipeline.steps[1][1].model.layers[0].get_weights()])
[3219, 64]
# print([len(v) for v in dialogue_agent.pipeline.steps[1][1].model.layers[1].get_weights()])
[64, 49]
From the shapes of the weight lists of layer 0 and layer 1, it can be confirmed that the input dimension is 3219, the hidden layer has 64 units, and the output layer has 49 units: for each layer, the first element of get_weights() is the kernel matrix (whose length is the layer's input dimension) and the second is the bias vector (whose length is the number of units). These weights are updated as training progresses.
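For reference, the same check using the array shapes instead of len() (a quick sketch against the trained agent above; for a Dense layer, get_weights() returns the kernel matrix and the bias vector):

```python
classifier_model = dialogue_agent.pipeline.steps[1][1].model

kernel0, bias0 = classifier_model.layers[0].get_weights()
print(kernel0.shape, bias0.shape)  # (3219, 64) (64,): input dim x hidden units, one bias per hidden unit

kernel1, bias1 = classifier_model.layers[1].get_weights()
print(kernel1.shape, bias1.shape)  # (64, 49) (49,): hidden units x output classes, one bias per output unit
```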