This is a memo for myself as I read Introduction to Natural Language Processing Applications in 15 Steps. This time I summarize my own takeaways from Chapter 3, Step 09.
- Personal MacPC: macOS Mojave version 10.14.6
- docker version: 19.03.2 for both Client and Server
Let's implement a multi-class classifier using the multi-layer perceptron discussed in the previous chapter.
- softmax: activation function for multi-class identification (vs. sigmoid for two-class identification)
- categorical_crossentropy: loss function for multi-class identification (vs. binary_crossentropy for two-class identification)
The two-class classifier and the multi-class classifier differ in the number of units in the output layer and in how the teacher labels are given.
- Two-class classifier: the output layer is a single one-dimensional unit; the identification class is output as 0 or 1
  - Representation by class ID
    - 0 → class ID is 0
    - 1 → class ID is 1
    - 2 → class ID is 2
- Multi-class classifier: the output layer has as many units as there are classes; only the unit corresponding to the class ID outputs 1, and all other units output 0
  - One-hot representation
    - [1, 0, 0] → class ID is 0
    - [0, 1, 0] → class ID is 1
    - [0, 0, 1] → class ID is 2
Softmax is often used as the activation function of the output layer for multi-class identification. It has the following properties:
- Each output falls between 0 and 1
- The outputs of the layer it is applied to sum to 1
- **The gap between the larger and smaller output values of the layer's units widens**
Passing the values through softmax squeezes them into the range 0 to 1, while pushing them toward 0 or 1 so that the ratio between the large and small values becomes even more pronounced.
- Useful for multi-class identification, because it tends to leave only one unit with a large value
- The identification result can be treated as a probability
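To see these properties for myself, here is a minimal softmax sketch in numpy (the input values are made up):

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([3.0, 1.0, 0.2])
probs = softmax(logits)
print(probs)                   # ~[0.836 0.113 0.051] -- every value lies between 0 and 1
print(probs.sum())             # 1.0 -- the outputs can be read as probabilities
print(logits / logits.sum())   # ~[0.714 0.238 0.048] -- plain normalization; softmax pushes the largest value closer to 1
```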
Since two-class classification can be done with the 0-or-1 output of a single unit, N-class classification is theoretically possible with only log2(N) units by combining their 0/1 outputs. In that scheme, however, a single low-order unit has to learn to output the same 0 or 1 for several unrelated classes, which is intuitively unnatural, and in practice training does not go well.
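As a toy illustration of that binary-encoding idea (my own example, not from the book), with N = 4 classes and log2(4) = 2 units:

```python
# Binary encoding: 2 units could in principle distinguish 4 classes
for class_id in range(4):
    bits = [(class_id >> 1) & 1, class_id & 1]
    print(class_id, bits)
# 0 [0, 0]
# 1 [0, 1]
# 2 [1, 0]
# 3 [1, 1]
# The low-order unit must output 1 for both class 1 and class 3, even though
# those classes may have nothing in common -- the unnatural sharing mentioned above.
```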
In contrast to binary_crossentropy for two-class identification, categorical_crossentropy is used for multi-class identification.
When classifying into N classes, N units must be prepared in the output layer. The teacher labels must then be given to those N units as 0/1 values rather than as the class ID itself. There are two ways to handle this:
- Convert the labels to a one-hot representation with keras.utils.to_categorical
- Or set the loss function to sparse_categorical_crossentropy, which accepts labels that are not one-hot (i.e., raw class IDs)
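A minimal sketch of both options (the label values here are made up for illustration):

```python
import numpy as np
from keras.utils import to_categorical

labels = np.array([0, 2, 1])                  # teacher labels as class IDs
print(to_categorical(labels, num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]

# Alternatively, keep the class IDs as-is and let the loss function handle them:
# mlp.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```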
| Implementation pattern | Points |
|---|---|
| Basic | Setup: set the model's input dimension and output dimension separately. Training: the teacher labels need to be converted to a one-hot representation. Prediction: the output needs to be converted from the one-hot representation back to a class ID. Execution: at training time, run fit_transform of the vectorizer and fit of the classifier; at prediction time, run transform of the vectorizer and predict of the classifier. |
| Wrap Keras with the scikit-learn API and embed it in sklearn.pipeline.Pipeline | Setup: set the model's input dimension and output dimension separately. Execution: at training time, run fit of the vectorizer and fit of the pipeline; at prediction time, run predict of the pipeline. |
With keras.wrappers.scikit_learn.KerasClassifier, fit() internally performs the equivalent of to_categorical, and predict() performs the equivalent of np.argmax. Using a Pipeline also lets the fit()/predict() of the vectorizer and the classifier be executed together; note, however, that the vectorizer's fit() still has to be run separately beforehand, because the input dimension must be known when the model is configured.
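As a rough sketch of how the pieces fit together (my own reconstruction; it assumes a tokenize function, a build_mlp builder like the one in the next section, and texts/labels training data are defined elsewhere):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from keras.wrappers.scikit_learn import KerasClassifier

vectorizer = TfidfVectorizer(tokenizer=tokenize, ngram_range=(1, 2))

# The vectorizer's fit() runs separately, because the model's input
# dimension has to be known before the classifier is configured
vectorizer.fit(texts)
feature_dim = len(vectorizer.get_feature_names())
n_labels = max(labels) + 1

classifier = KerasClassifier(build_fn=build_mlp,
                             input_dim=feature_dim,
                             hidden_units=32,
                             output_dim=n_labels)

pipeline = Pipeline([('vectorizer', vectorizer), ('classifier', classifier)])

pipeline.fit(texts, labels)             # KerasClassifier.fit() applies the to_categorical equivalent
predictions = pipeline.predict(texts)   # KerasClassifier.predict() applies the np.argmax equivalent
```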
Additions / changes from the previous chapter (Step 06)
def _build_mlp(self, input_dim, hidden_units, output_dim):
    mlp = Sequential()
    mlp.add(Dense(units=hidden_units,
                  input_dim=input_dim,
                  activation='relu'))
    mlp.add(Dense(units=output_dim, activation='softmax'))  # 1: Output layer activation function
    mlp.compile(loss='categorical_crossentropy',  # 2: Loss function
                optimizer='adam')
    return mlp
def train(self, texts, labels):
    ~~
    feature_dim = len(vectorizer.get_feature_names())
    n_labels = max(labels) + 1

    # 3: Classifier
    classifier = KerasClassifier(build_fn=self._build_mlp,
                                 input_dim=feature_dim,
                                 hidden_units=32,
                                 output_dim=n_labels)
    ~~
Execution result
# Modify the name of the module loaded in evaluate_dialogue_agent.py as needed
from dialogue_agent_sklearn_pipeline import DialogueAgent
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python evaluate_dialogue_agent.py
0.65957446
- Normal implementation (Step 01): 37.2%
- Preprocessing added (Step 02): 43.6%
- Preprocessing + feature extraction change (Step 04): 58.5%
- Preprocessing + feature extraction change + classifier change (Step 06): 61.7%
- Preprocessing + feature extraction change + classifier change (Step 09): 66.0%
Added hidden_units and classifier__epochs to the arguments of the train method of the DialogueAgent class.
dialogue_agent_sklearn_pipeline.py
def train(self, texts, labels, hidden_units=32, classifier__epochs=100):
    ~~
    classifier = KerasClassifier(build_fn=self._build_mlp,
                                 input_dim=feature_dim,
                                 hidden_units=hidden_units,
                                 output_dim=n_labels)
    ~~
    pipeline.fit(texts, labels, classifier__epochs=classifier__epochs)
    ~~
Specify hidden_units and classifier__epochs when calling the train method of the DialogueAgent class.
evaluate_dialogue_agent.py
HIDDEN_UNITS = 64
CLASSIFIER_EPOCHS = 50
# Training
training_data = pd.read_csv(join(BASE_DIR, './training_data.csv'))
dialogue_agent = DialogueAgent()
dialogue_agent.train(training_data['text'], training_data['label'], HIDDEN_UNITS, CLASSIFIER_EPOCHS)
Execution result
Epoch 50/50
917/917 [==============================] - 0s 288us/step - loss: 0.0229
### I also took a look at various things
# pprint.pprint(dialogue_agent.pipeline.steps)
[('vectorizer',
TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
lowercase=True, max_df=1.0, max_features=None, min_df=1,
ngram_range=(1, 2), norm='l2', preprocessor=None, smooth_idf=True,
stop_words=None, strip_accents=None, sublinear_tf=False,
token_pattern='(?u)\\b\\w\\w+\\b',
tokenizer=<bound method DialogueAgent._tokenize of <dialogue_agent_sklearn_pipeline.DialogueAgent object at 0x7f7fc81bd128>>,
use_idf=True, vocabulary=None)),
('classifier',
<keras.wrappers.scikit_learn.KerasClassifier object at 0x7f7fa4a6a320>)]
# pprint.pprint(dialogue_agent.pipeline.steps[1][1].get_params())
{'build_fn': <bound method DialogueAgent._build_mlp of <dialogue_agent_sklearn_pipeline.DialogueAgent object at 0x7f7fc81bd128>>,
'hidden_units': 64,
'input_dim': 3219,
'output_dim': 49}
# print([len(v) for v in dialogue_agent.pipeline.steps[1][1].model.layers[0].get_weights()])
[3219, 64]
# print([len(v) for v in dialogue_agent.pipeline.steps[1][1].model.layers[1].get_weights()])
[64, 49]
From the shapes of the weight lists of layer 0 and layer 1, it can be confirmed that the input dimension is 3219, the hidden layer has 64 units, and the output layer has 49 units: for each layer, the first element of get_weights() is the kernel matrix (whose length is the layer's input dimension) and the second is the bias vector (whose length is the number of units). These weights are updated as training progresses.
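For reference, the same check using the array shapes instead of len() (a quick sketch against the trained agent above; for a Dense layer, get_weights() returns the kernel matrix and the bias vector):

```python
classifier_model = dialogue_agent.pipeline.steps[1][1].model

kernel0, bias0 = classifier_model.layers[0].get_weights()
print(kernel0.shape, bias0.shape)  # (3219, 64) (64,): input dim x hidden units, one bias per hidden unit

kernel1, bias1 = classifier_model.layers[1].get_weights()
print(kernel1.shape, bias1.shape)  # (64, 49) (49,): hidden units x output classes, one bias per output unit
```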