This is a memo for myself as I read Introduction to Natural Language Processing Applications in 15 Steps. This time I will note my own takeaways from Chapter 3, Step 08.
- Personal Mac: macOS Mojave version 10.14.6
- Docker version: 19.03.2 for both Client and Server
Chapter 3 covers the basics of deep learning and its application to natural language processing. Step 08 is an introduction to neural networks: an overview of the multi-layer perceptron and a simple implementation using the deep learning library Keras.
A mathematical model of the neurons that make up the brain cells of an organism is the **perceptron**, and the model that imitates a single neuron is called a **simple perceptron**.
- n inputs: x1, x2, ..., xn
- n weights: w1, w2, ..., wn
- Fixed-value bias: b
- Output: z
- Output function: f()
z = f(x1w1 + x2w2 + ... + xnwn + b)
The above formula could be coded exactly as written, but using NumPy's vector inner product it can be written more concisely and runs faster. When the simple perceptron is regarded as a discriminator, finding appropriate values for the weights (w and b) is **learning**.
import numpy as np

x = np.array([...])   # input vector x1, ..., xn
w = np.array([...])   # weight vector w1, ..., wn
b = ..                # fixed-value bias
z = b + np.dot(x, w)  # pre-activation; the output function f() is applied to this value
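As a concrete illustration (my own example, not from the book), here is a minimal runnable sketch of a simple perceptron, assuming three inputs, illustrative weight values, and a sigmoid as the output function f():

import numpy as np

def sigmoid(a):
    # squashes the pre-activation into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def simple_perceptron(x, w, b):
    # z = f(x1*w1 + ... + xn*wn + b)
    return sigmoid(np.dot(x, w) + b)

x = np.array([1.0, 0.5, -0.5])  # example inputs (illustrative values)
w = np.array([0.8, -0.2, 0.4])  # example weights (illustrative values)
b = 0.1                         # example bias
print(simple_perceptron(x, w, b))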
Just as nerve cells in the brain connect to one another, the output of a simple perceptron can be used as the input of another simple perceptron, creating a structure in which many perceptrons are connected. This is called a **multi-layer perceptron (MLP)**.
test_mlp.py
import numpy as np
W_1 = np.array([
[1, 2, 3],
[4, 5, 6],
])
x = np.array([10, 20, 30])
print(np.dot(W_1, x))
print(np.dot(x, W_1))
[140 320]
Traceback (most recent call last):
File "test_mlp.py", line 11, in <module>
print(np.dot(x, W_1))
ValueError: shapes (3,) and (2,3) not aligned: 3 (dim 0) != 2 (dim 0)
In the above, the weights for the two perceptrons in the first layer of the MLP are stored in W_1 (a 2×3 array). Taking the inner product of W_1 and the 3×1 input vector yields a 2×1 output (2×3 times 3×1 = 2×1).
Of course, if you swap W_1 and x when computing the inner product, the shapes no longer align and an error occurs, as shown above.
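As an aside (my own addition, not from the book), if you do want to write the product with x first, one option is to transpose W_1 so that the shapes align; a minimal sketch:

import numpy as np

W_1 = np.array([[1, 2, 3],
                [4, 5, 6]])
x = np.array([10, 20, 30])

# Transposing W_1 gives shapes (3,) and (3, 2), which align,
# and yields the same result as np.dot(W_1, x).
print(np.dot(x, W_1.T))  # [140 320]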
While simple perceptrons can only be applied to linearly separable problems, **multi-layer perceptrons can also be applied to linearly inseparable problems**.
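As a concrete illustration of this point (my own example, not from the book), XOR is the classic linearly inseparable problem. A minimal NumPy sketch of a two-layer perceptron with hand-picked weights that computes XOR:

import numpy as np

def step(a):
    # step activation: 1 if the pre-activation is positive, else 0
    return (a > 0).astype(int)

# first layer: two perceptrons (weights W_1, biases b_1)
W_1 = np.array([[1, 1],
                [1, 1]])
b_1 = np.array([-0.5, -1.5])

# second layer: one perceptron (weights w_2, bias b_2)
w_2 = np.array([1, -1])
b_2 = -0.5

def mlp_xor(x):
    h = step(np.dot(W_1, x) + b_1)   # hidden layer output
    return step(np.dot(w_2, h) + b_2)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mlp_xor(np.array(x)))   # prints 0, 1, 1, 0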
In 08.2, each layer was implemented as a function, but this can be written more concisely by using a library.
from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
# 1st layer implementation
model.add(Dense(units=2, activation='relu', input_dim=3))
# 2nd layer implementation
model.add(Dense(units=1, activation='sigmoid'))
- Dense: fully connected layer
- input_dim: number of input dimensions. The output dimension of the previous layer becomes the input dimension of the second and subsequent layers, so it can be omitted for those layers.
- units: number of output dimensions
- activation: activation function
  - ReLU: sigmoid and others used to be common, but ReLU is superior in terms of accuracy and ease of convergence
  - Sigmoid: used in the final layer of an MLP for two-class classification
  - Hyperbolic tangent (tanh): higher performance than the sigmoid used in the final layer of an MLP for two-class classification
  - Softmax: used in the final layer of an MLP for multi-class classification
- add: adds a layer to the model instance created by Sequential()
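To check how these settings translate into layer shapes and weight counts, model.summary() can be called (this check is my own addition; the parameter counts follow from units and input_dim):

from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
model.add(Dense(units=2, activation='relu', input_dim=3))  # 3 * 2 weights + 2 biases = 8 parameters
model.add(Dense(units=1, activation='sigmoid'))            # 2 * 1 weights + 1 bias  = 3 parameters

model.summary()  # prints the output shape and parameter count of each layer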
In discriminator learning, pairs of a feature vector X and a correct label y are given as training data. The parameters (weights) of the classifier are adjusted so that the output for input X is close to y.
from keras.optimizers import Adam

model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.001))
- loss: loss function. The model's classification accuracy is evaluated with a function that expresses how far the output for X deviates from y.
  - binary_crossentropy: loss function suited to a two-class classifier
- optimizer: optimization method. Adjusting the parameters is called optimization, and there are various methods.
- lr: learning rate. A parameter that determines how much a weight is increased or decreased in one update.
  - Too large: the weights change so much that they oscillate around the optimum value, or in the worst case diverge without converging
  - Too small: learning takes too long
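For reference (my own addition, not from the book), the corresponding settings for a multi-class classifier use a softmax output and categorical_crossentropy; a minimal sketch, assuming 3 classes:

from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(units=2, activation='relu', input_dim=3))
model.add(Dense(units=3, activation='softmax'))  # one output per class

model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001))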
When you instantiate a layer in Keras, the weights are implicitly **initialized with random numbers**.
If you want to access the weights, use the .get_weights() and .set_weights() methods. As I wrote in 10.3, if you want to control how the weights are initialized, you can specify it with model.add(Dense(.., kernel_initializer=)).
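A minimal sketch of inspecting and re-setting the weights (my own example; 'zeros' is just one of the built-in Keras initializer names):

from keras.layers import Dense
from keras.models import Sequential

model = Sequential()
model.add(Dense(units=2, activation='relu', input_dim=3,
                kernel_initializer='zeros'))  # explicit initializer instead of the random default

layer = model.layers[0]
weights, biases = layer.get_weights()   # list of numpy arrays: kernel (3x2) and bias (2,)
print(weights.shape, biases.shape)

layer.set_weights([weights * 0 + 1.0, biases])  # overwrite the kernel with all ones
print(layer.get_weights()[0])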
model.fit(X, y, batch_size=32, epochs=100)
- batch_size: how many samples of the training data are used in one weight update
- epochs: how many times the whole training data is used for training
- fit: the learning process. Backpropagation (error back propagation) is used internally.
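Putting it all together, a minimal end-to-end sketch with randomly generated dummy data (the data and hyperparameters are illustrative, not from the book):

import numpy as np
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam

# dummy two-class data: 100 samples with 3 features each
X = np.random.rand(100, 3)
y = (X.sum(axis=1) > 1.5).astype(int)  # an arbitrary rule to generate labels

model = Sequential()
model.add(Dense(units=2, activation='relu', input_dim=3))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.001))

model.fit(X, y, batch_size=32, epochs=100)

print(model.predict(X[:5]))  # predicted probabilities for the first 5 samples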
Neural networks have since moved away from a strict imitation of brain cells and are now developed with the aim of having computers perform intelligent processing.
Keras is a wrapper library for TensorFlow. TensorFlow is a well-known deep learning library developed by Google, but it exposes many low-level APIs, so implementation becomes easier by using Keras, which provides high-level APIs. Since most neural network computation consists of vector and matrix operations, using a GPU to train larger neural networks at high speed is a natural option.