[PYTHON] "Deep Learning from scratch 2" Self-study memo (No. 21) Chapters 3 and 4

While reading "Deep Learning from scratch 2: Natural language processing" (by Yasuki Saito, published by O'Reilly Japan), I am taking notes on the sites I referred to. This continues from Part 20.

Chapters 2 to 4 explain how to convert natural language into vectors

Chapter 2 was easy to follow because it only turns sentences into data and processes them statistically, although I did wonder what it was all for. In Chapter 3 neural networks appear, but I could not see what was being done, or why.

So, what I understand at this point is that Chapters 2 through 4 do nothing more than explain how to convert natural-language sentences into vectors.

From p. 170 (Figure 4-21): if a question written in natural language can be converted into a fixed-length vector, that vector can be used as the input to another machine learning system. By converting natural language into vectors, the desired answer can be output (and learned) within the framework of ordinary machine learning systems.

So it seems. In other words, by processing the PTB dataset with a neural network called word2vec, we automatically built word vectors, something like a thesaurus. Could we then build a system that vectorizes a question written in natural language by passing it through those word vectors, makes it computable, and outputs the result of that computation as the answer?
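
To make the idea of "passing a question through the word vectors" concrete, here is a minimal sketch of my own (not from the book): average the word vectors of the words in a sentence to get a single fixed-length vector, which could then be fed to another machine learning system. It assumes word_to_id and word_vecs as produced by the book's word2vec code.

import numpy as np

def sentence_to_vector(sentence, word_to_id, word_vecs):
    # Average the word vectors of the known words in the sentence
    # (a simple illustration; the book itself does not define this function)
    ids = [word_to_id[w] for w in sentence.lower().split() if w in word_to_id]
    if not ids:
        return np.zeros(word_vecs.shape[1])
    return word_vecs[ids].mean(axis=0)

# e.g. question_vec = sentence_to_vector('you say goodbye', word_to_id, word_vecs)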

My mental picture is something like this (sketch omitted). What we did in Chapter 2 makes it possible to place words that are closely related to "woman", such as "queen" and "sister", near the vector for "woman". In Chapter 3, the vector "king" − "man" is computed, and adding it to "woman" gives the vector for "queen". In other words, the relationships between words become vectors, and a word can effectively be defined by vector arithmetic.
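
That "king" − "man" + "woman" ≈ "queen" arithmetic can be checked directly with cosine similarity. A minimal sketch, assuming word_vecs, word_to_id and id_to_word from the book's code (if I remember correctly, the book's common/util.py has an analogy() helper that does roughly this):

import numpy as np

def cos_similarity(x, y, eps=1e-8):
    # Cosine similarity between two vectors
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps)

def analogy_sketch(a, b, c, word_to_id, id_to_word, word_vecs, top=5):
    # vec(b) - vec(a) + vec(c), e.g. king - man + woman, then list the nearest word vectors
    query = word_vecs[word_to_id[b]] - word_vecs[word_to_id[a]] + word_vecs[word_to_id[c]]
    scores = np.array([cos_similarity(query, v) for v in word_vecs])
    for i in np.argsort(scores)[::-1][:top]:
        print(id_to_word[i], scores[i])

# analogy_sketch('man', 'king', 'woman', word_to_id, id_to_word, word_vecs)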

Once natural language is vectorized, it can be fed into a neural network, so all sorts of things become possible. That said, Chapter 4 gives no easy-to-grasp example of what those things are. In Chapter 5 the term "language model" appears, which is where machine translation and speech recognition come in.

In other words, the goal is still a long way off.

SimpleCBOW

With the book's example as-is, the vocabulary is too small to play around with, so I used the lyrics of a Beatles song, just as they are. I also added a predict method to the class definition of the network so that I could give it two words and see what answer comes back.

# Mount Google Drive so the book's common2/dataset2 modules can be imported in Colab
from google.colab import drive
drive.mount('/content/drive')

import sys, os
sys.path.append('/content/drive/My Drive/Colab Notebooks/deep_learning/common2')
sys.path.append('/content/drive/My Drive/Colab Notebooks/deep_learning/dataset2')

import numpy as np


#Modified version of SimpleCBOW
from layers import MatMul, SoftmaxWithLoss

class SimpleCBOW2:
    def __init__(self, vocab_size, hidden_size):
        V, H = vocab_size, hidden_size

        #Weight initialization
        W_in = 0.01 * np.random.randn(V, H).astype('f')
        W_out = 0.01 * np.random.randn(H, V).astype('f')

        #Layer generation
        self.in_layer0 = MatMul(W_in)
        self.in_layer1 = MatMul(W_in)
        self.out_layer = MatMul(W_out)
        self.loss_layer = SoftmaxWithLoss()

        #List all weights and gradients
        layers = [self.in_layer0, self.in_layer1, self.out_layer]
        self.params, self.grads = [], []
        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads

        #Set the distributed representation of words in member variables
        self.word_vecs = W_in

    def forward(self, contexts, target):
        h0 = self.in_layer0.forward(contexts[:, 0])
        h1 = self.in_layer1.forward(contexts[:, 1])
        h = (h0 + h1) * 0.5
        score = self.out_layer.forward(h)
        loss = self.loss_layer.forward(score, target)
        return loss

    def backward(self, dout=1):
        ds = self.loss_layer.backward(dout)
        da = self.out_layer.backward(ds)
        da *= 0.5
        self.in_layer1.backward(da)
        self.in_layer0.backward(da)
        return None
    
    # Added to the book's SimpleCBOW: given two context words (one-hot vectors), return the output-layer scores
    def predict(self, in0, in1):
        h0 = self.in_layer0.forward(in0)
        h1 = self.in_layer1.forward(in1)
        h = (h0 + h1) * 0.5
        score = self.out_layer.forward(h)
        return score

Train the model by feeding it the lyrics.

from trainer import Trainer
from optimizer import Adam
from util import preprocess, create_contexts_target, convert_one_hot

window_size = 1
hidden_size = 5
batch_size = 3
max_epoch = 1000

text = 'You say yes and I say no.Omitted below'
corpus, word_to_id, id_to_word = preprocess(text)

vocab_size = len(word_to_id)
contexts, target = create_contexts_target(corpus, window_size)

target = convert_one_hot(target, vocab_size)
contexts = convert_one_hot(contexts, vocab_size)

model = SimpleCBOW2(vocab_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

trainer.fit(contexts, target, max_epoch, batch_size, eval_interval=None)

word_vecs = model.word_vecs
for word_id, word in id_to_word.items():
    print(word, word_vecs[word_id])

you [ 0.86038524 -0.83497584  0.66613215 -0.8185504   0.68793046]
say [-0.96950233  0.9139878  -0.0698488   0.96711737  0.8293194 ]
yes [ 0.5127911  -0.52933097  0.5187115  -0.539593    0.17091447]
and [-0.72404253  0.69723666  0.9553566   0.70232046  0.6445687 ]
i [ 0.8944653  -0.88193524  0.7035641  -0.89571506  0.12104502]
no [ 0.5423326  -0.51544315  0.50091434 -0.5078412   0.577903  ]
. [-0.70245194  0.69322383 -0.804429    0.70015544  0.5043572 ]
stop [ 0.51510733 -0.500861    0.5052154  -0.50537926  0.17358927]
go [ 0.5255579  -0.5212051   0.4808163  -0.521005    0.5663737 ]
goodbye [ 0.6193473  -0.5962919   0.6038276  -0.6169504   0.12250113]
hello [ 0.6442181  -0.6211034   0.60341436 -0.6134619   0.6989495 ]
dont [-0.25871328  0.417597    0.13021737  0.538679   -0.05671578]
know [ 0.38923997 -0.44210196 -0.72956645 -0.30691501 -0.7036062 ]
why [-0.13761514  0.39734542 -0.67426395  0.57774395 -0.3374435 ]
, [-0.5161161   0.48735517  0.6387698   0.5220987   0.5398749 ]
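
As a quick sanity check on these learned vectors, the cosine similarity between pairs of them can be computed directly (a small sketch of my own; the book's common/util.py also provides cos_similarity() and most_similar() for the same purpose):

import numpy as np

def cos_similarity(x, y, eps=1e-8):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps)

# How close is 'hello' to some of the other words in the lyrics?
for w in ['goodbye', 'no', 'stop']:
    sim = cos_similarity(word_vecs[word_to_id['hello']], word_vecs[word_to_id[w]])
    print('hello vs', w, ':', sim)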

#Give two words and let them answer
def change_one_hot_label(X, vocab_size):
    T = np.zeros((vocab_size))
    T[X] = 1
    return T

def i_dont_know_why_you_say(w0, w1):
  id0 = word_to_id[w0]
  id1 = word_to_id[w1]
  in0 = change_one_hot_label(id0, vocab_size)
  in1 = change_one_hot_label(id1, vocab_size)
  kekka = model.predict(in0, in1)
  k = np.argmax(kekka, axis = None, out = None)
  print(id_to_word[k], k, kekka)
  return

i_dont_know_why_you_say('say', 'goodbye')

you 0 [ 5.92926253 -0.36584044 -2.43450667 3.41120451 -1.54544336 3.94312343 -5.8144207 -2.43553472 3.9297523 -0.63027178 4.39174084 2.22467596 -4.72490933 -8.39840079 5.77989598]

Giving "say" and "goodbye" produces the answer "you". "say" and "hello" gives "." (a period); "i" and "say" gives "you"; and "goodbye" and "say" gives "hello".

I deliberately gave it combinations that do not appear as contexts in the original lyrics, but if the model is answering with a word that could sit between the two given words, results like these seem about right.
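
Since predict() returns raw scores, it may be easier to see what the model "thinks" by applying softmax and listing the top few candidates instead of only the argmax. A minimal sketch along the same lines as the function above (softmax_topk is my own helper, not from the book):

import numpy as np

def softmax_topk(w0, w1, top=3):
    # Same idea as i_dont_know_why_you_say, but show the top candidates with probabilities
    in0 = change_one_hot_label(word_to_id[w0], vocab_size)
    in1 = change_one_hot_label(word_to_id[w1], vocab_size)
    score = model.predict(in0, in1)
    p = np.exp(score - np.max(score))   # softmax over the vocabulary
    p /= p.sum()
    for k in np.argsort(p)[::-1][:top]:
        print(w0, w1, '->', id_to_word[k], round(float(p[k]), 3))

# softmax_topk('say', 'goodbye')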

Movie review text classification

The TensorFlow tutorials have an example that deals with text, so I tried it: "ML basics with Keras > Basic text classification".

The way the IMDB dataset is encoded corresponds to the natural-language vectorization described in Chapter 2 of the book. Its layout differs from the PTB dataset, but the basic idea is the same.

What happens after that, though, is closer to the Fashion-MNIST example than to building a language model: instead of processing individual words, each review text is fed in as a one-dimensional vector and classified as positive or negative.

from tensorflow import keras

# The input dimension is the vocabulary size used in the movie reviews (10,000 words)
vocab_size = 10000
embedding_dim = 16
model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, embedding_dim))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential"


Layer (type)          Output Shape       Param #


embedding (Embedding)     (None, None, 16)     160000
global_average_pooling1d (Gl (None, 16)          0
dense (Dense)          (None, 16)         272
dense_1 (Dense)        (None, 1) 17


Total params: 160,289 Trainable params: 160,289 Non-trainable params: 0

Embedding layer

keras.layers.embeddings.Embedding(input_dim, output_dim, ...)

Arguments: input_dim: a positive integer, the vocabulary size (maximum input index + 1). output_dim: an integer >= 0, the dimensionality of the dense embedding.

Input: a 2-D tensor of shape (batch_size, sequence_length). Output: a 3-D tensor of shape (batch_size, sequence_length, output_dim).
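
To check those shapes concretely, a small standalone sketch (my own, not part of the tutorial) that pushes two padded integer sequences through an untrained Embedding layer:

import numpy as np
from tensorflow import keras

emb = keras.layers.Embedding(input_dim=10000, output_dim=16)
batch = np.array([[1, 14, 22, 16, 0, 0],        # two "reviews" of length 6, zero-padded
                  [1, 43, 530, 973, 1622, 2]])
out = emb(batch)
print(out.shape)   # (2, 6, 16) = (batch_size, sequence_length, output_dim)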

GlobalAveragePooling1D

For comparison, the Keras docs describe 1-D pooling as keras.layers.MaxPooling1D(pool_size=2, strides=None, padding='valid').

Arguments: pool_size: the size of the window over which pooling is applied. strides: the stride, an integer or None; if None, pool_size is used. padding: either 'valid' or 'same'.

Input: a 3-D tensor of shape (batch_size, steps, features). Output: a 3-D tensor of shape (batch_size, downsampled_steps, features).

The pooling layer that appeared in volume 1 of the book was two-dimensional max pooling, used for image processing. Here the input is one-dimensional text, so one-dimensional average pooling is used instead. Max pooling is said to be used mainly for images; perhaps average pooling is the usual choice in natural language processing?
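
GlobalAveragePooling1D itself simply averages over the steps axis, turning (batch_size, steps, features) into (batch_size, features). A quick sketch to convince myself (the comparison with np.mean is my own check, not from the tutorial):

import numpy as np
from tensorflow import keras

x = np.random.randn(2, 6, 16).astype('float32')        # (batch_size, steps, features)
pooled = keras.layers.GlobalAveragePooling1D()(x)
print(pooled.shape)                                     # (2, 16)
print(np.allclose(pooled.numpy(), x.mean(axis=1), atol=1e-6))  # True: just the mean over steps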

Learning process

The input in this example is reviews like the following:

this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert is an amazing actor and now the same being director father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also to the two little boy's that played the of norman and paul they were just brilliant children are often left out of the list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all

Each word is replaced with a number, giving:

[ 1 14 22 16 43 530 973 1622 1385 65 458 4468 66 3941 4 173 36 256 5 25 100 43 838 112 50 670 2 9 35 480 284 5 150 4 172 112 167 2 336 385 39 4 172 4536 1111 17 546 38 13 447 4 192 50 16 6 147 2025 19 14 22 4 1920 4613 469 4 22 71 87 12 16 43 530 38 76 15 13 1247 4 22 17 515 17 12 16 626 18 2 5 62 386 12 8 316 8 106 5 4 2223 5244 16 480 66 3785 33 4 130 12 16 38 619 5 25 124 51 36 135 48 25 1415 33 6 22 12 215 28 77 52 5 14 407 16 82 2 8 4 107 117 5952 15 256 4 2 7 3766 5 723 36 71 43 530 476 26 400 317 46 7 4 2 1029 13 104 88 4 381 15 297 98 32 2071 56 26 141 6 194 7486 18 4 226 22 21 134 476 26 480 5 144 30 5535 18 51 36 28 224 92 25 104 4 226 65 16 38 1334 88 12 16 283 5 16 4472 113 103 32 15 16 5345 19 178 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

This becomes a tensor with batch_size 15,000 and sequence_length 256, which is what gets fed in for training.
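
For reference, the zero-padding to length 256 is done in the tutorial with keras.preprocessing.sequence.pad_sequences, roughly like this (variable names follow the tutorial; written from memory, so details may differ slightly):

train_data = keras.preprocessing.sequence.pad_sequences(train_data,
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=256)
test_data = keras.preprocessing.sequence.pad_sequences(test_data,
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)
print(train_data.shape)   # (25000, 256) before the 10,000 validation reviews are split off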

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

x_val = train_data[:10000]
partial_x_train = train_data[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=40,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)

Epoch 1/40
30/30 [==============================] - 3s 24ms/step - loss: 0.6925 - accuracy: 0.5402 - val_loss: 0.6898 - val_accuracy: 0.6847
...
Epoch 39/40
30/30 [==============================] - 0s 13ms/step - loss: 0.0964 - accuracy: 0.9756 - val_loss: 0.3096 - val_accuracy: 0.8820
Epoch 40/40
30/30 [==============================] - 0s 13ms/step - loss: 0.0959 - accuracy: 0.9756 - val_loss: 0.3126 - val_accuracy: 0.8826

results = model.evaluate(test_data,  test_labels, verbose=2)
print(results)

782/782 - 1s - loss: 0.3232 - accuracy: 0.8737
[0.3232003450393677, 0.8736799955368042]

I tried judging some sentences myself

The explanation on the TensorFlow site ends there, but I tried having the model judge a few example sentences of my own.

def text2list(text):
  # Convert a sentence to a list of word IDs using the tutorial's word_index
  # (assumes every word is present in word_index; an unknown word would raise a KeyError)
  text = text.replace('.', '')
  words = text.split(' ')
  return [word_index[w] for w in words]

tlist = text2list("i can't stand it.")
tlist1= text2list("i watched this abortion of a movie in the middle of the night due to insomnia and it was absolute garbage the plot was horrible the acting was horrible")
tlist2= text2list("we see the power of hope and honor and love this films evokes many different emotions but the final feeling is one of admiration of the human spirit by tragedy")

tarray=[]
tarray.append(tlist)
tarray.append(tlist1)
tarray.append(tlist2)
print(len(tarray), type(tarray))

3 <class 'list'>

"i can't stand it." Is from Charlie Brown's habit. Intolerable. The second sentence is an excerpt of a negative review in test_data. The third is an excerpt from a positive review.

text_data = keras.preprocessing.sequence.pad_sequences(tarray,
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=256)
print(len(text_data[0]),text_data.shape)

256 (3, 256)

predictions = model.predict(text_data)
print(predictions.shape)
print(predictions)

(3, 1)
[[0.6715189 ]
 [0.01489276]
 [0.957998  ]]

0 means negative and 1 means positive. "i can't stand it." scores 0.67, so it is judged as somewhat positive, which is puzzling. The second sentence scores 0.01, clearly negative; the third 0.95, clearly positive. They contain easy-to-read words such as "garbage" and "horrible", or "love" and "admiration", which is presumably why they were classified this way. Conversely, a review written with slightly twisted, euphemistic wording might well be classified as the opposite of its real intent.

I still don't feel I really understand Embedding; in short, I don't get it yet. For now, let's move on.
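
For what it's worth, the Embedding layer's weights can be pulled out and inspected: they are just a (vocab_size, embedding_dim) matrix, playing the same role as W_in (word_vecs) in the SimpleCBOW model above. A small sketch of my own, not from the tutorial (it assumes the shifted word_index built in the tutorial):

emb_weights = model.layers[0].get_weights()[0]   # the trained Embedding matrix
print(emb_weights.shape)                         # (10000, 16): one 16-dimensional vector per word

# e.g. the learned vector for the word 'brilliant'
print(emb_weights[word_index['brilliant']])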

Part 20

Click here for the table of contents of these memos, and for the unreadable glossary.
