[PYTHON] "Deep Learning from scratch" Self-study memo (No. 19) Data Augmentation

While reading "Deep Learning from scratch" (written by Koki Saitoh, published by O'Reilly Japan), I am making notes of the sites I referred to. Part 18 ←

I can now tell dogs from cats reasonably well, but the correct answer rate is still below 90%, so I will try the Data Augmentation described on page 245 of the book.

Data Augmentation means expanding the data. The transformations that seem easy to do are probably flipping, rotation, and shifting.
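
As a minimal sketch of those three transformations, assuming each image is a NumPy array of shape (height, width, channels): the helper names `flip_horizontal`, `rotate_90`, and `shift` are made up for illustration, and the shift here wraps pixels around the edge via `np.roll`, which is only one of several ways to handle the border.

```python
import numpy as np

def flip_horizontal(img):
    # Reverse the width axis of an (H, W, C) image
    return img[:, ::-1, :]

def rotate_90(img, k=1):
    # Rotate counterclockwise by k * 90 degrees in the (H, W) plane
    return np.rot90(img, k=k, axes=(0, 1))

def shift(img, dy, dx):
    # Move the image by (dy, dx) pixels; pixels pushed past one edge
    # wrap around to the opposite edge
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

# Tiny 2x3 single-channel image as a stand-in for a real photo
img = np.arange(6, dtype=np.uint8).reshape(2, 3, 1)
flipped = flip_horizontal(img)
rotated = rotate_90(img)
shifted = shift(img, 0, 1)
```

Only the horizontal flip is used later in this memo; rotation and shifting are shown just to make the list concrete.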

Looking at the GradCAM results, the network seems to respond to the curled-back posture of cats and to the nose of dogs.

That suggests two things:

Adding rotated or flipped cat images might improve the accuracy of cat identification. Enlarging the area around the face in dog images might improve the accuracy of dog identification.
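
Enlarging the area around the face could be sketched as a center crop followed by a resize back to the original size. This is only an illustration under a strong assumption: the face is near the center of the image, which the article does not claim. The function name `zoom_center` is hypothetical and the resize is a simple nearest-neighbor lookup in NumPy.

```python
import numpy as np

def zoom_center(img, factor=2.0):
    # Crop the central 1/factor region of an (H, W, C) image and
    # enlarge it back to (H, W) with nearest-neighbor sampling
    h, w = img.shape[:2]
    ch, cw = int(round(h / factor)), int(round(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h   # source row for each output row
    cols = np.arange(w) * cw // w   # source column for each output column
    return crop[rows][:, cols]

# 4x4 single-channel image whose pixel value is row * 4 + column
img = np.arange(16, dtype=np.uint8).reshape(4, 4, 1)
zoomed = zoom_center(img, factor=2.0)
```

This idea is not pursued further in this memo; only flipping is actually tested below.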

Therefore, I would like to verify how adding augmented data strengthens learning.

First, recreate the training data and test data

The data used so far was put together rather roughly, because it only had to confirm that the program worked. There are only 100 test items.

I want to set aside about 1000 test items at random. For the remaining training data, I merge the dog and cat images and then shuffle them. The augmented data is kept separate for each animal and each processing method, and is merged into the training data at training time so that its effect can be verified. The verification looks not only at the overall correct answer rate but also at the correct answer rate for dogs and for cats separately. In addition, I check with GradCAM which features the misjudged images respond to.

With this policy, I recreated the training data.

import random
import numpy as np

def rnd_list(motoarray, toridasi):
  # Build the list of indices 0..N-1 for the N items in the source np array,
  # shuffle it, and return the first `toridasi` indices
  # together with the list of remaining indices
  kensuu = motoarray.shape[0]
  moto = list(range(kensuu))
  random.shuffle(moto)
  sel = moto[0:toridasi]
  nokori = moto[toridasi:]
  return sel, nokori

def bunkatu(motoarray, toridasi, lblA):
  # Split the np array into `toridasi` randomly chosen test items
  # and the remaining training items, labeling every item with lblA
  sel, nokori = rnd_list(motoarray, toridasi)

  tsl = []
  tsi = []
  trl = []
  tri = []
  for i in sel:
    imgA = motoarray[i]  # index the array passed in, not the global dogimg
    tsl.append(lblA)
    tsi.append(imgA)

  for i in nokori:
    imgA = motoarray[i]
    trl.append(lblA)
    tri.append(imgA)
  return tsl, tsi, trl, tri

def rnd_arry(tri, trl):
  # Shuffle the image array and the label array in the same random order
  # and return both as lists
  sel, nokori = rnd_list(tri, 0)
  wtri = []
  wtrl = []
  for i in nokori:
    imgA = tri[i]
    lblA = trl[i]
    wtri.append(imgA)
    wtrl.append(lblA)
  return wtri, wtrl

#Divide for training and testing and integrate dogs and cats
bunkatusuu = 500
ctsl, ctsi, ctrl, ctri = bunkatu(catimg, bunkatusuu, 0)
dtsl, dtsi, dtrl, dtri = bunkatu(dogimg, bunkatusuu, 1)

tri=np.append(ctri, dtri, axis=0)
trl=np.append(ctrl, dtrl, axis=0)
tsi=np.append(ctsi, dtsi, axis=0)
tsl=np.append(ctsl, dtsl, axis=0)

#Sort randomly
wtri, wtrl = rnd_arry(tri, trl)
wtsi, wtsl = rnd_arry(tsi, tsl)

#save

dataset = {}
dataset['test_label']  = np.array(wtsl, dtype=np.uint8)
dataset['test_img']    = np.array(wtsi, dtype=np.uint8)
dataset['train_label'] = np.array(wtrl, dtype=np.uint8)
dataset['train_img']   = np.array(wtri, dtype=np.uint8) 

import pickle

save_file = '/content/drive/My Drive/Colab Notebooks/deep_learning/dataset/catdog.pkl'    
with open(save_file, 'wb') as f:
    pickle.dump(dataset, f, -1)

This gives 23994 training items (11997 dogs, 11997 cats) and 1000 test items (500 dogs, 500 cats).
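
The same split-and-shuffle policy can also be sketched with index permutations instead of Python lists. This is an alternative under assumptions, not the code used for the results in this memo: the function name `split_and_label` is hypothetical, and small dummy arrays stand in for `catimg` and `dogimg`.

```python
import numpy as np

def split_and_label(images, n_test, label, rng):
    # Shuffle the indices once, then slice them into test and training parts
    order = rng.permutation(len(images))
    test_idx, train_idx = order[:n_test], order[n_test:]
    test_lbl = np.full(len(test_idx), label, dtype=np.uint8)
    train_lbl = np.full(len(train_idx), label, dtype=np.uint8)
    return test_lbl, images[test_idx], train_lbl, images[train_idx]

rng = np.random.default_rng(0)
# Dummy data: "cat" images are all 0, "dog" images are all 1
cats = np.zeros((6, 4, 4, 3), dtype=np.uint8)
dogs = np.ones((6, 4, 4, 3), dtype=np.uint8)

ctsl, ctsi, ctrl, ctri = split_and_label(cats, 2, 0, rng)
dtsl, dtsi, dtrl, dtri = split_and_label(dogs, 2, 1, rng)

# Merge both classes, then shuffle images and labels with the same order
train_img = np.concatenate([ctri, dtri], axis=0)
train_lbl = np.concatenate([ctrl, dtrl], axis=0)
order = rng.permutation(len(train_img))
train_img, train_lbl = train_img[order], train_lbl[order]
```

Because the dummy cat images are all 0 and the dog images all 1, it is easy to confirm that every image still carries the label of its own class after the shuffle.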

Feeding this into the DeepConvNet built in Part 18 gives:

Epoch 1/10
188/188 [==============================] - 373s 2s/step - loss: 0.7213 - accuracy: 0.5663
Epoch 2/10
188/188 [==============================] - 373s 2s/step - loss: 0.6378 - accuracy: 0.6290
Epoch 3/10
188/188 [==============================] - 373s 2s/step - loss: 0.5898 - accuracy: 0.6713
Epoch 4/10
188/188 [==============================] - 374s 2s/step - loss: 0.5682 - accuracy: 0.6904
Epoch 5/10
188/188 [==============================] - 373s 2s/step - loss: 0.5269 - accuracy: 0.7128
Epoch 6/10
188/188 [==============================] - 374s 2s/step - loss: 0.4972 - accuracy: 0.7300
Epoch 7/10
188/188 [==============================] - 372s 2s/step - loss: 0.4713 - accuracy: 0.7473
Epoch 8/10
188/188 [==============================] - 374s 2s/step - loss: 0.4446 - accuracy: 0.7617
Epoch 9/10
188/188 [==============================] - 373s 2s/step - loss: 0.4318 - accuracy: 0.7665
Epoch 10/10
188/188 [==============================] - 376s 2s/step - loss: 0.4149 - accuracy: 0.7755
32/32 - 4s - loss: 0.3811 - accuracy: 0.8420

The result is a correct answer rate of 84.2%.

Let's look at the results in more detail.

predictions = model.predict(x_test)

#List the indices of the misjudged samples

gohantei = []
kensuu, w = predictions.shape
for i in range(kensuu):
  predictions_array = predictions[i]
  predicted_label = np.argmax(predictions_array)
  true_label = t_test[i]
  if predicted_label != true_label:
    gohantei.append(i)

print(len(gohantei))

158

There were 158 misjudgments.
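
The loop that collects the misjudged indices can also be written without an explicit Python loop. The following is a small sketch on made-up prediction values; the function name `misclassified_indices` is hypothetical, and it assumes `predictions` has one row of class scores per sample, as the output of `model.predict` does above.

```python
import numpy as np

def misclassified_indices(predictions, true_labels):
    # Predicted class = position of the largest score in each row
    predicted = np.argmax(predictions, axis=1)
    # Indices where the predicted class differs from the true label
    return np.flatnonzero(predicted != true_labels)

# Made-up scores for four samples (columns: cat, dog)
predictions = np.array([[0.9, 0.1],
                        [0.2, 0.8],
                        [0.6, 0.4],
                        [0.3, 0.7]])
t_test = np.array([0, 0, 0, 1])
gohantei = misclassified_indices(predictions, t_test)
```

Here only the second sample (index 1) is a cat that was predicted as a dog, so it is the only index returned.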

import matplotlib.pyplot as plt

def plot_image(i, predictions, t_label, img):
    class_names = ['cat', 'dog']
    predictions_array = predictions[i]
    img = img[i].reshape((80, 80, 3))
    true_label = t_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                    100*np.max(predictions_array),
                                    class_names[true_label]),
                                    color=color)

num_cols = 10
num_rows = int(len(gohantei) / num_cols ) + 1

plt.figure(figsize=(2*num_cols, 2.5*num_rows))
j = 0
for i in gohantei:
  inuneko = t_test[i]
  if inuneko == 0:
    plt.subplot(num_rows, num_cols, j+1)
    plot_image(i, predictions, t_test, x_test)
    j +=1
plt.show()
print("Misjudged cats:", j)

plt.figure(figsize=(2*num_cols, 2.5*num_rows))
j = 0
for i in gohantei:
  inuneko = t_test[i]
  if inuneko == 1:
    plt.subplot(num_rows, num_cols, j+1)
    plot_image(i, predictions, t_test, x_test)
    j +=1
plt.show()
print("Misjudged dogs:", j)

The misjudgments break down into 109 cats and 49 dogs. Cats were misjudged more than twice as often as dogs.
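
Turning those counts into per-class correct answer rates is simple arithmetic, sketched below with the numbers from this memo (500 test cats, 500 test dogs). The helper name `accuracy_from_errors` is made up for illustration.

```python
def accuracy_from_errors(n_total, n_wrong):
    # Fraction of samples that were judged correctly
    return (n_total - n_wrong) / n_total

cat_acc = accuracy_from_errors(500, 109)       # cats:    391 / 500
dog_acc = accuracy_from_errors(500, 49)        # dogs:    451 / 500
overall = accuracy_from_errors(1000, 109 + 49) # overall: 842 / 1000

print(f"cat {cat_acc:.1%}, dog {dog_acc:.1%}, overall {overall:.1%}")
```

So the cat accuracy is 78.2% and the dog accuracy is 90.2%, which together give the overall 84.2%.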

In the first place, is learning strengthened by adding augmented data?

Let's see whether cat misjudgments can be reduced by augmenting the cat data and adding it to the training set.

#Extract only cat images
catdatalist = []
kensuu = len(dataset['train_img'])
for i in range(kensuu):
  label = dataset['train_label'][i]
  if label == 0:
    catdatalist.append(i)
print(len(catdatalist))

11997

#Create a left-right inverted image dataset of a cat
trl = []
tri = []
lbl = 0
for i in catdatalist:
  img = dataset['train_img'][i]
  img = img[:, ::-1, :]
  trl.append(lbl)
  tri.append(img)

catdataset = {}
catdataset['train_label'] = np.array(trl, dtype=np.uint8)
catdataset['train_img']   = np.array(tri, dtype=np.uint8) 

tri =np.append(dataset['train_img'], catdataset['train_img'], axis=0) 
trl =np.append(dataset['train_label'], catdataset['train_label'], axis=0) 
x_train = tri  / 255.0
t_train = trl
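
The extraction and flipping loops above can also be sketched with a boolean mask, which flips all images of one class in a single step. This is an illustrative alternative, assuming the image array has shape (N, height, width, channels); the function name `flipped_copies` is hypothetical and tiny dummy arrays are used in place of the real dataset.

```python
import numpy as np

def flipped_copies(images, labels, target_label):
    # Select every image of the target class, then reverse the width axis
    mask = labels == target_label
    flipped = images[mask][:, :, ::-1, :]
    return flipped, labels[mask]

# Three dummy 2x2 single-channel images; labels 0 = cat, 1 = dog
images = np.arange(12, dtype=np.uint8).reshape(3, 2, 2, 1)
labels = np.array([0, 1, 0], dtype=np.uint8)

flipped, flipped_labels = flipped_copies(images, labels, 0)
aug_images = np.concatenate([images, flipped], axis=0)
aug_labels = np.concatenate([labels, flipped_labels], axis=0)
```

With two cats among the three dummy images, the augmented set has five images: the three originals plus two flipped cats.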

Train on the training data with the flipped cat data added.

model.fit(x_train, t_train,  epochs=10, batch_size=128)

Epoch 1/10
282/282 [==============================] - 571s 2s/step - loss: 0.6604 - accuracy: 0.6783
Epoch 2/10
282/282 [==============================] - 569s 2s/step - loss: 0.5840 - accuracy: 0.7220
Epoch 3/10
282/282 [==============================] - 570s 2s/step - loss: 0.5407 - accuracy: 0.7511
Epoch 4/10
282/282 [==============================] - 572s 2s/step - loss: 0.5076 - accuracy: 0.7689
Epoch 5/10
282/282 [==============================] - 565s 2s/step - loss: 0.4808 - accuracy: 0.7860
Epoch 6/10
282/282 [==============================] - 566s 2s/step - loss: 0.4599 - accuracy: 0.7974
Epoch 7/10
282/282 [==============================] - 563s 2s/step - loss: 0.4337 - accuracy: 0.8115
Epoch 8/10
282/282 [==============================] - 565s 2s/step - loss: 0.4137 - accuracy: 0.8181
Epoch 9/10
282/282 [==============================] - 564s 2s/step - loss: 0.3966 - accuracy: 0.8256
Epoch 10/10
282/282 [==============================] - 565s 2s/step - loss: 0.3759 - accuracy: 0.8331

test_loss, test_acc = model.evaluate(x_test,  t_test, verbose=2)

32/32 - 4s - loss: 0.3959 - accuracy: 0.8220

predictions = model.predict(x_test)
#List the indices of the misjudged samples

gohantei = []
kensuu, w = predictions.shape
for i in range(kensuu):
  predictions_array = predictions[i]
  predicted_label = np.argmax(predictions_array)
  true_label = t_test[i]
  if predicted_label != true_label:
    gohantei.append(i)

print(len(gohantei))

178

def plot_image(i, predictions, t_label, img):
    class_names = ['cat', 'dog']
    predictions_array = predictions[i]
    img = img[i].reshape((80, 80, 3))
    true_label = t_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                    100*np.max(predictions_array),
                                    class_names[true_label]),
                                    color=color)

num_cols = 10
num_rows = int(len(gohantei) / num_cols ) + 1

plt.figure(figsize=(2*num_cols, 2.5*num_rows))
j = 0
for i in gohantei:
  inuneko = t_test[i]
  if inuneko == 0:
    plt.subplot(num_rows, num_cols, j+1)
    plot_image(i, predictions, t_test, x_test)
    j +=1
plt.show()
print("Misjudged cats:", j)

plt.figure(figsize=(2*num_cols, 2.5*num_rows))
j = 0
for i in gohantei:
  inuneko = t_test[i]
  if inuneko == 1:
    plt.subplot(num_rows, num_cols, j+1)
    plot_image(i, predictions, t_test, x_test)
    j +=1
plt.show()
print("Misjudged dogs:", j)

Misjudged cats: 28
Misjudged dogs: 150

Without the flipped data there were 158 misjudgments: 109 cats and 49 dogs. With the flipped cat data added, accuracy for cats improved greatly (from 109 misjudgments to 28). However, accuracy for dogs fell by even more (from 49 misjudgments to 150), so the overall accuracy also went down, from 84.2% to 82.2%.

Adding data does seem to strengthen learning, but does that mean the side effects are also large?
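
One thing the numbers above make visible is how much the cat-only augmentation changed the class balance. The sketch below redoes that arithmetic from the counts in this memo; it assumes the batch size of 128 passed to `model.fit` was used in both runs, and the helper name `steps_per_epoch` is made up.

```python
import math

def steps_per_epoch(n_samples, batch_size=128):
    # Number of batches needed to cover every sample once
    return math.ceil(n_samples / batch_size)

n_cat, n_dog = 11997, 11997
n_cat_aug = 2 * n_cat                    # originals plus flipped copies

baseline_steps = steps_per_epoch(n_cat + n_dog)
augmented_steps = steps_per_epoch(n_cat_aug + n_dog)
cat_share = n_cat_aug / (n_cat_aug + n_dog)

print(baseline_steps, augmented_steps, round(cat_share, 3))
```

The step counts come out as 188 and 282, matching the two training logs, and cats make up two thirds of the augmented training set. Whether that imbalance is the actual cause of the drop in dog accuracy is not proven here; it is only a plausible reading of the counts.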

I examined the cat images that changed from misjudged to correct and the dog images that changed from correct to misjudged.

Cat images that changed from misjudged to correct (a red mark means the image was still misjudged the second time)

Dog images that changed from correct to misjudged (a yellow mark means the image was also misjudged the first time)

Cats sitting like a dog are now judged correctly, but dogs sitting in an ordinary pose are now judged as "cat"? Also, close-ups of the face, which the first run would have judged as "dog" from the nose alone, are now judged as "cat".

Hmm. I don't know.

So what happens if flipped dog images are added to the training data as well? I tried it.

I tried learning with flipped images of both cats and dogs added

In other words, the amount of training data is doubled.

The result is:

32/32 - 4s - loss: 0.2186 - accuracy: 0.9090

The correct answer rate improved to about 91%. Of the 91 misjudgments, 48 were cats mistaken for dogs and 43 were dogs mistaken for cats.

Without the flipped data there were 158 misjudgments: 109 cats and 49 dogs. The number of misjudged cats has been more than halved.

The images of sitting like a dog and the close-ups of faces seem to have changed from misjudged to correct.

From the above, can we say the following?

- Training data that has been flipped horizontally is still usable for learning and is effective, even though it doubles the number of items.
- For a two-class classification like this one, it is better to have the same number of dog and cat training items so that learning is less biased.

However, once the training data is doubled, Google Colab runs out of RAM and crashes. It seems difficult to increase the data any further, so I will end the dog and cat classification here.
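
A rough estimate shows why RAM becomes the limit, and one possible way around it. The sketch assumes 80x80x3 images (the shape used in `plot_image`) and that dividing a uint8 array by `255.0` yields float64, which is NumPy's default behavior. The generator `random_flip_batches` is a hypothetical alternative that flips images at random per batch instead of storing flipped copies; it is not what was done in this memo, and the function names are made up.

```python
import numpy as np

def array_gib(n_images, height=80, width=80, channels=3, bytes_per_value=8):
    # Size in GiB of an image array with the given element size
    return n_images * height * width * channels * bytes_per_value / 2**30

n_doubled = 2 * 23994                                  # originals plus flipped copies
float64_gib = array_gib(n_doubled)                     # after "/ 255.0"
float32_gib = array_gib(n_doubled, bytes_per_value=4)  # if cast to float32

def random_flip_batches(images, labels, batch_size, rng):
    # Yield shuffled batches, flipping each image left-right with probability 0.5,
    # so only one batch at a time is converted to float
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        batch = images[idx].astype(np.float32) / 255.0
        flip = rng.random(len(idx)) < 0.5
        batch[flip] = batch[flip][:, :, ::-1, :]
        yield batch, labels[idx]

# Small dummy dataset to exercise the generator
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 4, 4, 3), dtype=np.uint8)
labels = np.array([0, 1, 0, 1, 0], dtype=np.uint8)
batches = list(random_flip_batches(images, labels, batch_size=2, rng=rng))
```

Under these assumptions the doubled training array alone takes about 6.9 GiB as float64 and about 3.4 GiB as float32, before counting the original uint8 data and the model itself.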
