[Python] An amateur stumbles through "Deep Learning from Scratch": Notes on Chapter 3

Introduction

Suddenly, I started studying Chapter 3 of "Deep Learning from Scratch: The Theory and Implementation of Deep Learning Learned with Python". This is a memo of the stumbles I hit along the way.

The execution environment is macOS Mojave + Anaconda 2019.10. For details, refer to Chapter 1 of this memo.

(To other chapters of this memo: Chapter 1 / Chapter 2 / Chapter 3 / Chapter 4 / Chapter 5 / [Chapter 6](https://qiita.com/segavvy/items/ca4ac4c9ee1a126bff41) / Chapter 7 / Chapter 8 / Summary)

Chapter 3 Neural Network

This chapter describes how neural networks work.

3.1 From Perceptron to Neural Network

This section explains how to count layers and the differences between the perceptron and the neural network. It's a bit inconvenient that the way layers are counted differs from person to person.

3.2 Activation function

This section introduces the types of activation functions. I tried graphing the three functions that appear in it.

# coding: utf-8
import numpy as np
import matplotlib.pylab as plt


def step_function(x):
    """Step function that returns 1 if the input exceeds 0
    
    Args:
        x (numpy.ndarray):input
    
    Returns:
        numpy.ndarray:output
    """
    return np.array(x > 0, dtype=int)


def sigmoid(x):
    """Sigmoid function
    
    Args:
        x (numpy.ndarray):input
    
    Returns:
        numpy.ndarray:output
    """
    return 1 / (1 + np.exp(-x))


def relu(x):
    """ReLU function
    
    Args:
        x (numpy.ndarray):input
    
    Returns:
        numpy.ndarray:output
    """
    return np.maximum(0, x)


# Calculation
x = np.arange(-5.0, 5.0, 0.01)  # the step is small so that the step function doesn't look diagonal
y_step = step_function(x)
y_sigmoid = sigmoid(x)
y_relu = relu(x)

#Graph drawing
plt.plot(x, y_step, label="step")
plt.plot(x, y_sigmoid, linestyle="--", label="sigmoid")
plt.plot(x, y_relu, linestyle=":", label="ReLU")
plt.ylim(-0.1, 5.1)
plt.legend()
plt.show()

(Screenshot: graph of the step, sigmoid, and ReLU functions)

The only thing I didn't understand was the explanation that the activation function must not be a linear function. From the book's explanation I understood that, if each layer has only one neuron, a multi-layer network can be expressed as a single layer. But can it still be expressed as a single layer when each layer has multiple neurons? I couldn't quite work that out here.
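
As a small illustration (my own NumPy experiment, not from the book): with a linear (identity) activation, two layers with multiple neurons each collapse into a single layer, because the product of the two weight matrices is itself just one weight matrix. Since matrix multiplication is associative, the same holds however many linear layers are stacked.

import numpy as np

# A minimal sketch (not from the book): two "layers" with a linear activation
# are equivalent to one layer whose weight matrix is the product of the two.
np.random.seed(0)
x = np.random.rand(4)          # input with 4 features
W1 = np.random.rand(4, 3)      # layer 1: 4 -> 3 neurons
W2 = np.random.rand(3, 2)      # layer 2: 3 -> 2 neurons

two_layers = np.dot(np.dot(x, W1), W2)   # linear activation, two layers
one_layer = np.dot(x, np.dot(W1, W2))    # one layer with W = W1 @ W2
print(np.allclose(two_layers, one_layer))  # True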

3.3 Calculation of multidimensional array

The explanation here is about replacing computations on multidimensional arrays with matrix computations to make them efficient. I first studied this replacement when I took an online machine learning course[^1] about three years ago, and later used it in the 100 Language Processing Knocks[^2], so this was a review for me.
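
As a quick refresher, here is a toy example of my own (not the book's code) showing the shape rule for the product and how one matrix product replaces a loop over neurons.

import numpy as np

# Product of multidimensional arrays: shapes must match as (n, m) x (m, k) -> (n, k)
A = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
B = np.array([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
print(np.dot(A, B).shape)                # (3, 3)

# One matrix product computes every neuron of a layer at once
X = np.array([1.0, 0.5])                          # inputs, shape (2,)
W = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])  # weights, shape (2, 3)
print(np.dot(X, W))                               # [0.2 0.5 0.8]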

3.4 Implementation of 3-layer neural network

This section implements a 3-layer neural network using the matrix computations from the previous section. Since no learning (training) is involved yet, there were no particular stumbling blocks.
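
As a minimal sketch with made-up shapes and random weights (not the book's init_network() values), the whole 3-layer forward pass is just three of those matrix products chained together, with sigmoid in between:

import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Made-up network: 2 inputs -> 3 neurons -> 2 neurons -> 2 outputs
np.random.seed(0)
network = {
    'W1': np.random.rand(2, 3), 'b1': np.random.rand(3),
    'W2': np.random.rand(3, 2), 'b2': np.random.rand(2),
    'W3': np.random.rand(2, 2), 'b3': np.random.rand(2),
}

x = np.array([1.0, 0.5])                                 # input, shape (2,)
z1 = sigmoid(np.dot(x, network['W1']) + network['b1'])   # layer 1, shape (3,)
z2 = sigmoid(np.dot(z1, network['W2']) + network['b2'])  # layer 2, shape (2,)
y = np.dot(z2, network['W3']) + network['b3']            # output layer (identity), shape (2,)
print(y.shape)  # (2,)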

3.5 Output layer design

This section explains the softmax function. ~~There was no particular stumbling block here either.~~ I didn't think I had stumbled, but I later noticed a mistake. As long as you don't do batch processing there is no problem with the implementation in the book, but it needs to be modified when you implement "3.6.3 Batch processing".

Below is my softmax code, modified to support batch processing.

def softmax(x):
    """Softmax function
    
    Args:
        x (numpy.ndarray):input
    
    Returns:
        numpy.ndarray:output
    """

    # For batch processing, x is a two-dimensional array of shape (number of batches, 10).
    # In that case we have to compute per image, using broadcasting.
    if x.ndim == 2:

        # Compute the maximum for each image (axis=1) and reshape so it can be broadcast
        c = np.max(x, axis=1).reshape(x.shape[0], 1)

        # Compute the numerator, subtracting the maximum as an overflow countermeasure
        exp_a = np.exp(x - c)

        # For the denominator too, sum for each image (axis=1) and reshape so it can be broadcast
        sum_exp_a = np.sum(exp_a, axis=1).reshape(x.shape[0], 1)

        # Divide to compute the result for each image
        y = exp_a / sum_exp_a

    else:

        # When not batch processing, implement as in the book
        c = np.max(x)
        exp_a = np.exp(x - c)  # Overflow countermeasure
        sum_exp_a = np.sum(exp_a)
        y = exp_a / sum_exp_a

    return y
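
As a quick sanity check of my own (not from the book), with a 2-D input each row of the output should sum to 1, and a 1-D input should still behave like the book's version:

import numpy as np

# Uses the batch-aware softmax defined above
x_batch = np.array([[0.3, 2.9, 4.0],
                    [1010.0, 1000.0, 990.0]])  # the second row would overflow without the max trick
print(np.sum(softmax(x_batch), axis=1))        # [1. 1.] -- each image's probabilities sum to 1
print(np.sum(softmax(np.array([0.3, 2.9, 4.0]))))  # 1.0 -- single-image case unchanged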

Incidentally, one of the sources in this book's GitHub repository https://github.com/oreilly-japan/deep-learning-from-scratch transposes the array for broadcasting instead. That may be faster, but I couldn't tell at a glance what it was doing, so I went with code that uses reshape.
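
Incidentally, NumPy's keepdims=True keeps the reduced axis, which avoids the explicit reshape. This is just an alternative sketch of mine, not the repository's code:

import numpy as np


def softmax_keepdims(x):
    """Softmax that works for both 1-D and 2-D input via keepdims (alternative sketch)."""
    c = np.max(x, axis=-1, keepdims=True)                # per-image maximum
    exp_a = np.exp(x - c)                                # overflow countermeasure
    return exp_a / np.sum(exp_a, axis=-1, keepdims=True)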

3.6 Handwritten digit recognition

This section actually implements the inference phase of a neural network using trained parameters. It needs sample_weight.pkl, which stores the learned parameters, so bring the files from the ch3 folder of this book's GitHub repository [https://github.com/oreilly-japan/deep-learning-from-scratch](https://github.com/oreilly-japan/deep-learning-from-scratch) into the current directory.

As I proceeded with the implementation according to the book, I ran into an overflow warning.

/Users/segavvy/Documents/deep-learning-from-scratch/ch03/3.6_mnist.py:19: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + np.exp(-x))

For this, I referred to the explanation in Meeting Machine Learning with Python >> Logistic Regression >> Sigmoid Function and fixed it by clipping the value of x so that it does not overflow.
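
The warning itself is easy to reproduce (a small check of my own, not from the book): for float64, np.exp overflows once its argument exceeds roughly 710, while at ±34.538... the sigmoid is already within about 1e-15 of 0 or 1, so clipping the input there changes the result only negligibly.

import numpy as np

x = np.array([-1000.0])
# 1 / (1 + np.exp(-x))  # -> RuntimeWarning: overflow encountered in exp

# Clipping the input avoids the warning while leaving the result effectively unchanged
sigmoid_range = 34.538776394910684
x_clipped = np.clip(x, -sigmoid_range, sigmoid_range)
print(1 / (1 + np.exp(-x_clipped)))  # roughly [1e-15], i.e. effectively 0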

Also, when calculating the final recognition accuracy, the book type-converts `accuracy_cnt` to `float`, but in Python 3 dividing integers with `/` returns a floating-point number, so this conversion seems unnecessary.

Also, while implementing this, I got curious about which images were not inferred correctly, so I tried displaying them.

Below is the code I wrote.

# coding: utf-8
import numpy as np
import os
import pickle
import sys
sys.path.append(os.pardir)  #Add parent directory to path
from dataset.mnist import load_mnist
from PIL import Image


def sigmoid(x):
    """Sigmoid function
    The book's implementation overflows, so it is fixed with reference to the following site:
    http://www.kamishima.net/mlmpyja/lr/sigmoid.html

    Args:
        x (numpy.ndarray):input
    
    Returns:
        numpy.ndarray:output
    """
    #Correct x to a range that does not overflow
    sigmoid_range = 34.538776394910684
    x2 = np.maximum(np.minimum(x, sigmoid_range), -sigmoid_range)

    #Sigmoid function
    return 1 / (1 + np.exp(-x2))


def softmax(x):
    """Softmax function
    
    Args:
        x (numpy.ndarray):input
    
    Returns:
        numpy.ndarray:output
    """

    # For batch processing, x is a two-dimensional array of shape (number of batches, 10).
    # In that case we have to compute per image, using broadcasting.
    if x.ndim == 2:

        # Compute the maximum for each image (axis=1) and reshape so it can be broadcast
        c = np.max(x, axis=1).reshape(x.shape[0], 1)

        # Compute the numerator, subtracting the maximum as an overflow countermeasure
        exp_a = np.exp(x - c)

        # For the denominator too, sum for each image (axis=1) and reshape so it can be broadcast
        sum_exp_a = np.sum(exp_a, axis=1).reshape(x.shape[0], 1)

        # Divide to compute the result for each image
        y = exp_a / sum_exp_a

    else:

        # When not batch processing, implement as in the book
        c = np.max(x)
        exp_a = np.exp(x - c)  # Overflow countermeasure
        sum_exp_a = np.sum(exp_a)
        y = exp_a / sum_exp_a

    return y


def load_test_data():
    """MNIST test image and test label acquisition
Image value is 0.0〜1.Normalized to 0.

    Returns:
        numpy.ndarray, numpy.ndarray:Test image,Test label
    """
    (x_train, t_train), (x_test, t_test) \
        = load_mnist(flatten=True, normalize=True)
    return x_test, t_test


def load_sample_network():
    """Get sample trained weight parameters
    
    Returns:
        dict:Weight and bias parameters
    """
    with open("sample_weight.pkl", "rb") as f:
        network = pickle.load(f)
    return network


def predict(network, x):
    """Inference by neural network
    
    Args:
        network (dict):Weight and bias parameters
        x (numpy.ndarray):Input to neural network
    
    Returns:
        numpy.ndarray:Neural network output
    """
    #Parameter retrieval
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']

    #Neural network calculation (forward)
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)

    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)

    return y


def show_image(img):
    """Image display
    
    Args:
        img (numpy.ndarray): Image bitmap
    """
    pil_img = Image.fromarray(np.uint8(img))
    pil_img.show()


#Read MNIST test data
x, t = load_test_data()

#Read sample weight parameters
network = load_sample_network()

#Inference, recognition accuracy calculation
batch_size = 100  #Batch processing unit
accuracy_cnt = 0  #The number of correct answers
error_image = None  #Unrecognized image
for i in range(0, len(x), batch_size):

    #Batch data preparation
    x_batch = x[i:i + batch_size]

    #inference
    y_batch = predict(network, x_batch)
    p = np.argmax(y_batch, axis=1)

    #Correct answer count
    accuracy_cnt += np.sum(p == t[i:i + batch_size])

    # Concatenate the misrecognized images into error_image
    for j in range(0, batch_size):
        if p[j] != t[i + j]:
            if error_image is None:
                error_image = x_batch[j]
            else:
                error_image = np.concatenate((error_image, x_batch[j]), axis=0)

print("Recognition accuracy:" + str(accuracy_cnt / len(x)))

# Display the misrecognized images
error_image *= 255  # Values are normalized to 0.0-1.0, so scale back to 0-255 for display
show_image(error_image.reshape(28 * (len(x) - accuracy_cnt), 28))

And here is the execution result.

Recognition accuracy:0.9352

(Screenshot: the misrecognized images concatenated vertically)

Since the ~~793~~ 648 misrecognized images are simply concatenated vertically, a ridiculously long image is displayed, but there are certainly many characters that are hard to make out. That said, there are also some that look perfectly recognizable.

~~Also, the book says the recognition accuracy should be 0.9352, but for some reason mine came out as 0.9207. Even when I reverted the sigmoid function to the state that issued the warning, it didn't change, so there may be some other mistake...~~

3.7 Summary

~~Chapter 3 was also largely review for me, so there were no big stumbling blocks, but the difference in recognition accuracy at the end bothers me.~~ I didn't think I had stumbled in Chapter 3, but I later noticed the following two problems.

Problem 1 that I noticed later

Thanks to @tunnel's pointer, I found out why the recognition accuracy differed from the book! I should have used image data normalized to 0.0-1.0, but I was using the 0-255 values instead. Thank you, @tunnel! Even so, with values that different you would expect the recognition accuracy to fall apart completely, so it's interesting that it didn't get that bad.

Problem 2 that I noticed later

In Chapter 4, for some reason the loss function would not get smaller no matter how much training I did, and while investigating the cause I noticed that my softmax function did not support batch processing. The code above has already been fixed. (I would have been happy if this had been explained a little more in Chapter 3...)

(To other chapters of this memo: Chapter 1 / Chapter 2 / Chapter 3 / Chapter 4 / Chapter 5 / [Chapter 6](https://qiita.com/segavvy/items/ca4ac4c9ee1a126bff41) / Chapter 7 / Chapter 8 / Summary)

[^1]: This is the Machine Learning lecture offered by Stanford University on the online course service Coursera. Volunteers had added Japanese subtitles, so it was quite manageable even though I'm not good at English. The technique of replacing array computations with matrix computations is explained there under the name Vectorization.

[^2]: I used it when I solved problem 73 in Chapter 8 of 100 Language Processing Knock 2015. My study notes from that time are posted as [An amateur's 100 Language Processing Knocks: 73](https://qiita.com/segavvy/items/5ad0d5742a674bdf56cc#%E3%83%99%E3%82%AF%E3%83%88%E3%83%AB%E5%8C%96).
