[PYTHON] I tried to implement deep learning that is not deep, using only NumPy

In order to understand the essence of deep learning, it is important to implement it from scratch, but with MNIST a CNN is hard to implement and takes time to train. So this time I used the Iris dataset to implement, very easily, three-layer "non-deep" deep learning (one layer, counting only the hidden layers), in other words just a plain neural network. It is batch learning rather than mini-batch, but it still includes gradient descent and error backpropagation (though not the stochastic kind). For the theory of deep learning, please read my favorite book, [Deep Learning from Scratch](https://www.amazon.co.jp/dp/4873117585). It is the best book I know, and really easy to understand.
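To summarize what the code below computes (my notation, not from the original post): with input x, hidden activation h, one-hot target t, and batch size N,

    h = ReLU(x W1 + b1)
    y = softmax(h W2 + b2)
    L = -(1/N) Σ t log y

The gradient code also relies on the standard fact that, for softmax combined with cross-entropy, the gradient of L with respect to the pre-softmax value z2 = h W2 + b2 is (y - t)/N.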

Details

I'm not good at drawing diagrams on a computer, so please forgive the handwritten ones; the main content is the source code below. Also, the Iris data is the table listed on English Wikipedia with its rows randomly shuffled.

(Handwritten diagrams: DSC_0457.JPG, DSC_0461.JPG)
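This pre-shuffling matters, because the code below simply takes the first TRAIN_DATA_SIZE rows as the training set. If your copy of the data is sorted by class, a quick way to shuffle it yourself (a sketch of mine, not the author's script) is:

import numpy as np
data = np.loadtxt('iris.tsv', delimiter='\t')  # 150 rows, 5 columns
np.random.shuffle(data)  # shuffle the rows in place
np.savetxt('iris.tsv', data, delimiter='\t', fmt='%g')  # '%g' keeps the integer labels intact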

Source code

The source can be found on GitHub. It is written in Python 3.

Only the Python code is posted here as well; download the Iris data from GitHub.
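Judging from the np.loadtxt calls below, iris.tsv is assumed to be a tab-separated file with four float feature columns followed by an integer class label (0, 1, or 2) in the fifth column; an illustrative row would look like:

5.1	3.5	1.4	0.2	0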

iris.py


# coding: utf-8

import numpy as np

# Hyperparameters
TRAIN_DATA_SIZE = 50  # Of the 150 samples, TRAIN_DATA_SIZE are used as training data; the rest are used as test data
HIDDEN_LAYER_SIZE = 6  # Size of the middle (hidden) layer (a scalar, since there is only one hidden layer this time)
LEARNING_RATE = 0.1  # Learning rate
ITERS_NUM = 1000  # Number of iterations

# Read the data
# By default, np.loadtxt skips lines starting with '#'
x = np.loadtxt('iris.tsv', delimiter='\t', usecols=(0, 1, 2, 3))
raw_t = np.loadtxt('iris.tsv', dtype=int, delimiter='\t', usecols=(4,))
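# Convert the integer labels to one-hot vectors (e.g. class 2 -> [0, 0, 1])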
onehot_t = np.zeros([150, 3])
for i in range(150):
    onehot_t[i][raw_t[i]] = 1

train_x = x[:TRAIN_DATA_SIZE]
train_t = onehot_t[:TRAIN_DATA_SIZE]
test_x = x[TRAIN_DATA_SIZE:]
test_t = onehot_t[TRAIN_DATA_SIZE:]

# Initialize weights and biases
W1 = np.random.randn(4, HIDDEN_LAYER_SIZE) * np.sqrt(2 / 4)  # He initialization (scale = sqrt(2 / fan_in); use this with ReLU)
W2 = np.random.randn(HIDDEN_LAYER_SIZE, 3) * np.sqrt(2 / HIDDEN_LAYER_SIZE)
b1 = np.zeros(HIDDEN_LAYER_SIZE)  # Initialized to zero; I don't know the reason, I just followed "Deep Learning from Scratch"
b2 = np.zeros(3)

# ReLU function
def relu(x):
    return np.maximum(x, 0)

# Softmax function; I only saw this implementation on the net, so I don't fully understand it
def softmax(x):
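    # Subtracting the max before exp() prevents overflow; it does not change the result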
    e = np.exp(x - np.max(x))
    if e.ndim == 1:
        return e / np.sum(e, axis=0)
    elif e.ndim == 2:
        return e / np.array([np.sum(e, axis=1)]).T
    else:
        raise ValueError

# Cross-entropy error
def cross_entropy_error(y, t):
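    # Caution: if any element of y is exactly 0, np.log(y) returns -inf; a common guard is np.log(y + 1e-7)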
    if y.shape != t.shape:
        raise ValueError
    if y.ndim == 1:
        return - (t * np.log(y)).sum()
    elif y.ndim == 2:
        return - (t * np.log(y)).sum() / y.shape[0]
    else:
        raise ValueError

# Forward propagation (reads the global W1, W2, b1, b2)
def forward(x):
    return softmax(np.dot(relu(np.dot(x, W1) + b1), W2) + b2)

# Results on the test data (before training)
test_y = forward(test_x)
print((test_y.argmax(axis=1) == test_t.argmax(axis=1)).sum(), '/', 150 - TRAIN_DATA_SIZE)

# Training loop
for i in range(ITERS_NUM):
    # Forward propagation, keeping the intermediate values for backpropagation
    y1 = np.dot(train_x, W1) + b1
    y2 = relu(y1)
    train_y = softmax(np.dot(y2, W2) + b2)

    # Compute the loss
    L = cross_entropy_error(train_y, train_t)

    if i % 100 == 0:
        print(L)

    # Gradient calculation
    # Using the formulas derived from the computational graph
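    # For softmax + cross-entropy, the gradient w.r.t. the pre-softmax values is (y - t); dividing by the batch size averages over the batch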
    a1 = (train_y - train_t) / TRAIN_DATA_SIZE
    b2_gradient = a1.sum(axis=0)
    W2_gradient = np.dot(y2.T, a1)
    a2 = np.dot(a1, W2.T)
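    # Backprop through ReLU: the gradient passes only where the pre-activation y1 was positive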
    a2[y1 <= 0.0] = 0
    b1_gradient = a2.sum(axis=0)
    W1_gradient = np.dot(train_x.T, a2)

    # Update the parameters (gradient descent)
    W1 = W1 - LEARNING_RATE * W1_gradient
    W2 = W2 - LEARNING_RATE * W2_gradient
    b1 = b1 - LEARNING_RATE * b1_gradient
    b2 = b2 - LEARNING_RATE * b2_gradient

# Display the results

# Final loss on the training data
L = cross_entropy_error(forward(train_x), train_t)
print(L)

# Results on the test data (after training)
test_y = forward(test_x)
print((test_y.argmax(axis=1) == test_t.argmax(axis=1)).sum(), '/', 150 - TRAIN_DATA_SIZE)
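
As a sanity check (my addition, not in the original post), the analytic gradients from the computational graph can be compared against numerical gradients computed by central differences, the same technique used in "Deep Learning from Scratch". A minimal sketch, reusing forward, cross_entropy_error, relu, softmax, train_x, train_t, and the trained parameters defined above:

# Numerical gradient by central differences (illustrative sketch)
def numerical_gradient(f, param, eps=1e-4):
    grad = np.zeros_like(param)
    it = np.nditer(param, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = param[idx]
        param[idx] = orig + eps
        loss_plus = f()
        param[idx] = orig - eps
        loss_minus = f()
        param[idx] = orig  # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
        it.iternext()
    return grad

num_W2 = numerical_gradient(lambda: cross_entropy_error(forward(train_x), train_t), W2)

# Analytic gradient at the same parameters, using the same formulas as the training loop
y1 = np.dot(train_x, W1) + b1
y2 = relu(y1)
train_y = softmax(np.dot(y2, W2) + b2)
ana_W2 = np.dot(y2.T, (train_y - train_t) / TRAIN_DATA_SIZE)

print(np.max(np.abs(num_W2 - ana_W2)))  # should be very small (around 1e-8) if backprop is correct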
