[PYTHON] <Course> Deep Learning: Day1 NN

study-ai


Deep learning

[Deep Learning: Day1 NN](https://qiita.com/matsukura04583/items/6317c57bc21de646da8e) [Deep Learning: Day2 CNN](https://qiita.com/matsukura04583/items/29f0dcc3ddeca4bf69a2) [Deep Learning: Day3 RNN](https://qiita.com/matsukura04583/items/9b77a238da4441e0f973) [Deep Learning: Day4 Reinforcement Learning / TensorFlow](https://qiita.com/matsukura04583/items/50806b750c8d77f2305d)

Deep Learning: Day1 NN (Lecture Summary)

Section1) Input layer to intermediate layer

NN01.jpg

NN02.jpg
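Since this section is summarized only by the figures above, here is a minimal NumPy sketch (with arbitrary illustrative values, not taken from the lecture material) of the computation they depict: the inputs x are weighted by W, the bias b is added to give the total input u, and an activation function turns u into the intermediate-layer output z.

python

import numpy as np

# Illustrative values only (not from the lecture material)
x = np.array([1.0, 2.0])              # input layer (2 inputs)
W = np.array([[0.1, 0.3, 0.5],
              [0.2, 0.4, 0.6]])       # weights: 2 inputs -> 3 intermediate nodes
b = np.array([0.1, 0.2, 0.3])         # bias

u = np.dot(x, W) + b                  # total input to the intermediate layer
z = np.maximum(0, u)                  # intermediate layer output (ReLU activation)
print(u, z)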

Section2) Activation function

Formula

f(x) = \left\{
\begin{array}{ll}
1 & (x \geq 0) \\
0 & (x \lt 0)
\end{array}
\right.

python


def step_function(x):
    if x > 0:
        return 1
    else:
        return 0

Formula

f(u) =  \frac{1}{1+e^{-u}}

python


def sigmoid(x):
  return 1/(1 + np.exp(-x))

The sigmoid is a smooth function that varies between 0 and 1. Unlike the step function, which can only express ON/OFF, it can convey the strength of a signal, and this helped trigger the spread of neural networks for prediction. Drawback: for large input values the output changes very little, which can cause the vanishing gradient problem.
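To see the vanishing-gradient issue numerically, the derivative of the sigmoid, f'(u) = f(u)(1 - f(u)), can be evaluated at a few points; it peaks at 0.25 and becomes tiny for large inputs. A small sketch (values chosen only for illustration):

python

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

u = np.array([0.0, 2.0, 5.0, 10.0])
grad = sigmoid(u) * (1 - sigmoid(u))   # derivative of the sigmoid
print(grad)  # roughly [0.25, 0.105, 0.0066, 0.000045]: shrinks rapidly as u grows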

f(x) = \left\{
\begin{array}{ll}
x & (x \gt 0) \\
0 & (x \leq 0)
\end{array}
\right.

python


def relu(x):
    return np.maximum(0, x)

ReLU is currently the most widely used activation function. It has produced good results by helping to avoid the vanishing gradient problem and by promoting sparsity.
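A quick sketch of why ReLU helps with both points: its derivative is exactly 1 wherever the unit is active (so gradients pass through without shrinking) and exactly 0 elsewhere (so inactive units output 0, giving sparse activations). The example values are arbitrary.

python

import numpy as np

def relu(x):
    return np.maximum(0, x)

u = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
z = relu(u)
d_relu = np.where(u > 0, 1.0, 0.0)    # derivative of ReLU: 1 for positive inputs, 0 otherwise
print(z)       # [0.  0.  0.  0.5 2. ]  -> negative units are switched off (sparsity)
print(d_relu)  # [0. 0. 0. 1. 1.]       -> gradient passes through unchanged where active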

Section3) Output layer

3-1 Error function

nn04.jpg Error calculation: error function = squared error

E_n(w) = \frac{1}{2}\sum_{j=1}^{I} (y_j - d_j)^2 = \frac{1}{2}\|y - d\|^2
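A minimal NumPy version of this squared-error function (only a sketch; the distributed course code provides its own functions.mean_squared_error):

python

import numpy as np

def squared_error(y, d):
    # E_n(w) = 1/2 * sum_j (y_j - d_j)^2
    return 0.5 * np.sum((y - d) ** 2)

y = np.array([0.1, 0.8, 0.1])   # network output (illustrative values)
d = np.array([0.0, 1.0, 0.0])   # target
print(squared_error(y, d))      # 0.5 * (0.01 + 0.04 + 0.01) = 0.03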

3-2 Output layer activation function

E_n(w) = -\sum_{i=1}^{I} d_i \log y_i

python


# Cross entropy
def cross_entropy_error(d, y):
    if y.ndim == 1:
        d = d.reshape(1, d.size)
        y = y.reshape(1, y.size)

    # If the teacher data is a one-hot vector, convert it to the index of the correct label
    if d.size == y.size:
        d = d.argmax(axis=1)

    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), d] + 1e-7)) / batch_size

A one-hot vector is a vector such as (0, 1, 0, 0, 0, 0) in which one component is 1 and all remaining components are 0. (Reference: an article explaining what a one-hot vector is.)
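One-hot vectors like this can be built in one line with NumPy; the same np.identity trick is used in the completion assignment at the end of this post.

python

import numpy as np

labels = np.array([2, 0, 1])       # class indices (illustrative)
one_hot = np.identity(3)[labels]   # row i of the identity matrix is the one-hot vector for class i
print(one_hot)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]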

Section4) Gradient descent method

(Reference) Gradient descent method commentary site

W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_t \quad (\varepsilon \text{ is the learning rate}) ・・・ mini-batch gradient descent method

E_t = \frac{1}{N_t}\sum_{n\in D_t}E_n, \qquad N_t = |D_t|

In the mini-batch gradient descent method, the error E_t is the average error over the samples belonging to D_t, a randomly extracted subset of the data (a mini-batch).

Advantages of mini-batch gradient descent: computing resources can be used effectively without losing the advantages of stochastic gradient descent → thread parallelization on the CPU and SIMD parallelization on the GPU.
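As a self-contained sketch of the update rule above, the following fits a small linear model with mini-batch gradient descent; the toy target y = 3*x0 + 2*x1 is borrowed from the DN15 exercise later in this post, and the batch size and learning rate are arbitrary choices.

python

import numpy as np

# Toy data for the target y = 3*x0 + 2*x1 (same function as the DN15 exercise below)
np.random.seed(0)
X = np.random.rand(1000, 2)
D = X @ np.array([3.0, 2.0])

W = np.zeros(2)                  # parameters to learn
learning_rate, batch_size = 0.1, 32

for step in range(500):
    idx = np.random.choice(len(X), batch_size, replace=False)  # randomly drawn mini-batch D_t
    x_b, d_b = X[idx], D[idx]
    y_b = x_b @ W
    grad = x_b.T @ (y_b - d_b) / batch_size                    # average gradient over the mini-batch
    W -= learning_rate * grad                                  # W^(t+1) = W^(t) - eps * grad(E_t)

print(W)  # should approach [3, 2]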

Section5) Error back propagation method

Error gradient calculation: the error backpropagation method. [Error backpropagation] The computed error is differentiated, starting from the output-layer side, and propagated backwards toward the earlier layers. It is a method of analytically computing the derivative of each parameter with minimal computation. nn07.jpg nn08.jpg By propagating derivatives backwards from the computed result (= the error), the derivatives can be obtained while avoiding unnecessary recursive calculation. nn09.jpg

python


#Error back propagation
def backward(x, d, z1, y):
    print("\n#####Error back propagation start#####")

    grad = {}

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    #Delta at the output layer
    delta2 = functions.d_sigmoid_with_loss(d, y)
    #Gradient of b2
    grad['b2'] = np.sum(delta2, axis=0)
    #Gradient of W2
    grad['W2'] = np.dot(z1.T, delta2)
    #Delta in the middle layer
    delta1 = np.dot(delta2, W2.T) * functions.d_relu(z1)
    #Gradient of b1
    grad['b1'] = np.sum(delta1, axis=0)
    #Gradient of W1
    grad['W1'] = np.dot(x.T, delta1)
        
    print_vec("Partial differential_dE/du2", delta2)
    print_vec("Partial differential_dE/du2", delta1)

    print_vec("Partial differential_Weight 1", grad["W1"])
    print_vec("Partial differential_Weight 2", grad["W2"])
    print_vec("Partial differential_Bias 1", grad["b1"])
    print_vec("Partial differential_Bias 2", grad["b2"])

    return grad
    

Consideration of confirmation test

[P10] In deep learning, describe what you are trying to do in two lines or less. Also, which of the following values is the ultimate goal of optimization? Choose all. ① Input value [X] ② Output value [Y] ③ Weight [W] ④ Bias [b] ⑤ Total input [u] ⑥ Intermediate layer input [z] ⑦ Learning rate [ρ]

⇒ [Discussion] Ultimately, deep learning aims to determine the parameters that minimize the error. The values that are the ultimate goal of optimization are ③ the weights [W] and ④ the biases [b].

[P12] Put the following network on paper.

⇒ [Discussion] IMG_2280.jpg It's easy to understand if you write it yourself.

[P19] Confirmation test Let's put an example of animal classification in this diagram![P19.gif](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/357717/6a0b680d-9466-598d-67e9-d9156a754193.gif)

⇒ [Discussion] p19.jpg

[P21] Confirmation test

Write this expression in python

u = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b = Wx + b ・・・ (1.2)

⇒ [Discussion]

python


u1=np.dot(x,W1)+b1

[P23] Confirmation test Extract the code that represents the middle layer

⇒ [Discussion]

python


#Total input of hidden layers
u1 = np.dot(x, W1) + b1
#Total output of hidden layer
z1 = functions.relu(u1)

[P26] Confirmation test Explain the difference between linear and non-linear with a diagram.

IMG_0101.jpg

[P34] Confirmation test Fully connected NN: single layer, multiple nodes nn03.jpg Extract the relevant part from the distributed source code. ⇒ [Discussion] Since the activation function f(u) is the sigmoid function, this is the relevant part.

python


z1 = functions.sigmoid(u)

[P34] Confirmation test Error calculation Error function = Square error

E_n(w) = \frac{1}{2}\sum_{j=1}^{I} (y_j - d_j)^2 = \frac{1}{2}\|y - d\|^2

・ Explain why the difference is squared rather than simply taking the difference. ・ Explain what the 1/2 in the formula above means.

⇒ [Discussion] ・ Squaring makes every deviation contribute as a positive quantity, so positive and negative errors do not cancel out. ・ The 1/2 is there so that the factor of 2 produced when differentiating cancels, giving a simpler gradient (see the check below). (Reference) This site was easy to understand: Meaning and calculation method of the least squares method -- how to find a regression line.
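A one-line check of why the 1/2 is convenient: when the squared error is differentiated with respect to an output y_j, the factor of 2 from the square cancels the 1/2, leaving a clean gradient.

\frac{\partial E_n}{\partial y_j} = \frac{\partial}{\partial y_j}\,\frac{1}{2}\sum_{i=1}^{I}(y_i - d_i)^2 = (y_j - d_j)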

[P51] Confirmation test (S3_2 output layer_activation function) Softmax function

①f(i,u) = \frac{e^{u_i}②}{\sum_{k=1}^{K}e^{u_k}③}

Show the source code corresponding to the formulas (1) to (3) and explain line by line.

python


def softmax(x):
    if x.ndim == 2:  # if the input is two-dimensional (a batch)
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    x = x - np.max(x)  # overflow countermeasure
    return np.exp(x) / np.sum(np.exp(x))

① ・・・ y (returned transposed via return y.T)
② ・・・ the np.exp(x) part
③ ・・・ the np.sum(np.exp(x), axis=0) part

(Learning reference) What does NumPy's axis and dimension mean?

[P53] Confirmation test(S3_2 Output layer_Activation function)
Cross entropy
Show the source code corresponding to parts ① and ② of the formula below and explain the process line by line.

E_n(w) = -\sum_{i=1}^{I} d_i \log y_i

E_n(w) ・・・ part ①   -\sum_{i=1}^{I} d_i \log y_i ・・・ part ②

⇒ [Discussion] ・ The return value is -np.sum(np.log(y[np.arange(batch_size), d] + 1e-7)) / batch_size ・ Dividing by batch_size takes the average over the mini-batch; the 1e-7 prevents log(0).

python


# Cross entropy
def cross_entropy_error(d, y):
    if y.ndim == 1:
        d = d.reshape(1, d.size)
        y = y.reshape(1, y.size)
        
    # If the teacher data is a one-hot vector, convert it to the index of the correct label
    if d.size == y.size:
        d = d.argmax(axis=1)
             
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), d] + 1e-7)) / batch_size

E_n(w) ・・・ part ① is the return value. -\sum_{i=1}^{I} d_i \log y_i ・・・ part ② is the -np.sum(np.log(y[np.arange(batch_size), d] + 1e-7)) / batch_size part.

[P56] Confirmation test(S4 gradient descent method) Find the appropriate source code for the gradient descent function.

W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_n \quad (\varepsilon \text{ is the learning rate}) ・・・ ①

\nabla E = \frac{\partial E}{\partial W} = \left[\frac{\partial E}{\partial w_1} \cdots \frac{\partial E}{\partial w_M}\right] ・・・ ②

⇒ [Discussion]

python


# error
loss = functions.cross_entropy_error(d, y)

grad = backward(x, d, z1, y)  # corresponds to part ②
for key in ('W1', 'W2', 'b1', 'b2'):
    network[key] -= learning_rate * grad[key]  # corresponds to part ①

[P65] Confirmation test (S4 gradient descent method) Summarize what online learning is. ⇒ [Discussion] Online learning means the model is updated using only the newly acquired data each time it arrives; training can proceed without reusing the existing data.
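For contrast with the mini-batch case, here is a minimal sketch of an online update on the same toy target y = 3*x0 + 2*x1: each newly arriving sample updates the parameters immediately, and no past data needs to be stored.

python

import numpy as np

np.random.seed(1)
W = np.zeros(2)                 # parameters to learn
learning_rate = 0.05

for step in range(5000):
    x = np.random.rand(2)       # one newly arriving sample (illustrative)
    d = 3 * x[0] + 2 * x[1]
    y = x @ W
    W -= learning_rate * (y - d) * x   # update immediately from this single sample
print(W)  # should approach [3, 2]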

[P69] Confirmation test (S4 gradient descent method) Explain the meaning of this formula with a diagram.

W^{(t+1)} = W^{(t)} - \varepsilon \nabla E_t \quad (\varepsilon \text{ is the learning rate}) ・・・ mini-batch gradient descent method

⇒ [Discussion] (〇〇〇) (〇〇〇) (〇〇〇) → set 1, set 2, set 3. Any one of these data sets is treated as a mini-batch; the errors of the samples in the chosen set are summed and averaged (multiplied by 1/3 here, since N_t = 3):

E_t = \frac{1}{N_t}\sum_{n\in D_t}E_n, \qquad N_t = |D_t|
[P78] Confirmation test(S5 error back propagation method) The error back propagation method can avoid unnecessary recursive processing. Extract the source code that holds the calculation results that have already been performed.

python



# Error back propagation
def backward(x, d, z1, y):
    print("\n#####Error back propagation start#####")

    grad = {}

    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    # Delta at the output layer ## the derivative of the combined sigmoid + cross-entropy function is computed and assigned to delta2
    delta2 = functions.d_sigmoid_with_loss(d, y)
    # Gradient of b2 ## uses delta2
    grad['b2'] = np.sum(delta2, axis=0)
    # Gradient of W2 ## uses delta2
    grad['W2'] = np.dot(z1.T, delta2)
    # Delta in the middle layer ## uses delta2
    delta1 = np.dot(delta2, W2.T) * functions.d_relu(z1)
    # Gradient of b1
    grad['b1'] = np.sum(delta1, axis=0)
    # Gradient of W1
    grad['W1'] = np.dot(x.T, delta1)

    print_vec("Partial derivative_dE/du2", delta2)
    print_vec("Partial derivative_dE/du1", delta1)

    print_vec("Partial derivative_weight 1", grad["W1"])
    print_vec("Partial derivative_weight 2", grad["W2"])
    print_vec("Partial derivative_bias 1", grad["b1"])
    print_vec("Partial derivative_bias 2", grad["b2"])

    return grad

[P83] Find the source code corresponding to the two blanks (S5 error back propagation method).

\frac{\partial E}{\partial y} \frac{\partial y}{\partial u}

python


# Delta at the output layer
delta2 = functions.d_mean_squared_error(d, y)

\frac{\partial E}{\partial y} \frac{\partial y}{\partial u} \frac{\partial u}{\partial w_{ji}^{(2)}}

python


# Gradient of W2
grad['W2'] = np.dot(z1.T, delta2)

Exercise

DN06_Jupyter exercise

python



# Let's try #


# Forward propagation (single layer / single unit)

# weight
W = np.array([[0.1], [0.2]])

## Let's try _ array initialization
# W = np.zeros(2)
W = np.ones(2)  # Select here
# W = np.random.rand(2)
# W = np.random.randint(5, size=(2))

print_vec("weight", W)


# bias
b = 0.5

## Let's try _ numerical initialization
b = np.random.rand()  # random number between 0 and 1  # Select this
# b = np.random.rand() * 10 - 5  # random number between -5 and 5

print_vec("bias", b)

# Input value
x = np.array([2, 3])
print_vec("input", x)


# Total input
u = np.dot(x, W) + b
print_vec("total input", u)

# Intermediate layer output
z = functions.relu(u)
print_vec("intermediate layer output", z)

weight [1. 1.]

bias 0.15691869859919338

input [2 3]

Total input 5.156918698599194

Intermediate layer output 5.156918698599194

python



# Let's try #

# Forward propagation (single layer / multiple units)

# weight
W = np.array([
  [0.1, 0.2, 0.3],
  [0.2, 0.3, 0.4], 
  [0.3, 0.4, 0.5],
  [0.4, 0.5, 0.6]
  ])

## Let's try _ array initialization
# W = np.zeros((4, 3))
W = np.ones((4, 3))  # Select here
# W = np.random.rand(4, 3)
# W = np.random.randint(5, size=(4, 3))

print_vec("weight", W)

# bias
b = np.array([0.1, 0.2, 0.3])
print_vec("bias", b)

# Input value
x = np.array([1.0, 5.0, 2.0, -1.0])
print_vec("input", x)


# Total input
u = np.dot(x, W) + b
print_vec("total input", u)

# Intermediate layer output
z = functions.sigmoid(u)
print_vec("intermediate layer output", z)

weight [[1. 1. 1.] [1. 1. 1.] [1. 1. 1.] [1. 1. 1.]]

bias [0.1 0.2 0.3]

input [ 1. 5. 2. -1.]

Total input [7.1 7.2 7.3]

Intermediate layer output [0.99917558 0.99925397 0.99932492]

python



# Let's try #

# Multi-class classification
# 2-3-4 network

# !! Let's try _ Let's change the node configuration to 3-5-4

# Set weights and biases
# Create a work
def init_network():
    print("##### Network initialization #####")

    # Let's try
    # _ Display the shape of each parameter
    # _ Network initial value random generation

    network = {}
    network['W1'] = np.array([
        [0.1, 0.4, 0.7, 0.1, 0.3],
        [0.2, 0.5, 0.8, 0.1, 0.4],
        [0.3, 0.6, 0.9, 0.2, 0.5]
    ])
    network['W2'] = np.array([
        [0.1, 0.6, 0.1, 0.6],
        [0.2, 0.7, 0.2, 0.7],
        [0.3, 0.8, 0.3, 0.8],
        [0.4, 0.9, 0.4, 0.9],
        [0.5, 0.1, 0.5, 0.1]
    ])
    network['b1'] = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
    network['b2'] = np.array([0.1, 0.2, 0.3, 0.4])
    
 print_vec ("weight 1", network ['W1'])
 print_vec ("weight 2", network ['W2'])
 print_vec ("bias 1", network ['b1'])
 print_vec ("bias 2", network ['b2'])

    return network

# Create a process
# x: Input value
def forward(network, x):
    
 print ("##### Start propagation #####")
    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    
 # 1 layer total input
    u1 = np.dot(x, W1) + b1

 # 1 layer total output
    z1 = functions.relu(u1)

 # 2 layers total input
    u2 = np.dot(z1, W2) + b2
    
 # Output value
    y = functions.softmax(u2)
    
 print_vec ("total input 1", u1)
 print_vec ("intermediate layer output 1", z1)
 print_vec ("total input 2", u2)
 print_vec ("output 1", y)
 print ("total output:" + str (np.sum (y)))
        
    return y, z1

## Preliminary data
# Input value
x = np.array([1., 2., 3.])

# Target output
d = np.array([0, 0, 0, 1])

# Network initialization
network =  init_network()

# output
y, z1 = forward(network, x)

# error
loss = functions.cross_entropy_error(d, y)

## display
 print ("\ n ##### Result display #####")
 print_vec ("output", y)
 print_vec ("training data", d)
 print_vec ("error", loss)

#####Network initialization##### Weight 1 [[0.1 0.4 0.7 0.1 0.3] [0.2 0.5 0.8 0.1 0.4] [0.3 0.6 0.9 0.2 0.5]]

Weight 2 [[0.1 0.6 0.1 0.6] [0.2 0.7 0.2 0.7] [0.3 0.8 0.3 0.8] [0.4 0.9 0.4 0.9] [0.5 0.1 0.5 0.1]]

Bias 1 [0.1 0.2 0.3 0.4 0.5]

Bias 2 [0.1 0.2 0.3 0.4]

#####Start forward propagation##### Total input 1 [1.5 3.4 5.3 1.3 3.1]

Intermediate layer output 1 [1.5 3.4 5.3 1.3 3.1]

Total input 2 [4.59 9.2 4.79 9.4 ]

Output 1 [0.00443583 0.44573018 0.00541793 0.54441607]

Output total: 1.0

#####Result display##### output [0.00443583 0.44573018 0.00541793 0.54441607]

Training data [0 0 0 1]

error 0.6080413107681358

python



# Let's try #


# Regression
# 2-3-2 Network

# !! Let's try _ Let's change the node configuration to 3-5-4

# Set weights and biases
# Create a work
def init_network():
    print("##### Network initialization #####")

    network = {}
    network['W1'] = np.array([
        [0.1, 0.4, 0.7, 0.1, 0.3],
        [0.2, 0.5, 0.8, 0.1, 0.4],
        [0.3, 0.6, 0.9, 0.2, 0.5] 
    ])
    network['W2'] = np.array([
       [0.1, 0.6, 0.1, 0.6],
        [0.2, 0.7, 0.2, 0.7],
        [0.3, 0.8, 0.3, 0.8],
        [0.4, 0.9, 0.4, 0.9],
        [0.5, 0.1, 0.5, 0.1]
    ])
    network['b1'] = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
    network['b2'] = np.array([0.1, 0.2, 0.3, 0.4])
    
 print_vec ("weight 1", network ['W1'])
 print_vec ("weight 2", network ['W2'])
 print_vec ("bias 1", network ['b1'])
 print_vec ("bias 2", network ['b2'])

    return network

# Create a process
def forward(network, x):
 print ("##### Start propagation #####")
    
    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
 # Total input of hidden layer
    u1 = np.dot(x, W1) + b1
 # Total output of hidden layer
    z1 = functions.relu(u1)
 # Total input of output layer
    u2 = np.dot(z1, W2) + b2
 # Total output of output layer
    y = u2
    
 print_vec ("total input 1", u1)
 print_vec ("intermediate layer output 1", z1)
 print_vec ("total input 2", u2)
 print_vec ("output 1", y)
 print ("total output:" + str (np.sum (z1)))
    
    return y, z1

# Input value
x = np.array([1., 2., 3.])
network =  init_network()
y, z1 = forward(network, x)
# Target output
d = np.array([2., 3.,4.,5.])
# error
loss = functions.mean_squared_error(d, y)
## display
 print ("\ n ##### Result display #####")
 print_vec ("intermediate layer output", z1)
 print_vec ("output", y)
 print_vec ("training data", d)
 print_vec ("error", loss)

#####Network initialization##### Weight 1 [[0.1 0.4 0.7 0.1 0.3] [0.2 0.5 0.8 0.1 0.4] [0.3 0.6 0.9 0.2 0.5]]

Weight 2 [[0.1 0.6 0.1 0.6] [0.2 0.7 0.2 0.7] [0.3 0.8 0.3 0.8] [0.4 0.9 0.4 0.9] [0.5 0.1 0.5 0.1]]

Bias 1 [0.1 0.2 0.3 0.4 0.5]

Bias 2 [0.1 0.2 0.3 0.4]

#####Start forward propagation##### Total input 1 [1.5 3.4 5.3 1.3 3.1]

Intermediate layer output 1 [1.5 3.4 5.3 1.3 3.1]

Total input 2 [4.59 9.2 4.79 9.4 ]

Output 1 [4.59 9.2 4.79 9.4 ]

Output total: 14.6

#####Result display##### Intermediate layer output [1.5 3.4 5.3 1.3 3.1]

output [4.59 9.2 4.79 9.4 ]

Training data [2. 3. 4. 5.]

error 8.141525

python



# Let's try #


# Binary classification
# 2-3-1 Network

# !! Let's try _ Let's change the node configuration to 5-10-1

# Set weights and biases
# Create a work
def init_network():
    print("##### Network initialization #####")

    network = {}
    network['W1'] = np.array([
        [0.1, 0.3, 0.5,0.1, 0.3, 0.5,0.1, 0.3, 0.5, 0.6],
        [0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.7],
        [0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.7],
        [0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.7],
        [0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.2, 0.4, 0.6,0.7]
    ])
    network['W2'] = np.array([
       [0.1],
       [0.1],
       [0.1],
       [0.1],
       [0.1],
       [0.1],
       [0.1],
       [0.1],
       [0.1],
       [0.1]
    ])
    network['b1'] = np.array([0.1, 0.3, 0.5,0.1, 0.3, 0.5,0.1, 0.3, 0.5, 0.6])
    network['b2'] = np.array([0.1])
    return network


# Create a process
def forward(network, x):
    print("##### Start forward propagation #####")
    
    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']    

 # Total input of hidden layer
    u1 = np.dot(x, W1) + b1
 # Total output of hidden layer
    z1 = functions.relu(u1)
 # Total input of output layer
    u2 = np.dot(z1, W2) + b2
 # Total output of output layer
    y = functions.sigmoid(u2)
            
 print_vec ("total input 1", u1)
 print_vec ("intermediate layer output 1", z1)
 print_vec ("total input 2", u2)
 print_vec ("output 1", y)
 print ("total output:" + str (np.sum (z1)))

    return y, z1


# Input value
x = np.array([1., 2., 3., 4., 5.])
# Target output
d = np.array([1])
network =  init_network()
y, z1 = forward(network, x)
# error
loss = functions.cross_entropy_error(d, y)

## display
 print ("\ n ##### Result display #####")
 print_vec ("intermediate layer output", z1)
 print_vec ("output", y)
 print_vec ("training data", d)
 print_vec ("error", loss)

#####Network initialization##### #####Start forward propagation##### Total input 1 [ 3. 6.2 9.4 3. 6.2 9.4 3. 6.2 9.4 11. ]

Intermediate layer output 1 [ 3. 6.2 9.4 3. 6.2 9.4 3. 6.2 9.4 11. ]

Total input 2 [6.78]

Output 1 [0.99886501]

Output total: 66.8

#####Result display##### Intermediate layer output [ 3. 6.2 9.4 3. 6.2 9.4 3. 6.2 9.4 11. ]

output [0.99886501]

Training data [1]

error 0.0011355297129812408

⇒ [Discussion] I found that the output changes greatly depending on the size of the intermediate layer. How should the number of units be decided? The intermediate layer was the hardest to decide. The number of units in the input layer has to match the dimensionality of the data, so there is no choice to make, and the number of units in the output layer is simply the number of classes, so it needs no thought either. I would like to investigate whether the approximation still works if the number of intermediate-layer units is increased enormously. I also found setting the weights and biases difficult.

DN15_Jupyter Exercise 2

python



# Let's try #


# Sample function
# AI that predicts the value of y

def f(x):
    y = 3 * x[0] + 2 * x[1]
    return y

# Initial setting
def init_network():
 # print ("##### Network initialization #####")
    network = {}
    nodesNum = 10
    network['W1'] = np.random.randn(2, nodesNum)
    network['W2'] = np.random.randn(nodesNum)
    network['b1'] = np.random.randn(nodesNum)
    network['b2'] = np.random.randn()

 # print_vec ("weight 1", network ['W1'])
 # print_vec ("weight 2", network ['W2'])
 # print_vec ("bias 1", network ['b1'])
 # print_vec ("bias 2", network ['b2'])

    return network

# Forward propagation
def forward(network, x):
 # print ("##### Sequential propagation start #####")
    
    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']
    u1 = np.dot(x, W1) + b1
    #z1 = functions.relu(u1)
    
    ## Let's try
    z1 = functions.sigmoid(u1)  # Select and try sigmoid
    
    u2 = np.dot(z1, W2) + b2
    y = u2

 # print_vec ("total input 1", u1)
 # print_vec ("Middle layer output 1", z1)
 # print_vec ("total input 2", u2)
 # print_vec ("output 1", y)
 # print ("total output:" + str (np.sum (y)))
    
    return z1, y

# Error back propagation
def backward(x, d, z1, y):
 # print ("\ n ##### Error back propagation start #####")

    grad = {}
    
    W1, W2 = network['W1'], network['W2']
    b1, b2 = network['b1'], network['b2']

 #Delta in the output layer
    delta2 = functions.d_mean_squared_error(d, y)
    # Gradient of b2
    grad['b2'] = np.sum(delta2, axis=0)
 # W2 gradient
    grad['W2'] = np.dot(z1.T, delta2)
 # Delta in the middle layer
    #delta1 = np.dot(delta2, W2.T) * functions.d_relu(z1)

 ## Let's try #Select sigmoid and try
    delta1 = np.dot(delta2, W2.T) * functions.d_sigmoid(z1)

    delta1 = delta1[np.newaxis, :]
    # Gradient of b1
    grad['b1'] = np.sum(delta1, axis=0)
    x = x[np.newaxis, :]
 # W1 gradient
    grad['W1'] = np.dot(x.T, delta1)
    
 # print_vec ("Partial derivative_weight 1", grad ["W1"])
 # print_vec ("Partial derivative_weight 2", grad ["W2"])
 # print_vec ("Partial derivative_bias 1", grad ["b1"])
 # print_vec ("Partial derivative_bias 2", grad ["b2"])

    return grad

# Create sample data
data_sets_size = 100000
data_sets = [0 for i in range(data_sets_size)]

for i in range(data_sets_size):
    data_sets[i] = {}
 # Set a random value
    # data_sets[i]['x'] = np.random.rand(2)
    
    ## Let's try _ input value setting # Select this to try
    data_sets[i]['x'] = np.random.rand(2) * 10 - 5  # random numbers from -5 to 5
    
 # Set target output
    data_sets[i]['d'] = f(data_sets[i]['x'])
    
losses = []
# Learning rate
learning_rate = 0.07

# Number of extracts
epoch = 1000

# Parameter initialization
network = init_network()
# Random extraction of data
random_datasets = np.random.choice(data_sets, epoch)

# Repeated gradient descent
for dataset in random_datasets:
    x, d = dataset['x'], dataset['d']
    z1, y = forward(network, x)
    grad = backward(x, d, z1, y)
 #Apply gradient to parameter
    for key in ('W1', 'W2', 'b1', 'b2'):
        network[key]  -= learning_rate * grad[key]

 #Error
    loss = functions.mean_squared_error(d, y)
    losses.append(loss)

 print ("##### Result display #####")
lists = range(epoch)


plt.plot(lists, losses, '.')
# Graph display
plt.show()
Screenshot 2019-12-31 6.16.29.png

⇒ [Discussion] By changing from the ReLU function to the sigmoid function, the spread of points near 0 in the graph widened. The input values were also changed to random numbers in the range -5 to 5.

Completion assignment

DN16_Completion assignment (production assignment)

(Problem) Create a deep learning model using the IRIS data

Design: Create a model that is trained on the IRIS data, split 2:1 into training data and test data, and then makes predictions.

  • Input layer: 4 dimensions
  • Hidden layer: 6 dimensions
  • Output layer: 3 dimensions (3-class classification problem)
  • Connections between layers: dense (fully connected)
  • Input layer → hidden layer activation function: ReLU
  • Hidden layer → output layer activation function: softmax
  • Optimization: gradient descent
  • Loss function: cross entropy
  (A sketch of this architecture in a high-level framework follows the figures below.)
Screenshot 2020-01-01 16.46.51.png Screenshot 2020-01-01 16.47.10.png
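For comparison only, the design above can be written compactly in a high-level framework. The following Keras sketch assumes TensorFlow/Keras is installed and is not part of the assignment, which is implemented in plain NumPy below.

python

# A sketch of the same design in Keras (assumption: tensorflow is available; not the assignment code)
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),                       # input layer: 4 features
    keras.layers.Dense(6, activation='relu'),      # hidden layer: 6 units, ReLU
    keras.layers.Dense(3, activation='softmax'),   # output layer: 3 classes, softmax
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),  # plain gradient descent
              loss='categorical_crossentropy',                    # cross-entropy loss
              metrics=['accuracy'])
model.summary()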

python


import numpy as np

# Hyperparameters
INPUT_SIZE = 4          # number of input nodes
HIDDEN_SIZE = 6         # number of neurons in the intermediate (hidden) layer
OUTPUT_SIZE = 3         # number of neurons in the output layer
TRAIN_DATA_SIZE = 50    # use TRAIN_DATA_SIZE of the 150 samples as training data; the rest is used as test data
LEARNING_RATE = 0.1     # learning rate
EPOCH = 1000            # number of training iterations (epochs)

# Read data

# The Iris dataset was obtained from the URL below. Since that data is sorted by species and has a header row, a CSV was prepared in which the 150 samples of the 3 species are shuffled together, 10 at a time.
# https://gist.github.com/netj/8836201

x = np.loadtxt('/content/drive/My Drive/DNN_code/data/iris.csv', delimiter=',',skiprows=1, usecols=(0, 1, 2, 3))
raw_t = np.loadtxt('/content/drive/My Drive/DNN_code/data/iris.csv',  delimiter=',',skiprows=1,dtype="unicode", usecols=(4,))

t = np.zeros([150])

for i in range(0,150):
  vari = raw_t[i]
  #print(vari,raw_t[i],i)
  if ("Setosa" in vari):
      t[i] = int(0)
  elif ("Versicolor" in vari):
      t[i] = int(1)
  elif ("Virginica" in vari):
      t[i] = int(2)
  else:
      print("error", i)

a = [3, 0, 8, 1, 9]
a = t.tolist()
a_int = [int(n) for  n in a]
print(a_int)

a_one_hot = np.identity(10)[a_int]
a_one_hot = np.identity(len(np.unique(a)))[a_int]

print(a_one_hot)

train_x = x[:TRAIN_DATA_SIZE]
train_t = a_one_hot[:TRAIN_DATA_SIZE]
test_x = x[TRAIN_DATA_SIZE:]
test_t = a_one_hot[TRAIN_DATA_SIZE:]

print("train=",TRAIN_DATA_SIZE,train_x,train_t)
print("test=",test_x,test_t)

# Weight / bias initialization #He initial value (for using ReLU)
W1 = np.random.randn(INPUT_SIZE, HIDDEN_SIZE) / np.sqrt(INPUT_SIZE) * np.sqrt(2)  
W2 = np.random.randn(HIDDEN_SIZE, OUTPUT_SIZE)/ np.sqrt(HIDDEN_SIZE) * np.sqrt(2)
# Adjust from initial value zero
b1 = np.zeros(HIDDEN_SIZE) 
b2 = np.zeros(OUTPUT_SIZE)

# ReLU function
def relu(x):
    return np.maximum(x, 0)

# Softmax function
def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T

    x = x - np.max(x)  # overflow countermeasure
    return np.exp(x) / np.sum(np.exp(x))

# Cross entropy error
def cross_entropy_error(y, t):
    if y.shape != t.shape:
        raise ValueError
    if y.ndim == 1:
        return - (t * np.log(y)).sum()
    elif y.ndim == 2:
        return - (t * np.log(y)).sum() / y.shape[0]
    else:
        raise ValueError

# Forward propagation
def forward(x):
    global W1, W2, b1, b2
    return softmax(np.dot(relu(np.dot(x, W1) + b1), W2) + b2)

# Test data results
test_y = forward(test_x)
 print ("Before learning =", (test_y.argmax (axis = 1) == test_t.argmax (axis = 1)). Sum (),'/', 150 --TRAIN_DATA_SIZE)

# Learning loop
for i in range(EPOCH):
 # Forward propagation with data storage
    y1 = np.dot(train_x, W1) + b1
    y2 = relu(y1)
    train_y = softmax(np.dot(y2, W2) + b2)

 # Loss function calculation
    L = cross_entropy_error(train_y, train_t)

    if i % 100 == 0:  # print every 100 iterations
        print("L=",L)

 # Gradient calculation
    a1 = (train_y - train_t) / TRAIN_DATA_SIZE
    b2_gradient = a1.sum(axis=0)
    W2_gradient = np.dot(y2.T, a1)
    a2 = np.dot(a1, W2.T)
    a2[y1 <= 0.0] = 0
    b1_gradient = a2.sum(axis=0)
    W1_gradient = np.dot(train_x.T, a2)

 #Parameter update
    W1 = W1 - LEARNING_RATE * W1_gradient
    W2 = W2 - LEARNING_RATE * W2_gradient
    b1 = b1 - LEARNING_RATE * b1_gradient
    b2 = b2 - LEARNING_RATE * b2_gradient

# Result display

# L value of final training data
L = cross_entropy_error(forward(train_x), train_t)
 print ("L value of final training data =", L)

# Test data results
test_y = forward(test_x)
 print ("After learning =", (test_y.argmax (axis = 1) == test_t.argmax (axis = 1)). sum (),'/', 150 --TRAIN_DATA_SIZE)

(result) Before learning = 42/ 100 L= 4.550956552060894 L= 0.3239415165787326 L= 0.2170679838829666 L= 0.04933110713361697 L= 0.0273865499319481 L= 0.018217122389043848 L= 0.013351028977015358 L= 0.010399165844496665 L= 0.008444934117102367 L= 0.007068429052588092 L value of final training data= 0.0060528995955394386 After learning = 89/ 100

⇒ [Discussion]

  • Q1. What is the purpose of the assignment? What can be devised?
    The purpose of the assignment is to learn the basic mechanism of deep learning end to end. Devices include adjusting the number of intermediate-layer units and the hyperparameters.
  • Q2. What is the significance of solving the task as a classification task?
    With a classification task, you can learn how to generate training data and test data.
  • Q3. What is the IRIS data? (two lines)
    A dataset prepared for practicing machine learning and deep learning.
    It provides four features for classifying three species of iris (see the loading sketch below).
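As a side note, the same dataset can also be loaded without the prepared CSV via scikit-learn (assuming scikit-learn is installed; the assignment above deliberately uses the Gist CSV instead):

python

# Assumption: scikit-learn is installed; this only illustrates what the Iris data contains
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)     # (150, 4)  -> 150 samples, 4 features
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica'] -> the 3 classes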
