[PYTHON] Introduction to Deep Learning ~ Forward Propagation ~

Target audience

This article is part of the [Deep Learning Series](#deep-learning-series). The previous article is here. We will first explain the theory of forward propagation for scalars, and then extend it to matrices. The code builds on what was introduced in the previous article, so please get that code first~

Table of contents

- [Forward propagation in scalar](#forward-propagation-in-scalar)
  - [Forward propagation theory in scalar](#forward-propagation-theory-in-scalar)
  - [Forward propagation implementation in scalar](#forward-propagation-implementation-in-scalar)
- [Forward propagation in a matrix](#forward-propagation-in-a-matrix)
  - [Forward propagation theory in matrix](#forward-propagation-theory-in-matrix)
  - [Forward propagation implementation in matrix](#forward-propagation-implementation-in-matrix)
- [Implementation of the __init__ method](#implementation-of-the-__init__-method)
- [About matrix operations](#about-matrix-operations)
  - [Matrix sum](#matrix-sum)
  - [Matrix element product](#matrix-element-product)
  - [Matrix product](#matrix-product)
  - [Transpose](#transpose)

Forward propagation in scalar

This section describes the theory and implementation of forward propagation with scalars (real numbers). However, most of it has already been covered in the Basics article.

Forward propagation theory in scalar

First, the theory.

(Figure: sinple_neuron_model.png)

Let's start with this neuron model. Formulating it gives $f(x) = \sigma(wx + b)$, as described in [here](https://qiita.com/kuroitu/items/221e8c477ffdd0774b6b#activation function). Passing the result through the activation function $\sigma(\cdot)$ makes it nonlinear, which is what gives stacking layers its meaning. So what does this operation look like as a computational graph?

(Figure: neuron_object.png)

It looks something like this. Until now the input was just $x$ and the other elements were omitted, but in the computational graph the **weight $w$** and the **bias (threshold) $b$** are drawn explicitly, and the result is output through the activation function. The variables $x, w, b, y$ are the variables a neuron object should hold. As for the activation function, Python can store functions and classes as objects in variables, so it is convenient to encapsulate it in the neuron object as well. That is all for the theory of forward propagation with scalars. Simple and nice~
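As a quick numerical check of $y = \sigma(wx + b)$, here is a minimal sketch. The sigmoid function, the specific numbers, and the file name are just my example choices for illustration; the article's actual activation classes come later.

scalar_forward_example.py


import numpy as np

def sigmoid(u):
    # Example activation function: sigma(u) = 1 / (1 + exp(-u))
    return 1 / (1 + np.exp(-u))

# Scalar forward propagation: y = sigma(w*x + b)
x, w, b = 0.5, 2.0, -0.25    # arbitrary example values
y = sigmoid(w * x + b)       # sigmoid(0.75)
print(y)                     # about 0.679
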

Forward propagation implementation in scalar

Let's implement it. The code goes into [baselayer.py](https://qiita.com/kuroitu/items/884c62c48c2daa3def08#layer module code preparation).

baselayer.py


    def forward(self, x):
        """
        Forward propagation.
        """
        # Remember the input (needed for backpropagation)
        self.x = x.copy()

        # Forward propagation
        y = self.w * x + self.b
        self.y = self.act.forward(y)

        return self.y
The input is stored because it will be needed for backpropagation. The activation function has not been covered yet, but I intend to implement it so that it can be used as shown above. The other method a layer has is `backward`, which computes the derivatives; depending on the layer, an `update` method may also be needed.

Forward propagation itself is extremely easy, isn't it? It is exactly as the formula says. That's all for the scalar implementation. Next, let's consider the matrix (or rather, vector) implementation.

Forward propagation in a matrix

Next, consider forward propagation in a matrix. This part is tough if you do not know matrix multiplication from linear algebra, so if you are unfamiliar with it, there is a brief explanation in [About matrix operations](#about-matrix-operations).

Forward propagation theory in matrix

First, let's think of a layer object as something like a stack of two neuron objects.

(Figure: layer_object.png)

I couldn't come up with a better way to draw this, so let me explain it briefly. The black arrows can be understood as the neuron objects themselves. The interesting part is the arrows of other colors. The light blue arrow represents the synapse connecting the upper neuron to the lower one: the signal is multiplied by the light blue weight $w_{1,2}$ as it passes through the middle multiplication node and joins the lower addition node. The same goes for the red arrow: the signal is multiplied by the red weight $w_{2,1}$ at the middle multiplication node and joins the upper addition node. The addition nodes now have three inputs, but a three-input addition can be decomposed into two two-input additions, so please regard that as an omitted detail.

(Figure: tri_add_cal_graph.png)

Let's follow this with formulas. In what follows, $\sigma_i(\cdot)$ is taken to be the identity function for simplicity. First, writing everything out:

y_1 = w_{1, 1}x_1 + w_{2, 1}x_2 + b_1 \\
y_2 = w_{1, 2}x_1 + w_{2, 2}x_2 + b_2

I think this is obvious if you look at the figure. Now let's express it in matrix form.

\left(
  \begin{array}{c}
    y_1 \\
    y_2
  \end{array}
\right)
=
\left(
  \begin{array}{cc}
    w_{1, 1} & w_{2, 1} \\
    w_{1, 2} & w_{2, 2}
  \end{array}
\right)
\left(
  \begin{array}{c}
    x_1 \\
    x_2
  \end{array}
\right)
+
\left(
  \begin{array}{c}
    b_1 \\
    b_2
  \end{array}
\right)

It looks like this. If you understand matrix multiplication, you can see that it is an equivalent expression. By the way, pay attention to the subscripts of $w_{i,j}$. Subscripts are usually read as **row $i$, column $j$**, right? In the formula above, however, they appear as **row $j$, column $i$**. Since that is awkward both in theory and in implementation, let's transpose the matrix.

\left(
  \begin{array}{c}
    y_1 \\
    y_2
  \end{array}
\right)
=
\left(
  \begin{array}{cc}
    w_{1, 1} & w_{1, 2} \\
    w_{2, 1} & w_{2, 2}
  \end{array}
\right)^{\top}
\left(
  \begin{array}{c}
    x_1 \\
    x_2
  \end{array}
\right)
+
\left(
  \begin{array}{c}
    b_1 \\
    b_2
  \end{array}
\right) \\
\Leftrightarrow
\boldsymbol{Y} = \boldsymbol{W}^{\top}\boldsymbol{X} + \boldsymbol{B}

Transpose is also covered in [Transpose](#transpose). For now, this completes the mathematical expression of a layer object with 2 inputs and 2 outputs. Easy, right? Now let's generalize it. Nothing really changes; mathematically it remains

\boldsymbol{Y} = \boldsymbol{W}^{\top}\boldsymbol{X} + \boldsymbol{B}

just as before. Let's take a closer look at the shapes. For a layer with $M$ inputs and $N$ outputs, we have

\underbrace{\boldsymbol{Y}}_{N \times 1} = \underbrace{\boldsymbol{W}^{\top}}_{N \times M}\underbrace{\boldsymbol{X}}_{M \times 1} + \underbrace{\boldsymbol{B}}_{N \times 1}

Here $\boldsymbol{W}^{\top}$ shows the shape after transposition; before transposition it is $\underbrace{\boldsymbol{W}}_{M \times N}$. That is all for the theory.
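As a quick sanity check of these shapes, here is a small NumPy sketch. The sizes $M=3$, $N=2$ and the file name are arbitrary example choices, and 1-D arrays are used in place of explicit column vectors, just as in the implementation below.

shape_check.py


import numpy as np

M, N = 3, 2                          # M inputs, N outputs (arbitrary example sizes)
x = np.random.randn(M)               # input X, shape (M,)
W = np.random.randn(M, N)            # weights W, shape (M, N)
b = np.random.randn(N)               # bias B, shape (N,)

y = W.T @ x + b                      # (N, M) @ (M,) + (N,) -> (N,)
print(W.T.shape, x.shape, y.shape)   # (2, 3) (3,) (2,)

Now let's move on to the actual implementation.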

Forward propagation implementation in matrix

As with the scalar version, the implementation goes into [baselayer.py](https://qiita.com/kuroitu/items/884c62c48c2daa3def08#layer module code preparation). We simply rewrite the scalar implementation.

baselayer.py


    def forward(self, x):
        """
        Forward propagation.
        """
        # Remember the input (needed for backpropagation)
        self.x = x.copy()

        # Forward propagation
        y = self.w.T @ x + self.b
        self.y = self.act.forward(y)

        return self.y
Only one line has changed.

baselayer.py


        y = self.w * x + self.b

has been changed to

baselayer.py


        y = self.w.T @ x + self.b

For NumPy arrays, transposition can be done with `ndarray.T`. The `@` operator may be unfamiliar to some, but for the arrays used here it gives the same result as `np.dot`. **It is available in NumPy 1.10 and later, so be careful if you are using an older version.**

`@` operator description

test_at.py


x = np.array([1, 2])
w = np.array([[1, 0], [0, 1]])
b = np.array([1, 1])

y = w.T @ x + b
print(y)
print(y == np.dot(w.T, x) + b)

#----------
# Output:
# [2 3]
# [ True  True]

In fact, this implicitly relies on how the `@` operator handles a matrix-by-vector product. If you want to treat everything strictly as matrices, you need to use `np.matrix` instead of `np.array` and `reshape` the vectors into column vectors.

test_at.py


x = np.matrix([1, 2]).reshape(2, -1)
w = np.matrix([[1, 0], [0, 1]])
b = np.matrix([1, 1]).reshape(2, -1)

y = w.T @ x + b
print(y)
print(y == np.dot(w.T, x) + b)

#----------
# Output:
# [[2]
#  [3]]
# [[ True]
#  [ True]]

It's a hassle, isn't it? Plain `np.array` is fine.
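For reference, if you do want explicit column vectors, you can also just reshape plain `np.array`s; no `np.matrix` is needed. This is a small sketch of my own, not part of the article's layer code.

test_colvec.py


import numpy as np

x = np.array([1, 2]).reshape(2, 1)   # column vector, shape (2, 1)
w = np.array([[1, 0], [0, 1]])
b = np.array([1, 1]).reshape(2, 1)   # column vector, shape (2, 1)

y = w.T @ x + b
print(y)
# [[2]
#  [3]]
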

This completes the implementation of forward propagation. I won't change it anymore (maybe).

Implementation of the __init__ method

By the way, there are several members that a layer object needs to hold, so let's implement them. I will also give it a few additional members.

Implementation of `__init__`

baselayer.py


    def __init__(self, *, prev=1, n=1, 
                 name="", wb_width=1,
                 act="ReLU",
                 **kwds):
        self.prev = prev  # Number of outputs of the previous layer = number of inputs to this layer
        self.n = n        # Number of outputs of this layer = number of inputs to the next layer
        self.name = name  # Name of this layer
        
        # Set the weights and bias
        self.w = wb_width*np.random.randn(prev, n)
        self.b = wb_width*np.random.randn(n)
        
        # Get the activation function (class)
        self.act = get_act(act)
It is just as written. For the weights and bias, `numpy.random.randn` generates random numbers drawn from the standard normal distribution, multiplied by `wb_width` so that their scale can be adjusted. The activation function is obtained via the `get_act` function imported from activations.py; I plan to adapt the code from [here](https://qiita.com/kuroitu/items/73cd401afd463a78115a) for that.
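As a rough usage sketch, instantiation and a forward pass would look something like the following. The class name `BaseLayer`, the dummy `Identity` activation, and the stub `get_act` are my assumptions standing in for the actual baselayer.py and activations.py code, purely so that the snippet runs on its own.

test_baselayer.py


import numpy as np

class Identity:
    # Dummy activation standing in for the classes returned by get_act()
    def forward(self, u):
        return u

def get_act(name, **kwds):
    # Hypothetical stand-in for activations.py's get_act
    return Identity()

class BaseLayer:
    def __init__(self, *, prev=1, n=1, name="", wb_width=1, act="ReLU", **kwds):
        self.prev = prev
        self.n = n
        self.name = name
        self.w = wb_width * np.random.randn(prev, n)
        self.b = wb_width * np.random.randn(n)
        self.act = get_act(act)

    def forward(self, x):
        self.x = x.copy()
        y = self.w.T @ x + self.b
        self.y = self.act.forward(y)
        return self.y

layer = BaseLayer(prev=3, n=2, name="hidden", wb_width=0.05)
print(layer.forward(np.array([1.0, 2.0, 3.0])).shape)  # (2,)
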

About matrix operations

Here we briefly introduce matrix operations. Note that I will only show how the calculations are done, without any mathematical justification (I couldn't give one anyway).

Matrix sum

First is the matrix sum.

\left(
  \begin{array}{cc}
    a & b \\
    c & d
 \end{array}
\right)
+
\left(
  \begin{array}{cc}
    A & B \\
    C & D
  \end{array}
\right)
=
\left(
  \begin{array}{cc}
    a + A & b + B \\
    c + C & d + D
  \end{array}
\right)

Well, it's a natural result: just add element by element. It should go without saying that **the shapes of the two matrices must match exactly.**
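In NumPy this is simply the `+` operator; a quick check (my own example, not part of the article's code):

test_matrix_sum.py


import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

print(A + B)
# [[11 22]
#  [33 44]]
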

Matrix element product

Let me also touch on the element-wise product, even though it does not actually appear anywhere in this article. The element-wise product is also called the Hadamard product.

\left(
  \begin{array}{cc}
    a & b \\
    c & d
 \end{array}
\right)
\otimes
\left(
  \begin{array}{cc}
    A & B \\
    C & D
  \end{array}
\right)
=
\left(
  \begin{array}{cc}
    aA & bB \\
    cC & dD
  \end{array}
\right)

By the way, the product symbol above can be written as `\otimes`. Many other symbol-related tips can be found in here.
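In NumPy the element-wise product is simply the `*` operator (again, a small example of my own):

test_hadamard.py


import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

print(A * B)   # element-wise (Hadamard) product
# [[ 10  40]
#  [ 90 160]]
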

Matrix product

This is the matrix product, not the element-wise product. Unlike the element-wise product, there are constraints on the shapes and it is not commutative.

\left(
  \begin{array}{cc}
    a & b \\
    c & d
 \end{array}
\right)
\left(
  \begin{array}{cc}
    A & B \\
    C & D
  \end{array}
\right)
=
\left(
  \begin{array}{cc}
    aA + bC & aB + bD \\
    cA + dC & cB + dD
  \end{array}
\right)

As for how to compute it, it feels like sweeping the first matrix **horizontally** and the second **vertically** while multiplying and summing.

(Figure: matrix_product_image.gif)

It looks like this. For this calculation to work, the number of elements read **horizontally** must match the number of elements read **vertically**. In other words, the number of **columns** of the first matrix and the number of **rows** of the second matrix must be the same. Generalized: **the matrix product of an $L \times M$ matrix and an $M \times N$ matrix is an $L \times N$ matrix.**
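In NumPy the matrix product is the `@` operator (or `np.dot`); here is a quick shape check of the $L \times M$ times $M \times N$ rule, with $L=4$, $M=3$, $N=2$ as arbitrary example sizes:

test_matmul.py


import numpy as np

A = np.random.randn(4, 3)   # L x M = 4 x 3
B = np.random.randn(3, 2)   # M x N = 3 x 2
C = A @ B                   # matrix product, result is L x N = 4 x 2

print(C.shape)              # (4, 2)
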

Transpose

Transpose is the operation of exchanging rows and columns of a matrix.

\left(
  \begin{array}{cc}
    a & b & c \\
    d & e & f
 \end{array}
\right)^{\top}
=
\left(
  \begin{array}{cc}
    a & d \\
    b & e \\
    c & f
 \end{array}
\right)

The transpose symbol is written as `\top`. That is all the explanation this operation needs.
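In NumPy, transposition is `ndarray.T`, as already used in the forward implementation above (a small example of my own):

test_transpose.py


import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

print(A.T)
# [[1 4]
#  [2 5]
#  [3 6]]
print(A.T.shape)            # (3, 2)
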

In conclusion

This completes the forward propagation implementation. In particular, since the hidden layers and the output layer do not need to behave differently here, there is no need to override this method in [middlelayer.py and outputlayer.py](https://qiita.com/kuroitu/items/884c62c48c2daa3def08#layer module code preparation). Forward propagation is easy and nice.

References

- How to color with Qiita markdown [140 colors]
- Vector notation of mathematical formula description in Qiita
- How to write mathematical formulas that often appear in books such as machine learning in Qiita

Deep learning series

- Introduction to Deep Learning ~ Basics ~
- Introduction to Deep Learning ~ Coding Preparation ~
- Thorough understanding of im2col
- List of activation functions (2020)
