Python beginners touch Pytorch (3)

This is a continuation of Python beginners touch Pytorch (2), the third article I have written in this series. In this installment, I will finally explain neural networks, the topic that motivated me to try PyTorch in the first place.

1. Neural network

First, let me explain what a neural network is. A neural network is a mathematical model of the brain's neurons, expressed with nodes and links.

![ニューラルネットワーク.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/722110/607493df-3240-7e1f-a17a-8e4069043127.png)

Shown as a diagram, it is easy to see why it is called a network. The circles are the nodes, and the arrows connecting them are the links.

"Deep learning," which has become a hot topic in recent years, is a stack of "intermediate layers (two layers in between)" </ strong> in this figure. This model is also referred to as hierarchical </ strong> </ font>. There is also a model called recursive (RNN) </ strong> </ font>. Please see the image below. シンプルrnn.png This is good at learning that keeps time series. However, the amount of calculation is large, which makes the calculation difficult.

The following is a comparison of the two networks.

(Figure: ネットワーク比較.png)

You should select the network type according to the problem you want to solve. As a guideline, the hierarchical type is often used for image recognition, and the recurrent type is often used for natural language processing (as well as character recognition and speech recognition).

2. Consider a hierarchical neural network

This time we will create a hierarchical neural network, the most fundamental kind, so let's learn a little more about hierarchical networks. First, let's understand the calculations a neural network performs. They can be explained with nothing more than simple multiplication and addition.

2-1. Weights and forward propagation

A weight represents the importance of an input. The higher the weight, the more important that input is when the neural network discriminates between events.

Let's dig a little deeper into weights with a concrete example. When buying a bag, each person has criteria that decide (determine) whether to buy it or not: roughly speaking, "durability", "capacity", "design", "brand recognition", and so on. I attach great importance to design, so if I express each criterion as a numeric weight, I might get: "durability" = 5, "capacity" = 5, "design" = 8, "brand recognition" = 5. Since design matters most to me, it is natural that the weight for design is the highest.

Let's show this in a diagram.

(Figure: ニューラルネットワーク1.png)

The diagram is simple and easy to understand. In this figure, you can see that "Input 1" is an important element in this layer. Also, as the number of nodes in the next layer increases, the number of weights increases accordingly.

(Figure: ニューラルネットワーク2.png)

Looking at the figure, you can see that different values are passed to each of the two nodes in the next layer.

By the way, this step-by-step propagation of the input through the network is called forward propagation.
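To make this concrete, here is a minimal sketch of forward propagation into a single node, using the weights from the bag example. Note that the input scores for the bag are made-up illustrative values; only the weights (5, 5, 8, 5) come from the example above.

```python
import torch

# Weights from the bag example: durability, capacity, design, brand recognition
weights = torch.tensor([5.0, 5.0, 8.0, 5.0])

# Hypothetical scores for one particular bag (arbitrary values for illustration)
inputs = torch.tensor([0.7, 0.4, 0.9, 0.2])

# Forward propagation to one node: the weighted sum of the inputs
output = torch.dot(weights, inputs)
print(output)  # tensor(13.7000)
```

Each input is multiplied by its weight and the results are added up, so the highly weighted "design" score contributes the most to the output.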

In a neural network, each weight is adjusted by "training" until an appropriate value is found. For now, it is enough to understand what the weights are. Next, I will explain that somewhat mysterious function called the "activation function".

2-2. Activation function

Now let's talk about the activation function. The activation function is very important for giving the neural network more flexibility.

Famous activation functions include:

  1. Sigmoid function
  2. ReLU function
  3. tanh (hyperbolic tangent)
  4. Heaviside (step) function

Please look up each of these functions yourself. They may seem difficult if you only look at the mathematical formulas, but what matters in an activation function is not the difficulty of the formula. What matters is that it is (1) non-linear and (2) easy to differentiate. Non-linear simply means "not a straight line". Let's take a look at the graph of the ReLU function; it is shown on Wikipedia: [Activation function (Wikipedia)](https://ja.wikipedia.org/wiki/%E6%B4%BB%E6%80%A7%E5%8C%96%E9%96%A2%E6%95%B0#ReLU%EF%BC%88%E3%83%A9%E3%83%B3%E3%83%97%E9%96%A2%E6%95%B0%EF%BC%89)

What did you think? It is certainly not a straight line. Next is the ease of differentiation. For the basics of differentiation, see the previous article, or sites and books that explain it more rigorously. A function that is easy to differentiate makes the network easier to train and the appropriate weights easier to find. (If you build a neural network with a framework, this is handled automatically, so don't worry.)
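As a quick sketch, here is what ReLU and the sigmoid do to a few sample values in PyTorch (the input values are arbitrary):

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])  # arbitrary sample inputs

# ReLU: zeroes out negative inputs and passes positive inputs through unchanged
print(torch.relu(x))     # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])

# Sigmoid: squashes any input into the range (0, 1)
print(torch.sigmoid(x))  # tensor([0.1192, 0.3775, 0.5000, 0.6225, 0.8808])
```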

Next, let's learn when the activation function is applied. It is applied to the forward-propagating value immediately before it is passed on to the next layer.

(Figure: 活性化関数.png)

What does it mean that activation functions make a neural network more flexible? Let's check this with figures as well. First, here is what discrimination looks like when the network is constructed purely linearly, without inserting any activation function:

(Figure: 線形.png)

Not everything is divided cleanly. Just as we humans mistake things that look alike, an artificial intelligence can misidentify similar things. So let's add an activation function and transform the space non-linearly.

(Figure: 非線形.png)

The graph works out a little too neatly, but you can see how the shape of the decision boundary changes. There is no guarantee that every judgment will be correct, but the accuracy of the judgments will certainly change compared with staying purely linear.
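To see why the activation function is essential here, note that without it, stacking linear layers still produces a single linear map. The following sketch (with arbitrary layer sizes and input) shows that two stacked nn.Linear layers with no activation in between can be collapsed into one equivalent linear layer:

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(2, 4)
fc2 = nn.Linear(4, 1)
x = torch.tensor([1.0, 2.0])  # arbitrary input

stacked = fc2(fc1(x))  # two linear layers, no activation in between

# Collapse both layers into one linear map: W = W2 @ W1, b = W2 @ b1 + b2
W = fc2.weight @ fc1.weight
b = fc2.weight @ fc1.bias + fc2.bias
collapsed = W @ x + b

print(stacked, collapsed)  # the same value: the stack is still linear overall
```

Inserting a non-linear activation between the layers breaks this collapse, which is exactly what lets the network draw non-linear boundaries.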

3. Build a neural network with PyTorch

Let's use what we have learned so far to build a neural network.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
```

First, import the required modules. Next, we will build the network. The network to be built this time looks like this:

(Figure: qiitanet1.png)

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.fc1 = nn.Linear(2, 4)  # 1st layer: 2 inputs -> 4 outputs
        self.fc2 = nn.Linear(4, 4)  # 2nd layer: 4 inputs -> 4 outputs
        self.fc3 = nn.Linear(4, 1)  # 3rd layer: 4 inputs -> 1 output

    def forward(self, x):
        y = F.relu(self.fc1(x))
        y = F.relu(self.fc2(y))
        y = self.fc3(y)
        return y
```

In PyTorch, we build a neural network by defining the network as a class and then calling it like a function. This is the so-called dynamic graph (define-by-run) style. Compared with TensorFlow (a framework developed by Google), I feel it is easier to understand because it retains a Python-like character.

Let me explain the code. The class Net is created by inheriting from nn.Module, a module provided by PyTorch, and we use this nn.Module to define the graph. First, define `__init__(self)` for initialization and call the `__init__` of nn.Module inside it. After that, create each layer as self.(layer name). As explained in the figure, this time we use the following network configuration:

- 1st layer (inputs = 2, outputs = 4)
- 2nd layer (inputs = 4, outputs = 4)
- 3rd layer (inputs = 4, outputs = 1)

In the program, we write self.(layer name) = nn.Linear(number of inputs, number of outputs). nn.Linear is the module for a fully connected layer, used to create a graph in which every node of the input layer propagates to all the nodes of the next layer.
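As a small sketch of nn.Linear on its own (the input values are arbitrary), a layer with 2 inputs and 4 outputs maps a 2-element tensor to a 4-element tensor:

```python
import torch
import torch.nn as nn

fc = nn.Linear(2, 4)           # fully connected layer: 2 inputs -> 4 outputs
x = torch.tensor([1.0, 2.0])   # arbitrary 2-element input

print(fc(x).shape)      # torch.Size([4])
print(fc.weight.shape)  # torch.Size([4, 2]): one row of weights per output node
```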

The forward function describes the behavior of the neural network when it receives an actual input. The first line feeds the argument x into the first layer and applies the ReLU activation function. The second line feeds the first layer's output y into the second layer and applies ReLU again. Finally, the result is fed into the final layer and the output is returned.

Let's take a look at an overview of the network:

```python
net = Net()
print(net)
```

```
Net(
  (fc1): Linear(in_features=2, out_features=4, bias=True)
  (fc2): Linear(in_features=4, out_features=4, bias=True)
  (fc3): Linear(in_features=4, out_features=1, bias=True)
)
```

You can confirm that the network has been built properly.
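To check that forward propagation works end to end, here is a quick sketch that passes an arbitrary 2-element input through the network (the output value will vary, because the weights are initialized randomly):

```python
x = torch.tensor([1.0, 2.0])  # arbitrary 2-element input
y = net(x)                    # calling the net invokes Net.forward(x)
print(y)                      # a 1-element tensor; its value depends on the random initial weights
```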

By the way, you can also inspect the network's initial parameters (here, the shape of each weight tensor).

```python
for param_tensor in net.state_dict():
    print(param_tensor, "\t", net.state_dict()[param_tensor].size())
```

```
fc1.weight 	 torch.Size([4, 2])
fc1.bias 	 torch.Size([4])
fc2.weight 	 torch.Size([4, 4])
fc2.bias 	 torch.Size([4])
fc3.weight 	 torch.Size([1, 4])
fc3.bias 	 torch.Size([1])
```

There is a "bias" here, which is called a bias and is added to the calculation of each layer.

2x+3

The "3" in the above linear function is the bias. In mathematical terms, it is a intercept </ strong>.

Finally

This time, I gave a brief explanation of neural networks and built one with PyTorch. Although I explained some parts with figures, there may have been points that were hard to follow. Please contact me if you have any questions.

Next time, I will use the construction method learned here to build a more practical neural network. Specifically, we will solve the "OR circuit" and "AND circuit" of logic circuits with a neural network. Thank you for reading to the end.
