[PYTHON] [PyTorch Tutorial ⑤] Learning PyTorch with Examples (Part 1)

Introduction

This is the fifth installment of the PyTorch official tutorial series, following the previous article. This time we work through Learning PyTorch with Examples.

Learning PyTorch with Examples

This tutorial introduces the two main features of PyTorch, the Tensor and automatic differentiation (autograd), through sample code.

The network (model) used in the sample code has three layers (an input layer, one hidden layer, and an output layer), with ReLU as the activation function.
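
In formula form, every example below computes the same forward pass and squared-error loss (x is the input, y is the target, w1 and w2 are the weights):

h = x w1
h_relu = max(h, 0)    (ReLU applied elementwise)
y_pred = h_relu w2
loss = sum((y_pred - y)^2)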

  1. Tensor

1.1. Warm-up: numpy

Before moving to PyTorch, let's first implement the network with NumPy. NumPy has no features for deep learning or gradients, but a simple neural network can still be built by implementing the forward and backward passes by hand.

import numpy as np

# N: batch size
# D_in: input dimension
# H: hidden layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input data and target (teacher) data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Initialize the weights with random values
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute the predicted y with the current weights
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print the loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backpropagation: compute the gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

When you run this code, you can see the loss decrease, which shows that learning is progressing.
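
For reference, the backward-pass lines in the loop are just the chain rule applied to the squared-error loss; written out with the variable names from the code:

grad_y_pred = dloss/dy_pred = 2 * (y_pred - y)
grad_w2     = dloss/dw2     = h_relu^T * grad_y_pred
grad_h_relu = dloss/dh_relu = grad_y_pred * w2^T
grad_h      = dloss/dh      = grad_h_relu, with entries where h < 0 set to 0 (the derivative of ReLU)
grad_w1     = dloss/dw1     = x^T * grad_h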

1.2. PyTorch: Tensors

NumPy cannot run its computations on the GPU, but PyTorch Tensors can use the GPU to accelerate numerical computation. Tensors can also track gradients, but for now we implement the backward pass manually, just as in the NumPy example above.
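
As a side note (not in the tutorial), a common pattern is to choose the device at runtime depending on whether a GPU is available:

import torch

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")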

import torch


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this line to run on the GPU.

# N: batch size
# D_in: input dimension
# H: hidden layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input data and target (teacher) data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Initialize the weights with random values
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute the predicted y with the current weights
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backpropagation: compute the gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

With this code as well, you can see the loss decrease and the learning progress.

  2. Autograd

2.1. PyTorch: Tensors and autograd

In the examples above we implemented the forward and backward passes by hand, but PyTorch's autograd package can automate the backpropagation computation.

The backpropagation computation can be automated with just these two steps (a minimal example is shown below):

- Set requires_grad=True on the Tensors whose gradients you want to compute.
- Call backward() on the loss.
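
As a minimal illustration (not in the tutorial) of these two steps:

import torch

# y = x^2, so dy/dx = 2x
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()       # computes dy/dx and stores it in x.grad
print(x.grad)      # tensor(6.)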

import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this line to run on the GPU.

# N: batch size
# D_in: input dimension
# H: hidden layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold the input and target data.
# requires_grad=False (the default) indicates that we do not need to compute
# gradients with respect to these Tensors.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for the weights.
# requires_grad=True indicates that we want to compute gradients with respect
# to these Tensors.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute the predicted y using operations on Tensors.
    # Since the backward pass is no longer implemented by hand, the intermediate
    # value h_relu does not need to be kept.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print the loss using operations on Tensors.
    # loss is a Tensor of shape (1,); loss.item() gets the scalar value it holds.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass.
    # backward() computes the gradient of the loss with respect to all Tensors
    # with requires_grad=True. After this call, w1.grad and w2.grad will be
    # Tensors holding the gradient of the loss with respect to w1 and w2.
    loss.backward()

    # Manually update the weights using gradient descent.
    # The weights have requires_grad=True, so wrap the update in torch.no_grad()
    # to keep it out of the computation graph.
    # The same update can also be done with torch.optim.SGD.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # After updating the weights, manually zero the gradients
        w1.grad.zero_()
        w2.grad.zero_()
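
As noted in the comment above, the manual update and gradient zeroing can be replaced with torch.optim.SGD. A minimal sketch (not part of the tutorial code) of how the training loop would change, with the forward pass and loss as before:

optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)

for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    optimizer.zero_grad()   # zero the gradients instead of calling w.grad.zero_() by hand
    loss.backward()
    optimizer.step()        # performs w -= learning_rate * w.grad for each weight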

Although this is not in the tutorial, let's visualize the backpropagation computation graph. Computation graphs can be drawn with torchviz. If you are using Colaboratory, you need to install it first.

!pip install torchviz

Tweak the PyTorch: Tensors sample code slightly, removing the loop so that the gradient is computed only once.

# Create random input data and target (teacher) data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Initialize the weights with random values
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Forward pass: compute the predicted y with the current weights
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)

# Compute the loss (keep it as a Tensor so that make_dot can visualize it)
loss = (y_pred - y).pow(2).sum()

# Backpropagation by hand: compute the gradients of the loss with respect to w1 and w2
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)

Draw the computation graphs with torchviz's make_dot, visualizing both the forward pass and the gradients. param_dict is optional, but passing it lets the variable names appear in the diagram.

# Visualize the computation graph of the forward pass.
from torchviz import make_dot
param_dict = {'w1': w1, 'w2': w2}
make_dot(loss, param_dict)

# Visualize the computation graph of the gradient of w1.
make_dot(grad_w1, param_dict)

# Visualize the computation graph of the gradient of w2.
make_dot(grad_w2, param_dict)

The computation graphs are shown below.

PyTorch_Tensors_make_dot.png

Similarly, modify the PyTorch: Tensors and autograd sample code so that the gradient is computed only once. Specifying create_graph=True when calling backward() preserves the graph of the derivative computation, so it can be visualized as well.
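
To see what create_graph=True does, here is a minimal example (not in the tutorial). Because the graph of the derivative computation is kept, the gradient itself can be differentiated again:

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3                                          # dy/dx = 3x^2, d2y/dx2 = 6x
(g,) = torch.autograd.grad(y, x, create_graph=True)
print(g.item())                                     # 27.0 = 3 * 3^2
g.backward()                                        # differentiate the gradient itself
print(x.grad.item())                                # 18.0 = 6 * 3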

import torch

# Create random Tensors to hold the input and target data.
# requires_grad=False (the default) indicates that we do not need to compute
# gradients with respect to these Tensors.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for the weights.
# requires_grad=True indicates that we want to compute gradients with respect
# to these Tensors.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Forward pass: compute the predicted y using operations on Tensors.
# Since the backward pass is no longer implemented by hand, the intermediate
# value h_relu does not need to be kept.
y_pred = x.mm(w1).clamp(min=0).mm(w2)

# Compute the loss using operations on Tensors.
# loss is a Tensor of shape (1,); loss.item() gets the scalar value it holds.
loss = (y_pred - y).pow(2).sum()

# Use autograd to compute the backward pass.
# backward() computes the gradient of the loss with respect to all Tensors
# with requires_grad=True. After this call, w1.grad and w2.grad will be
# Tensors holding the gradient of the loss with respect to w1 and w2.
# Specify create_graph=True so that the graph of the derivative is preserved.
loss.backward(create_graph=True)

As before, we visualize the forward pass and the gradients computed by autograd.

# Visualize the computation graph of the forward pass.
param_dict = {'w1': w1, 'w2': w2}
make_dot(loss, param_dict)

# Visualize the computation graph of the gradient of w1.
make_dot(w1.grad, param_dict)

# Visualize the computation graph of the gradient of w2.
make_dot(w2.grad, param_dict)

The forward-pass graph is the same. The gradient graphs have a slightly different shape, but you can see that autograd performs the backpropagation computation automatically.

PyTorch_Tensorsandautograd_make_dot.png

2.2. PyTorch: Defining new autograd functions

In PyTorch you can define your own function (operator) by subclassing torch.autograd.Function and implementing two static methods, forward and backward, which operate on Tensors.

In this example we define the ReLU function as our own autograd function and use it in the same network as before.

import torch

class MyReLU(torch.autograd.Function):
    """
    By subclassing torch.autograd.Function and implementing the forward and
    backward passes, which operate on Tensors, you can implement your own
    custom autograd function.
    """

    @staticmethod
    def forward(ctx, input):
        """
        The forward pass receives a Tensor containing the input and returns a
        Tensor containing the output. ctx is a context object used for the
        backward computation; objects can be cached for use in the backward
        pass with ctx.save_for_backward.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        The backward pass receives a Tensor containing the gradient of the loss
        with respect to the output, and must compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this line to run on the GPU.

# N: batch size
# D_in: input dimension
# H: hidden layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold the input and target data.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for the weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our custom function, use the Function.apply method.
    relu = MyReLU.apply

    # Forward pass: compute the predicted y using our custom autograd function
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass
    loss.backward()

    # Update the weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # After updating the weights, manually zero the gradients
        w1.grad.zero_()
        w2.grad.zero_()

Let's also visualize the custom function. As before, arrange the code so that the computation runs only once.

# Create random Tensors to hold the input and target data.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for the weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# To apply our custom function, use the Function.apply method.
relu = MyReLU.apply

# Forward pass: compute the predicted y using our custom autograd function
y_pred = relu(x.mm(w1)).mm(w2)

# Compute the loss
loss = (y_pred - y).pow(2).sum()

# Use autograd to compute the backward pass, preserving the graph of the derivative
loss.backward(create_graph=True)
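
The graphs can then be drawn with make_dot in the same way as before:

# Visualize the computation graph of the forward pass.
param_dict = {'w1': w1, 'w2': w2}
make_dot(loss, param_dict)

# Visualize the computation graph of the gradient of w1.
make_dot(w1.grad, param_dict)

# Visualize the computation graph of the gradient of w2.
make_dot(w2.grad, param_dict)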

PyTorch_Definingnewautogradfunctions_make_dot.png

You can see that it produces a similar computation graph.

Continue

Since this article has become long, PyTorch: nn will be covered in Part 2.

History

2020/05/27 First edition released
