[PYTHON] Practice Pytorch

Introduction

This article is a memorandum on how to define models, how to train them, and how to write your own functions in PyTorch, one of the deep learning frameworks.

Installation

PyTorch can be installed easily with conda or pip. If you select your OS, Python version, and CUDA version from here, a suitable install command is displayed, so you can install it just by copying it. (CUDA and cuDNN need to be set up separately.) At the time of writing it only supported Linux and OS X, not Windows. (Since 0.4, Windows is also officially supported.)

Official tutorial

If you are about to start using PyTorch, the Official Tutorials are much easier to follow than this article, so refer to them first. The official examples are also very helpful. For anything else, the Documents and Forums will resolve most questions.

Practice

PyTorch basically uses torch.Tensor for matrix operations, and its usage is mostly the same as torch.Tensor in Torch7. Unlike Torch7, however, the input to a model is assumed to be a mini-batch. For 2D convolution, Torch7 accepted either 3D or 4D input, but PyTorch requires 4D input (see the sketch after the imports below). Also, when computing with a model, a torch.Tensor has to be converted to a Variable first. In PyTorch 0.4 or earlier, the following imports are enough for basic model definition and training.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

Since PyTorch 0.4, Variable has been merged into torch.Tensor, so importing Variable is no longer necessary.
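
For example, nn.Conv2d expects a 4D mini-batch input. A minimal sketch, assuming PyTorch 0.4 or later so a plain torch.Tensor can be fed directly (the sizes are arbitrary and only for illustration):

conv4d_example.py


conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
# 4D input: batch x channel x height x width
x = torch.randn(1, 3, 32, 32)
y = conv(x)
print(y.size()) # torch.Size([1, 16, 32, 32])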

Model definition

In PyTorch, a model can be defined in the same way as in Torch7, as follows.

model1.py


model = nn.Sequential()
model.add_module('fc1', nn.Linear(10,100))
model.add_module('relu', nn.ReLU())
model.add_module('fc2', nn.Linear(100,10))

The difference from Torch7 is that each layer is given a name. It is also possible to put the layers in a list and unpack it into nn.Sequential().

model2.py


layer = []
layer.append(nn.Linear(10,100))
layer.append(nn.ReLU())
layer.append(nn.Linear(100,10))
model = nn.Sequential(*layer)

It can also be defined as a class as follows.

model3.py


import torch.nn.functional as F
class Model(nn.Module):
  def __init__(self):
    super(Model,self).__init__()
    self.fc1 = nn.Linear(10,100)
    self.fc2 = nn.Linear(100,10)

  def forward(self,x):
    x = self.fc1(x)
    x = F.relu(x)
    x = self.fc2(x)
    return x

If you have used Chainer before, this style of definition will look familiar. In PyTorch, you can reduce the amount of model-definition and forward code by making good use of nn.Sequential. The implementations in PyTorch's torchvision package are a good reference for how to build models. In practice, you build a model by using or combining the three styles above depending on the purpose.
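
For example, the class style and nn.Sequential are often combined; a minimal sketch (the layer sizes are arbitrary):

combined_example.py


class Block(nn.Module):
  def __init__(self):
    super(Block, self).__init__()
    # grouping the layers with nn.Sequential keeps forward short
    self.body = nn.Sequential(
      nn.Linear(10, 100),
      nn.ReLU(),
      nn.Linear(100, 10),
    )

  def forward(self, x):
    return self.body(x)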

One of the most common mistakes when starting with PyTorch is to keep layers in a plain Python list. When the same layer configuration is used many times, it is tempting to create the layers in a for loop and store them in a list, but then the layers' learnable parameters are not registered with the model, so they are not returned when you ask the model for its parameters and are never updated during training. The following is a bad example.

list.py


class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    self.layer = [nn.Linear(10,10) for _ in range(10)]

  def forward(self, x):
    for i in range(len(self.layer)):
      x = self.layer[i](x)
    return x

model = Model()
# model.parameters() returns an iterator over the learnable parameters,
# but modules held in a plain Python list are not registered, so nothing is returned here
# (optim is explained later)
optimizer = optim.SGD(model.parameters(), lr=0.1)

Therefore, in such a case, define it using nn.ModuleList.

modulelist.py


class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    layer = [nn.Linear(10,10) for _ in range(10)]
    self.layer = nn.ModuleList(layer)

  def forward(self, x):
    for i in range(len(self.layer)):
      x = self.layer[i](x)
    return x

model = Model()
# because nn.ModuleList registers the layers as submodules,
# model.parameters() now returns all of their learnable parameters
# (optim is explained later)
optimizer = optim.SGD(model.parameters(), lr=0.1)

This is an easy trap to fall into, so be careful.
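
You can check the difference directly. A quick sketch (the class names Bad and Good are just for illustration):

parameter_check.py


class Bad(nn.Module):
  def __init__(self):
    super(Bad, self).__init__()
    self.layer = [nn.Linear(10,10) for _ in range(10)] # plain list: not registered

class Good(nn.Module):
  def __init__(self):
    super(Good, self).__init__()
    self.layer = nn.ModuleList([nn.Linear(10,10) for _ in range(10)]) # registered

print(len(list(Bad().parameters())))  # 0
print(len(list(Good().parameters()))) # 20 (a weight and a bias for each of the 10 layers)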

GPU usage

Computation can be done on the GPU by moving the model and tensors to CUDA as shown below.

gpu.py


import torch

x = torch.randn(10)
y = torch.randn(10)

"""
Pytorch 0.4 or earlier
x = x.cuda()
y = y.cuda(0) #passing a number selects the GPU with that id

z = x * y #The calculation is done on the GPU.

z = z.cpu() #to cpu
"""

"""
Pytorch 0.4 or later
x = x.to('cuda')
y = y.to('cuda:0') #the number after cuda: selects the GPU id

z = x * y

z = z.to('cpu') #to cpu
"""

print(x.is_cuda) #True if the variable is on the GPU
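
The model itself can be moved to the GPU in the same way. A minimal sketch for PyTorch 0.4 or later (the model here is just an example layer):

gpu_model.py


import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = nn.Linear(10, 10)
model = model.to(device)           # move the model's parameters to the GPU (if available)
x = torch.randn(4, 10).to(device)  # the input must be on the same device
y = model(x)
# for PyTorch 0.4 or earlier, model.cuda() and x.cuda() do the same thing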

Fine tuning

With torchvision.models, PyTorch makes it easy to define AlexNet, VGGNet, ResNet, DenseNet, SqueezeNet, and GoogleNet, and to use pretrained versions of these models.

get_model.py


import torchvision.models as models
alexnet = models.alexnet()
pretrain_alexnet = models.alexnet(pretrained=True) #trained weights are downloaded by setting the pretrained option to True

In addition, fine tuning can be done by replacing the output layer to change its output dimension, as shown below.

finetune1.py


resnet = models.resnet50(pretrained=True)
resnet.fc = nn.Linear(2048, 100)

Also, if you want to use only some layers, you can write as follows.

finetune2.py


resnet = models.resnet50(pretrained=True)
resnet = nn.Sequential(*list(resnet.children())[:-3])

Any subset of the layers can be extracted using slices. Here is an example that replaces ResNet's global average pooling with max pooling and makes the output 10-dimensional.

resnet_finetune.py


class Resnet(nn.Module):
  def __init__(self):
    super(Resnet,self).__init__()
    resnet = models.resnet50(pretrained=True)
    self.resnet = nn.Sequential(*list(resnet.children())[:-2])
    self.maxpool = nn.MaxPool2d(kernel_size=7)
    self.fc = nn.Linear(2048, 10)

  def forward(self,x):
    x = self.resnet(x)
    x = self.maxpool(x)
    x = x.view(x.size(0), -1) # flatten to batch x 2048 before the fully connected layer
    x = self.fc(x)
    return x
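
A quick shape check of the class above (this downloads the pretrained ResNet-50 weights on first run; the input size is the usual 224x224):

resnet_finetune_check.py


model = Resnet()
x = torch.randn(1, 3, 224, 224) # batch x channel x height x width
print(model(x).size()) # torch.Size([1, 10])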

Training

Use the optim package to update parameters with an optimization method of your choice. After configuring the optimizer, you update the parameters by calling step() after each backward pass.

update.py


"""
Pytorch 0.4 or later
if torch.cuda.is_available(): #Check if GPU is available
  device = 'cuda'
else:
  device = 'cpu'
"""
#definition of model
model = models.resnet18()

"""
Pytorch 0.4 or later
model = model.to(device)
"""
#Parameter setting of optimization method
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
#definition of loss function
criterion = nn.MSELoss()
#Generate input and correct answer with random numbers
input = torch.randn(1,3,224,224) #Batch x channel x height x width
target = torch.randn(1,1000)

"""
# Pytorch 0.4 or earlier
#Change to Variable and calculate
input = Variable(input)
"""

#Setting requires_grad to False makes the gradient not be computed for that variable.
#It is False by default, so there is no need to write it explicitly as done here.
"""
Pytorch 0.4 or earlier
target = Variable(target, requires_grad=False)
"""

"""
Pytorch 0.4 or later
target.requires_grad = False
"""

#Modules that behave differently during training and inference, such as batch normalization, are switched to training behavior with model.train().
# model.eval() switches them to inference behavior.
model.train()
#Training loop
for i in range(100):
  #Forward propagation
  out = model(input)
  #Loss calculation
  loss = criterion(out, target)
  #Gradient initialization
  optimizer.zero_grad()
  #Gradient calculation
  loss.backward()
  #Parameter update
  optimizer.step()

When not using optim

If you do not use optim, you can update the parameters by replacing the optimizer.step() part with the following.

update_without_optim.py


learning_rate = 0.1 # example value; learning_rate is not defined elsewhere in this snippet
for param in model.parameters():
    param.data -= learning_rate * param.grad.data
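
Note that gradients accumulate across backward calls, so without optim you also have to clear them yourself. A rough manual equivalent of optimizer.zero_grad() (my addition, not part of the original snippet):

zero_grad_manual.py


for param in model.parameters():
    if param.grad is not None:
        param.grad.data.zero_()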

For RNN

In the case of an RNN, you can accumulate the loss over several steps as shown below and call step() at the update timing to backpropagate through all of them.

rnn_update.py


class RNN(nn.Module):
  def __init__(self, data_size, hidden_size, output_size):
    super(RNN, self).__init__()
    self.hidden_size = hidden_size
    input_size = data_size + hidden_size
    self.i2h = nn.Linear(input_size, hidden_size)
    self.h2o = nn.Linear(hidden_size, output_size)

  def forward(self, data, last_hidden):
    input = torch.cat((data, last_hidden), 1)
    hidden = self.i2h(input)
    output = self.h2o(hidden)
    return hidden, output

rnn = RNN(data_size, hidden_size, output_size)
#definitions of data_size, hidden_size, output_size, the optimizer, criterion, input, target and the initial hidden state are omitted
loss = 0
for i in range(10):
  hidden, output = rnn(input, hidden)
  loss += criterion(output, target)
loss.backward()
optimizer.step()

In the code above, the loss is accumulated over 10 steps and the parameters are then updated once.

Save model

As in Torch7, a model can basically be saved with torch.save(), but here only the learnable parameters are saved by using state_dict().

model_save.py


model = models.resnet50(pretrained=True)
#Save model
torch.save(model.state_dict(), 'weight.pth')
model2 = models.resnet50()
#Reading parameters
param = torch.load('weight.pth')
model2.load_state_dict(param)

Not only the model but also the optimizer can be saved in the same way using torch.save() and state_dict().

optimizer_save.py


optimizer = optim.SGD(model.parameters(), lr=0.1)
torch.save(optimizer.state_dict(), 'optimizer.pth')

optimizer2 = optim.SGD(model.parameters(), lr=0.1)
optimizer2.load_state_dict(torch.load('optimizer.pth'))
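
A common pattern (not from the original article, just a sketch) is to save the model and the optimizer together in a single checkpoint dictionary:

checkpoint_save.py


checkpoint = {
  'model': model.state_dict(),
  'optimizer': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])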

Writing your own functions with NumPy

PyTorch makes it easy to create your own layers and functions using NumPy. (You can also write them in C.) You just inherit from the Function class and write the forward and backward computations. For example, a hand-written ReLU function looks like the following. (ReLU is already provided, so you don't actually have to write it yourself.)

relu.py


import numpy as np
from torch.autograd import Function

class relu(Function):
  def forward(self,x):
    #from torch.Tensor to numpy
    numpy_x = x.numpy()
    result = np.maximum(numpy_x,0)
    #from numpy back to torch.Tensor
    result = torch.FloatTensor(result)
    #save the Tensor for the backward computation
    self.save_for_backward(result)
    return result

  def backward(self, grad_output):
    result = self.saved_tensors[0]
    grad_input = grad_output.numpy() * (result.numpy() > 0)
    #Returns the gradient with respect to the input
    return torch.FloatTensor(grad_input)
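
A quick check of the sketch above, following the same calling style the article uses for custom Functions (instantiating the Function and calling it):

relu_check.py


x = torch.randn(5)
y = relu()(x)
print(y) # negative entries are clamped to 0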

If there are parameters to be learned, wrap them in Parameter and define a class that inherits from nn.Module. As an example, we implement an operation that simply weights the input elementwise. (When x = [1,2,3] and w = [0.1,0.2,0.3], the output is [0.1,0.4,0.9]; w is a learnable parameter.)

elemwise.py


from torch.autograd import Function
from torch.nn.parameter import Parameter
class elemwiseFunction(Function):
  def forward(self, x, w):
    self.save_for_backward(x, w)
    numpy_x = x.numpy()
    numpy_w = w.numpy()
    result = numpy_x*numpy_w
    return torch.FloatTensor(result)

  def backward(self, grad_output):
    input, w = self.saved_tensors
    w_grad = input.numpy() * grad_output.numpy()
    x_grad = w.numpy() * grad_output.numpy()
    #return the gradient with respect to the input and the gradient with respect to the learnable parameter
    return torch.FloatTensor(x_grad), torch.FloatTensor(w_grad)

class elemwise(nn.Module):
  def __init__(self):
    super(elemwise,self).__init__()
    self.w = Parameter(torch.randn(10))

  def forward(self, x):
    return elemwiseFunction()(x, self.w)

In the code above, NumPy is not actually required: as long as the inputs and outputs of forward and backward are torch.Tensor, you could just as well use CuPy or, taken to the extreme, any other library. Furthermore, since PyTorch has automatic differentiation, if you express the operation entirely with torch.Tensor operations instead of NumPy, you do not need to write backward at all.

elemwise_without_backward.py


class elemwise(nn.Module):
  def __init__(self):
    super(elemwise,self).__init__()
    self.w = Parameter(torch.randn(10))

  def forward(self, x):
    return x * self.w
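
A quick check that autograd handles the backward pass of the class above:

elemwise_check.py


layer = elemwise()
x = torch.randn(10)
y = layer(x).sum()
y.backward()
print(layer.w.grad) # the gradient with respect to w equals x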

The following are useful references for defining your own layers in C or C++ and for defining functions using CUDA kernels: the official tutorial on extending with C, the official C implementations, the official C++ implementations, the official tutorial on C++ and CUDA kernel extensions, and the PyTorch source code. In the source code, TH, THS, THC, and THCS contain the implementations related to torch.Tensor, and THNN and THCUNN contain the implementations related to neural networks.

Other

Below are some tips that are useful when training neural networks. I intend to add more whenever something comes to mind.

How to avoid building the computation graph

During forward propagation, PyTorch builds the computation graph needed for backward as it computes. This is unnecessary during inference, so it is recommended to disable it as follows to save memory.

no_grad.py



import torch
x = torch.randn(10)

"""
Pytorch 0.4 or earlier
from torch.autograd import Variable
x = Variable(x, volatile=True) #set the volatile option to True
y = x**2
"""

"""
Pytorch 0.4 or later
"""
with torch.no_grad():
  #operations executed inside the with block do not build a computation graph
  y = x**2

Calculate the average loss

You may want to compute the average loss to monitor training. If you simply keep adding the loss of each iteration directly, the computation graph keeps growing and consumes memory, so write it as follows.

mean_loss.py


sum_loss = 0

#Training loop
for i in range(100):
  """
  Forward propagation, etc.
  """
  loss = loss_function(outputs, targets) #compute the loss with an appropriate loss function

  """
  Pytorch 0.4 or earlier
  # .data gives the underlying torch.Tensor; move it to the CPU and take the 0th element
  sum_loss += loss.data.cpu()[0]
  """

  # Pytorch 0.4 or later
  sum_loss += loss.item()

print("mean loss: ", sum_loss/100)

This was a bit cumbersome before PyTorch 0.4, but since 0.4 it is as simple as calling item().
