This is the sixth installment of my series on the official PyTorch tutorials, continuing from the previous post. This time we work through "What is torch.nn really?".
What is torch.nn really?
This tutorial covers torch.nn, torch.optim, Dataset, and DataLoader. (torch.nn and torch.optim were already explained last time; the official tutorials are written by different authors, so some content overlaps.)
The dataset used is MNIST, a collection of handwritten digit images from 0 to 9. To build understanding, we first create the model without any of the packages above. We then replace the code step by step, in the order torch.nn, torch.optim, Dataset, DataLoader.
First, download the MNIST dataset (handwritten digit image dataset).
from pathlib import Path
import requests
DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"
PATH.mkdir(parents=True, exist_ok=True)
URL = "http://deeplearning.net/data/mnist/"
FILENAME = "mnist.pkl.gz"
if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).open("wb").write(content)
The dataset is stored as numpy arrays, serialized with pickle and compressed with gzip.
import pickle
import gzip
with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")
Each sample (for example x_train[0]) is a 28x28 image, but it is stored as a single flattened row of 784 values. To display it with pyplot.imshow, reshape it to 28x28.
from matplotlib import pyplot
import numpy as np
pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
print(x_train.shape)
out
(50000, 784)
From now on, we will use PyTorch's Tensor. Convert from a numpy array to a Tensor.
import torch
x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())
out
tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]) tensor([5, 0, 4, ..., 8, 4, 8])
torch.Size([50000, 784])
tensor(0) tensor(9)
You can see that there are 50,000 training samples, each with 784 values, and that the labels (teacher data) are integers from 0 to 9.
First, create a neural network using only Tensor operations, without torch.nn. The model is a simple linear model, $ y = w \times x + b $.
Initialize the weights $ w $ with PyTorch's randn, which draws values from a standard normal distribution (mean 0, standard deviation 1). We don't want the initialization itself to be tracked in the gradient, so we call requires_grad_() after initialization to set requires_grad = True. The weights use "Xavier initialization", here implemented by dividing by $ \sqrt{784} $. (That is how the official tutorial describes it, although the formula looks slightly different from the usual definition of Xavier initialization.) The bias is initialized to zero.
import math
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)
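As a rough check of the scaling above, the following sketch compares the spread of the raw randn values with the scaled ones. The seed and the names raw and scaled are my own choices for illustration and do not appear in the tutorial.

```python
import math
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

raw = torch.randn(784, 10)      # standard normal: mean 0, std 1
scaled = raw / math.sqrt(784)   # same scaling as `weights` above

# Dividing by sqrt(784) = 28 shrinks the standard deviation to about 1/28.
print(raw.std().item(), scaled.std().item(), 1 / 28)
```

Keeping the weights small in this way keeps the initial outputs of xb @ weights at a moderate scale even though each output sums over 784 inputs.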
We also need an activation function, so create a log_softmax function. PyTorch provides many loss and activation functions, but you can also create your own in this way.
def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)
def model(xb):
    return log_softmax(xb @ weights + bias)
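As a sanity check of the hand-written log_softmax, the sketch below compares it with PyTorch's built-in F.log_softmax on a small random batch. The tensor scores is a made-up input, not MNIST data.

```python
import torch
import torch.nn.functional as F

def log_softmax(x):
    # log(softmax(x)) = x - log(sum(exp(x))) along the last dimension
    return x - x.exp().sum(-1).log().unsqueeze(-1)

torch.manual_seed(0)
scores = torch.randn(4, 10)  # hypothetical batch: 4 samples, 10 classes

ours = log_softmax(scores)
builtin = F.log_softmax(scores, dim=-1)

print(torch.allclose(ours, builtin, atol=1e-5))
# Exponentiating turns log-probabilities back into probabilities,
# so each row should sum to 1.
print(ours.exp().sum(-1))
```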
In the model function, @ denotes matrix multiplication (the dot product). The model is called on one mini-batch at a time (64 images here).
bs = 64  # batch size
xb = x_train[0:bs]  # a mini-batch from x_train
preds = model(xb)  # predictions
print(preds[0], preds.shape)
out
tensor([-2.8486, -2.2823, -2.2740, -2.7800, -2.1906, -1.3280, -2.4680, -2.2958,
-2.8856, -2.8650], grad_fn=<SelectBackward>) torch.Size([64, 10])
Printing preds shows that the tensor carries a gradient function (grad_fn), which is used later for backpropagation. Next, implement the negative log-likelihood of the predictions with respect to the labels as the loss function. Applied to log_softmax outputs, the negative log-likelihood is the cross-entropy loss.
def nll(input, target):
    return -input[range(target.shape[0]), target].mean()
loss_func = nll
Compute the loss of the untrained model on the first mini-batch, so that it can be compared with the value after training.
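The indexing in nll is compact, so here is a small sketch of what it selects. The log-probabilities and targets are made-up values; F.nll_loss is PyTorch's built-in equivalent and is used only for comparison.

```python
import torch
import torch.nn.functional as F

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

# Hypothetical log-probabilities for 3 samples and 4 classes.
log_probs = torch.log(torch.tensor([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]))
target = torch.tensor([0, 1, 3])

# Row i, column target[i]: the log-probability given to the correct class.
picked = log_probs[range(3), target]
print(picked.exp())                    # probabilities 0.7, 0.5, 0.6
print(nll(log_probs, target))
print(F.nll_loss(log_probs, target))   # built-in, same value
```

The loss is small when the model assigns high probability to the correct class of each sample, and grows as that probability approaches zero.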
yb = y_train[0:bs]
print(loss_func(preds, yb))
out
tensor(2.4101, grad_fn=<NegBackward>)
We also implement an evaluation function that calculates the accuracy of the model. Each row of out holds one score per digit from 0 to 9, so the index returned by argmax is the digit the model considers most likely. The accuracy is the fraction of these predictions that match the labels.
def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()
print(accuracy(preds, yb))
out
tensor(0.0781)
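To make the accuracy calculation concrete, here is a tiny hand-checkable example. The scores and labels below are invented for illustration.

```python
import torch

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

# Hypothetical scores for 4 samples and 3 classes.
out = torch.tensor([
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.7],
])
yb = torch.tensor([1, 0, 2, 1])

print(torch.argmax(out, dim=1))  # tensor([1, 0, 2, 0])
print(accuracy(out, yb))         # tensor(0.7500): 3 of 4 predictions match
```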
Now we are ready to train. Each iteration of the training loop does the following:
- Take a mini-batch of training data.
- Use the model to make predictions for the mini-batch.
- Calculate the loss.
- Call loss.backward() to compute the gradients of the loss with respect to the weights and bias.
After the weights and bias are updated from these gradients (inside torch.no_grad(), so that the update itself is not recorded), the gradients are reset with grad.zero_(). This is necessary because loss.backward() adds the new gradients to whatever is already stored rather than replacing it.
from IPython.core.debugger import set_trace
lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()
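The need for the grad.zero_() calls in the loop can be seen in isolation. The sketch below uses a made-up two-element tensor w and a trivial loss; it is not part of the tutorial's model.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()         # gradient of sum(3 * w) is [3, 3]

(w * 3).sum().backward()       # no zeroing in between
accumulated = w.grad.clone()   # [6, 6]: the new gradient was added

w.grad.zero_()
(w * 3).sum().backward()
after_zero = w.grad.clone()    # [3, 3] again

print(first, accumulated, after_zero)
```

Without the reset, every mini-batch would be updated with the sum of all gradients seen so far instead of its own gradient.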
You can see that the loss has decreased and the accuracy has improved after training.
print(loss_func(model(xb), yb), accuracy(model(xb), yb))
out
tensor(0.0822, grad_fn=<NegBackward>) tensor(1.)
Before training, the accuracy was about 8%; after training it is 100%. (Both figures are measured on a single mini-batch, not on the whole dataset.)
Now you have a simple neural network built from scratch. A network like this, with a softmax output and no hidden layer, is logistic regression.
From here, we refactor the code using PyTorch's nn package. As a first step, replace the activation function and the loss function. torch.nn.functional (imported as F by convention) provides F.cross_entropy, which combines log_softmax with the negative log-likelihood. Replace the loss function with F.cross_entropy. Because F.cross_entropy already includes log_softmax, the hand-written log_softmax(x) function can be removed.
import torch.nn.functional as F
loss_func = F.cross_entropy
def model(xb):
    return xb @ weights + bias
The model no longer calls log_softmax, because it is now included in cross_entropy. Confirm that the loss and accuracy are the same as before.
print(loss_func(model(xb), yb), accuracy(model(xb), yb))
out
tensor(0.0822, grad_fn=<NllLossBackward>) tensor(1.)
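The claim that F.cross_entropy combines log_softmax with the negative log-likelihood can be checked directly. In this sketch, logits and labels are random stand-ins for the model output and the MNIST labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)            # hypothetical raw model outputs
labels = torch.randint(0, 10, (8,))    # hypothetical class labels

combined = F.cross_entropy(logits, labels)
two_step = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(combined, two_step)
```

This is why the model must now return raw scores: applying log_softmax in the model as well would apply it twice.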
Next, we refactor using nn.Module and nn.Parameter. nn.Module is the base class for neural networks in PyTorch. Create a subclass of nn.Module and define the weights and bias in it as nn.Parameter attributes. Describe the computation from input to output in the forward method. nn.Module also provides parameters(), which returns the model's parameters.
from torch import nn
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))
    def forward(self, xb):
        return xb @ self.weights + self.bias
Since we are using objects instead of functions, we need to instantiate the model first.
model = Mnist_Logistic()
Now the loss can be computed just as before the refactoring. An nn.Module object is callable and can be used like a function.
print(loss_func(model(xb), yb))
out
tensor(2.3918, grad_fn=<NllLossBackward>)
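To illustrate what wrapping a tensor in nn.Parameter changes, the sketch below defines a smaller, hypothetical copy of the model (TinyLogistic, with an extra plain tensor attribute named scale) and lists what parameters() would return.

```python
import math
import torch
from torch import nn

class TinyLogistic(nn.Module):  # hypothetical, smaller copy of Mnist_Logistic
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(4, 3) / math.sqrt(4))
        self.bias = nn.Parameter(torch.zeros(3))
        self.scale = torch.ones(3)  # plain tensor: not registered as a parameter

    def forward(self, xb):
        return xb @ self.weights + self.bias

tiny = TinyLogistic()
names = [name for name, _ in tiny.named_parameters()]
print(names)  # ['weights', 'bias']
print(all(p.requires_grad for p in tiny.parameters()))
```

Only the attributes wrapped in nn.Parameter are registered, and they have requires_grad set automatically, so no separate requires_grad_() call is needed.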
In the implementation so far, the weights and bias were each updated by name, and their gradients were zeroed manually:
with torch.no_grad():
    weights -= weights.grad * lr
    bias -= bias.grad * lr
    weights.grad.zero_()
    bias.grad.zero_()
The weight and bias updates can be simplified by using parameters() and zero_grad(), both defined in nn.Module.
# Illustrative snippet only; not meant to be run on its own
with torch.no_grad():
    for p in model.parameters(): p -= p.grad * lr
    model.zero_grad()
Wrap the training loop in a fit function so that it can be reused.
def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()
fit()
Let's reconfirm that the loss is reduced.
print(loss_func(model(xb), yb))
out
tensor(0.0796, grad_fn=<NllLossBackward>)
Until now we defined weights and bias ourselves and implemented the linear function $ w \times x + b $ by hand. Let's replace that with nn.Linear (a linear layer).
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)
    def forward(self, xb):
        return self.lin(xb)
Instantiate the model as before and calculate the loss.
model = Mnist_Logistic()
print(loss_func(model(xb), yb))
out
tensor(2.3661, grad_fn=<NllLossBackward>)
Train by calling the fit function defined earlier.
fit()
print(loss_func(model(xb), yb))
out
tensor(0.0813, grad_fn=<NllLossBackward>)
The loss has dropped from 2.3661 to 0.0813, confirming that the model still trains after the change.
Next, refactor the optimization step. PyTorch's torch.optim package provides various optimization algorithms. Instead of updating each parameter manually, you call the optimizer's step method, which updates all the parameters for you.
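For reference, nn.Linear performs the same computation as the hand-written version, with one difference in layout. The sketch below uses a random batch x as a stand-in for flattened images.

```python
import torch
from torch import nn

torch.manual_seed(0)
lin = nn.Linear(784, 10)
x = torch.randn(5, 784)  # hypothetical batch of 5 flattened images

# nn.Linear stores its weight as (out_features, in_features),
# so the manual version needs a transpose.
manual = x @ lin.weight.t() + lin.bias

print(lin.weight.shape, lin.bias.shape)
print(torch.allclose(lin(x), manual, atol=1e-4))
```

nn.Linear also chooses its own initial values for the weight and bias, which is why the starting loss differs slightly from the earlier runs.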
with torch.no_grad():
    for p in model.parameters(): p -= p.grad * lr
    model.zero_grad()
You can rewrite the above code as follows.
# Illustrative snippet only; not meant to be run on its own
opt.step()
opt.zero_grad()
from torch import optim
Wrapping the creation of the model and the optimizer in a function simplifies the code.
def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)
model, opt = get_model()
print(loss_func(model(xb), yb))
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
out
tensor(2.3423, grad_fn=<NllLossBackward>)
tensor(0.0819, grad_fn=<NllLossBackward>)
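To confirm that opt.step() with plain SGD does the same thing as the manual update, the sketch below applies both to two identical copies of a small, hypothetical nn.Linear model on made-up data and compares the resulting parameters.

```python
import torch
import torch.nn.functional as F
from torch import nn, optim

torch.manual_seed(0)
lr = 0.5
model_a = nn.Linear(3, 2)
model_b = nn.Linear(3, 2)
model_b.load_state_dict(model_a.state_dict())  # identical starting weights

x = torch.randn(4, 3)             # hypothetical inputs
y = torch.randint(0, 2, (4,))     # hypothetical labels

# Manual update, as in the earlier training loop.
F.cross_entropy(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= p.grad * lr
model_a.zero_grad()

# The same update through the optimizer.
opt = optim.SGD(model_b.parameters(), lr=lr)
F.cross_entropy(model_b(x), y).backward()
opt.step()
opt.zero_grad()

same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

The benefit of the optimizer is that switching to another algorithm, or adding options such as momentum, only changes the line that constructs opt.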
PyTorch has an abstract Dataset class.
A Dataset makes it easier to handle the training inputs (x_train) and the labels (y_train) together during training.
A Dataset must implement a __len__ function that returns the number of elements and a __getitem__ function that returns the element at a given index.
TensorDataset is a Dataset that wraps tensors.
from torch.utils.data import TensorDataset
Create a TensorDataset from x_train and y_train.
train_ds = TensorDataset(x_train, y_train)
Previously, the training inputs (x_train) and the labels (y_train) were sliced separately in each iteration.
xb = x_train[start_i:end_i]
yb = y_train[start_i:end_i]
With TensorDataset, both can be sliced in a single step.
xb,yb = train_ds[i*bs : i*bs+bs]
model, opt = get_model()
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
out
tensor(0.0803, grad_fn=<NllLossBackward>)
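To show what the __len__ and __getitem__ requirement looks like in practice, here is a minimal sketch of a hand-written Dataset. PairDataset is a hypothetical class written for this example; it only approximates what TensorDataset does for two tensors.

```python
import torch
from torch.utils.data import Dataset, TensorDataset

class PairDataset(Dataset):
    """Hypothetical minimal Dataset holding inputs and labels together."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        return self.xs[idx], self.ys[idx]

xs = torch.arange(12, dtype=torch.float32).reshape(6, 2)
ys = torch.arange(6)

ours = PairDataset(xs, ys)
builtin = TensorDataset(xs, ys)

print(len(ours), len(builtin))
xb, yb = ours[2:4]         # slicing works because tensors accept slices
xb2, yb2 = builtin[2:4]
print(torch.equal(xb, xb2), torch.equal(yb, yb2))
```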
DataLoader can be used to simplify looping with Datasets. Create a DataLoader based on the Dataset.
from torch.utils.data import DataLoader
train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)
Previously, the loop computed the start position of each batch and sliced the dataset.
for i in range((n - 1) // bs + 1):
    xb, yb = train_ds[i * bs : i * bs + bs]
    pred = model(xb)
With DataLoader the loop becomes simpler, because each (xb, yb) batch is loaded automatically in sequence.
for xb, yb in train_dl:
    pred = model(xb)
model, opt = get_model()
for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
print(loss_func(model(xb), yb))
out
tensor(0.0802, grad_fn=<NllLossBackward>)
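The batching that DataLoader performs can be seen on a toy dataset. The 10-sample tensors below are made up for this sketch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

xs = torch.arange(10, dtype=torch.float32).unsqueeze(1)  # 10 samples, 1 feature
ys = torch.arange(10)
dl = DataLoader(TensorDataset(xs, ys), batch_size=4)

batch_sizes = [len(xb) for xb, yb in dl]
print(len(dl), batch_sizes)  # 3 [4, 4, 2]
```

The number of batches matches the (n - 1) // bs + 1 expression used in the manual loop, and the last batch is simply smaller when the dataset size is not a multiple of the batch size.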
So far, we have used nn.Module, nn.Parameter, torch.optim, Dataset, and DataLoader, and the code has become much more concise. Next, let's add the basic features needed to build an effective model.
Up to this point, training used only the training data. In practice, validation data is also used to check whether the model is overfitting and whether training is progressing. Set up the validation dataset below. The training data is shuffled to avoid correlation between batches; the validation data needs no shuffling, and because validation requires no backpropagation (and therefore less memory), its batch size is doubled.
train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)
At the end of each epoch, the loss is calculated on the validation data. Call model.train() before training and model.eval() before validation. This ensures that layers such as nn.Dropout, which behave differently in the two phases, are active only during training.
model, opt = get_model()
for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)
    print(epoch, valid_loss / len(valid_dl))
out
0 tensor(0.3679)
1 tensor(0.2997)
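The logistic regression model here contains no layers that depend on the mode, so as an illustration of what train() and eval() change, the sketch below applies a standalone nn.Dropout layer to a made-up tensor of ones.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)  # roughly half the entries are zeroed, the rest scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)   # dropout is disabled: output equals input

print((train_out == 0).float().mean().item())
print(torch.equal(eval_out, x))
```

Calling model.train() and model.eval() on a whole model switches every such layer inside it at once.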
Next, create a function loss_batch that handles both training and validation. When an optimizer is passed to loss_batch, it performs backpropagation and updates the parameters. During validation no optimizer is passed, so backpropagation is skipped.
def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)
Define the fit function. The fit function iterates training and validation on each epoch and displays the loss.
import numpy as np
def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)
        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)
        print(epoch, val_loss)
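The fit function weights each batch loss by its batch size, which is why loss_batch returns len(xb). The sketch below shows the difference this makes, using invented loss values and a smaller final batch (assuming 10,000 validation samples and a batch size of 128, the last batch would hold only 16 samples).

```python
import numpy as np

# Hypothetical per-batch mean losses and batch sizes; the last batch is smaller.
losses = (0.30, 0.30, 0.90)
nums = (128, 128, 16)

# Same formula as val_loss in fit: mean over samples, not over batches.
weighted = np.sum(np.multiply(losses, nums)) / np.sum(nums)
unweighted = np.mean(losses)

print(weighted, unweighted)
```

The unweighted mean lets the 16-sample batch count as much as a full batch, while the weighted version is the true average loss per sample.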
get_data returns a DataLoader for training and validation data.
def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )
Now you can write the process of getting the DataLoader and performing the learning in three lines of code.
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
out
0 0.45953697173595426
1 0.3061695278286934
These three lines can be used to train a wide variety of models. Let's see if we can use them to train a convolutional neural network (CNN)!
From here, we will build a neural network with three convolution layers. The functions created so far have no model restrictions, so you can switch to CNN without making any changes.
Use the Conv2d class provided by PyTorch as the convolution layer, and define a CNN with three of them. Each convolution layer is followed by a ReLU activation, and an average pooling layer is applied at the end. (view is PyTorch's version of numpy's reshape.)
class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)
    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))
lr = 0.1
Momentum is a variation of stochastic gradient descent that also takes into account the last update, which generally leads to faster training.
model = Mnist_CNN()
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
out
0 0.7808464194297791
1 0.6988550303936004
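To see how the tensor shape changes through Mnist_CNN, the sketch below pushes a dummy batch of zeros through layers with the same settings and records the shape after each step. The list shapes and the dummy input are additions for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn

conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

xb = torch.zeros(64, 784).view(-1, 1, 28, 28)  # dummy batch of 64 images
shapes = [tuple(xb.shape)]
for conv in (conv1, conv2, conv3):
    xb = F.relu(conv(xb))          # each stride-2 convolution halves the grid
    shapes.append(tuple(xb.shape))
xb = F.avg_pool2d(xb, 4)           # 4x4 grid averaged down to 1x1
shapes.append(tuple(xb.shape))
out = xb.view(-1, xb.size(1))      # drop the 1x1 grid: one score per class
shapes.append(tuple(out.shape))

for s in shapes:
    print(s)
```

The grid shrinks from 28 to 14 to 7 to 4, and the final 10 channels serve as the scores for the 10 digits.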
torch.nn has another handy class, Sequential, that you can use to simplify your code. A Sequential object runs each of the modules it contains in order, which makes the network easy to describe.
Some custom layers are needed to take full advantage of Sequential. PyTorch has no built-in view layer, so we create one for our network. The Lambda class below turns an arbitrary function into a layer that can be placed inside Sequential.
class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func
    def forward(self, x):
        return self.func(x)
def preprocess(x):
    return x.view(-1, 1, 28, 28)
Sequential makes it easy to describe your network as follows:
model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
out
0 0.4288556560516357
1 0.2115058801174164
The CNN above is fairly concise, but it only works with MNIST data (handwritten digit images) because of the following assumptions:
- The input must be 28 * 28 data (a flattened vector of 784 values).
- The final grid size is assumed to be 4 * 4 (because it uses 2D average pooling with kernel size 4).
Remove these two assumptions so that the model works with any 2D single-channel (monochrome) image. First, delete the first Lambda layer and move the data preprocessing into the DataLoader.
def preprocess(x, y):
    return x.view(-1, 1, 28, 28), y
class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func
    def __len__(self):
        return len(self.dl)
    def __iter__(self):
        batches = iter(self.dl)
        for b in batches:
            yield (self.func(*b))
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)
Then replace nn.AvgPool2d with nn.AdaptiveAvgPool2d. This allows you to define the size of the output tensor you want, not the input tensor. As a result, the average pooling layer works with inputs of any size.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Lambda(lambda x: x.view(x.size(0), -1)),
)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
Let's try it.
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
out
0 0.3351769802570343
1 0.2583931807518005
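The difference between the two pooling layers shows up as soon as the feature map is not 4x4. In the sketch below, small and large are random stand-ins for the output of the last convolution; an 8x8 map is what 64x64 inputs would produce with these three stride-2 convolutions.

```python
import torch
from torch import nn

fixed = nn.AvgPool2d(4)
adaptive = nn.AdaptiveAvgPool2d(1)

small = torch.randn(2, 10, 4, 4)  # feature map produced by 28x28 inputs
large = torch.randn(2, 10, 8, 8)  # hypothetical feature map from larger images

print(fixed(small).shape, fixed(large).shape)
print(adaptive(small).shape, adaptive(large).shape)
```

With the fixed kernel, the larger map leaves a 2x2 grid, so the flattened output would have 40 values instead of 10. The adaptive layer always reduces the grid to 1x1, which keeps the output at one score per class.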
If you have access to a CUDA-capable GPU (most cloud providers rent one for about $0.50 per hour), you can use it to speed up training. First, check that PyTorch can see your GPU.
print(torch.cuda.is_available())
out
True
Next, create a device object. The device object is set to "cuda" if the GPU is available and "cpu" if it is not available.
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
Add preprocessing to move the batch to the GPU.
def preprocess(x, y):
    return x.view(-1, 1, 28, 28).to(dev), y.to(dev)
train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)
Finally, move the model to the GPU.
model.to(dev)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
You can see that the processing speed has increased.
fit(epochs, model, loss_func, opt, train_dl, valid_dl)
out
0 0.1938392831325531
1 0.18594802458286286
When I checked with Google Colaboratory, the above process, which took about 15 seconds for the CPU, was completed in about 5 seconds.
In this tutorial, you have created model-independent data processing and training processing. There are many things we want to add, such as data augmentation, hyperparameter tuning, monitoring training, and transfer learning. These features are available in the fastai library. The fastai library was developed using the same design approach shown in this tutorial and will be a good step for anyone further learning machine learning.
This tutorial used torch.nn, torch.optim, Dataset, and DataLoader. To summarize what we have seen:
- torch.nn
  - Module: An object that can be called like a function and that holds state such as neural network layers and weights. It knows which parameters it contains, so it can iterate over them to zero their gradients and update the weights.
  - Parameter: A wrapper for a tensor that tells a Module that it holds weights to be updated during backpropagation. Only tensors with the requires_grad attribute set are updated.
  - functional: A module containing activation functions, loss functions, and stateless versions of layers such as convolutional and linear layers.
- torch.optim: Contains optimizers such as SGD (stochastic gradient descent), which update the Parameter weights during training.
- Dataset: An abstract interface of objects with __len__ and __getitem__, including classes such as TensorDataset.
- DataLoader: Takes any Dataset and creates an iterator that returns batches of data.
That concludes "What is torch.nn really?". It overlapped with the previous installment, but it deepened my understanding of PyTorch and neural networks. Next time, I would like to proceed with "Visualizing Models, Data, and Training with TensorBoard".
2020/10/10 First edition released