An introduction to PyTorch for beginners.
When picking up a new framework like this, I think the best shortcut is to take an example, google the functions used in it, read the documentation, and play around with it, so this article is a memo of exactly that process.
If you are trying to get started with PyTorch in a similar way, this article should save you some time. (I'd be glad if it does.)
Concretely, I work through the code of the official cifar10-tutorial, which trains a CNN on CIFAR10, deciphering it and googling as I go.
Installing PyTorch is very easy: if you select your environment on the official website, the matching installation command is displayed. For my environment it was as follows.
http://pytorch.org/
```
pip install http://download.pytorch.org/whl/cu80/torch-0.2.0.post3-cp36-cp36m-manylinux1_x86_64.whl
pip install torchvision
```
First, let's take a look at this code that loads and prepares the data.
```python
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
```
torchvision is PyTorch's computer vision package; it contains utilities for loading and preprocessing common datasets.

transforms.Compose sets up a pipeline of preprocessing functions to run after the data is loaded; they are executed in order from the beginning of the list.

ToTensor(), as the name suggests, converts the data into torch.Tensor, the tensor type defined by PyTorch. It has to come first because the input of transforms.Normalize must be a torch.Tensor.

In [transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))](http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.Normalize), the first tuple of the argument is the mean of each RGB channel and the second tuple is the standard deviation; each channel is normalized with these values.
In other words,

```python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
```

gives you a function that converts the data into PyTorch's tensor type and then normalizes it.
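As a quick sanity check (my own addition, not part of the tutorial), you can apply Normalize to a fake image tensor and confirm that values in [0, 1] end up in [-1, 1]:

```python
import torch
from torchvision import transforms

# A fake 3-channel "image" with values in [0, 1], just for the check.
x = torch.rand(3, 4, 4)
norm = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
y = norm(x)  # computes (x - 0.5) / 0.5 per channel

print(x.min(), x.max())  # somewhere in [0, 1]
print(y.min(), y.max())  # somewhere in [-1, 1]
```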
torchvision.datasets.CIFAR10 is, as the name implies, a class for loading the CIFAR10 data. With download=True, the data is downloaded and saved under the root location.

For MNIST and CIFAR10 the dataset is already split into training and test sets. (You could of course also mix them and split them yourself.) CIFAR10 has 60000 images in total, of which 50000 are for training and 10000 for testing. With train=True the 50000 training images are loaded, and with train=False the 10000 test images are loaded.

If you pass the preprocessing pipeline built with transforms.Compose as the transform argument, it is applied to each sample after loading.
DataLoader attaches a [sampler](http://pytorch.org/docs/master/_modules/torch/utils/data/sampler.html) to the loaded dataset; the sampler is the object that decides how data points are drawn. If you check, a sampler is indeed attached:
```python
trainloader.sampler
# <torch.utils.data.sampler.RandomSampler at 0x7f9099e13ef0>
```
Looking at the source, samplers come in several flavors: random sampling, sequential sampling, weighted sampling, and so on.

DataLoader has a sampler argument, so passing a sampler we define ourselves should work; let's try it. To make the result easy to see, we give weight 1 to a single image and weight 0 to everything else. Since it is smaller and handier, we use the test data.
```python
import numpy as np

# Create a weight vector that is 1 for a single image and 0 elsewhere.
weights = np.zeros(10000)  # the test set has 10000 images
weights[300] = 1.          # index 300 is arbitrary
num_samples = 4            # number of samples to draw

# Try WeightedRandomSampler.
my_sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, num_samples, replacement=True)
my_testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                            shuffle=False, num_workers=2,
                                            sampler=my_sampler)

my_testiter = iter(my_testloader)
images, labels = my_testiter.next()
```
```python
# The imshow function is explained in the next section; I am just using it
# a little ahead of time here.
import matplotlib.pyplot as plt

def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

imshow(torchvision.utils.make_grid(images))
```
And sure enough, it's the same frog four times.
```python
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))

# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
```
There is an unnormalize comment at the img = img / 2 + 0.5 line, but I find "unnormalize" a little misleading; if anything, it is normalizing the values back into [0, 1]. Since the input of [plt.imshow](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.imshow.html) should be in [0, 1], the values are shifted accordingly.
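Just to convince ourselves of the arithmetic (my own check): Normalize mapped x to (x - 0.5) / 0.5, and img / 2 + 0.5 is exactly its inverse.

```python
x = torch.FloatTensor([-1.0, 0.0, 1.0])  # the normalized range
print(x / 2 + 0.5)                       # 0.0, 0.5, 1.0 -- back in [0, 1]
```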
## 3.7. What is img.numpy()?
```python
type(img)  # torch.FloatTensor
```

torch.Tensor is the tensor type handled by PyTorch, named "<element data type> + Tensor". In this case it is a FloatTensor, that is, a tensor of 32-bit floating point numbers.
According to the documentation, the ndarray returned by .numpy() and the original tensor share the same underlying storage: if you change one, the other changes too. Let's check.
```python
a = torch.FloatTensor([1])
b = a.numpy()

# Change the ndarray...
b[0] = 2

# ...and the original tensor is also 2.
print("original tensor: ", a)  # original tensor: 2
print("ndarray : ", b)         # ndarray : [ 2.]

# The Python objects themselves are distinct.
print(id(a))  # 140024484781832
print(id(b))  # 140024044621056
```
Indeed, the change made through the ndarray is reflected in the original tensor. The id() values differ because a and b are distinct Python objects, but they are wrappers around the same underlying memory, which is why their values stay in sync.
```python
npimg = img.numpy()
npimg2 = np.transpose(npimg, (1, 2, 0))
print(npimg.shape)   # (3, 36, 138)
print(npimg2.shape)  # (36, 138, 3)
```
Looking at the documentation, plt.imshow expects its input axes in the order (height, width, RGB). Since npimg is laid out as (RGB, height, width), np.transpose reorders the axes according to its second argument.
```python
dataiter = iter(trainloader)
print(type(trainloader))
# <class 'torch.utils.data.dataloader.DataLoader'>
print(type(dataiter))
# <class 'torch.utils.data.dataloader.DataLoaderIter'>
```
Calling iter() invokes the __iter__ defined on DataLoader, which returns a DataLoaderIter. Unlike a plain iterator, it has to hand the data over batch_size samples at a time, which seems to be why a dedicated iterator is defined. (code) As a result, each call to dataiter.next() returns the next batch in order.
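In other words, the two usage patterns below are equivalent ways to walk through the batches; this is an illustrative sketch of my own, not tutorial code:

```python
# Explicit iterator: each next() call returns the next mini-batch.
dataiter = iter(trainloader)
images1, labels1 = dataiter.next()  # batch 1
images2, labels2 = dataiter.next()  # batch 2

# Idiomatic version: the for loop calls iter()/next() behind the scenes.
for images, labels in trainloader:
    print(images.size())  # torch.Size([4, 3, 32, 32])
    break                 # just look at the first batch
```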
```python
img = torchvision.utils.make_grid(images)
print(type(images))   # <class 'torch.FloatTensor'>
print(images.size())  # torch.Size([4, 3, 32, 32])
print(type(img))      # <class 'torch.FloatTensor'>
print(img.size())     # torch.Size([3, 36, 138])
```
The documentation is under torchvision.utils.make_grid. The make_grid function lays out multiple images side by side. Its argument is a 4-dimensional tensor, while its return value is 3-dimensional: the input was [number of images, RGB, height, width], and the image-count dimension is gone in the output.

As the documentation says, the default is padding=2, so 2 pixels are added above and below each image, making the height 32 + 2*2 = 36. Horizontally, 2 pixels are inserted between images and at both ends, so the width is 32*n + 2*(n+1) = 138 for n = 4.
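The arithmetic itself is easy to verify in a couple of lines (my own check):

```python
n, size, padding = 4, 32, 2
height = size + 2 * padding           # 2 pixels above and below: 36
width = size * n + padding * (n + 1)  # between images and at both ends: 138
print(height, width)                  # 36 138
```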
Next, let's take a look at the code that defines the model.
```python
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
```
Variable wraps torch.Tensor so that it can also hold gradient information and the like. Because Variable wraps the tensor, one extra step is needed when you want to see the data inside it.
```python
a = torch.FloatTensor([1.])
a2 = Variable(a)
print(type(a))          # <class 'torch.FloatTensor'>
print(type(a2))         # <class 'torch.autograd.variable.Variable'>
print(type(a2.data))    # <class 'torch.FloatTensor'>
print(a.numpy())        # [ 1.]
print(a2.data.numpy())  # [ 1.]
```
As shown above, the tensor is stored in .data. (Reference)

By the way, the gradient information is stored in .grad:

```python
print(a2.grad)  # None
```

Nothing has been computed yet, so it holds None. Apparently only None or a Variable is accepted for this attribute.
```python
a2.grad = Variable(torch.FloatTensor([100]))
print(a2.grad)  # Variable containing: 100
```

Presumably it is assigned like this internally during backpropagation. And the fact that only a Variable is accepted means that gradient information can be stored there in turn:

```python
a2.grad.grad = Variable(torch.FloatTensor([200]))
print(a2.grad.grad)  # Variable containing: 200
```

Is this how higher-order derivatives are handled? Interesting.
Starting with class Net(nn.Module): the nn.Module class is a base class on which things like forward are declared, and you inherit from it when creating a model. (In fact, the base class's forward has no concrete implementation; it is assumed to be overridden after inheriting, as the sketch below illustrates.)
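A quick check of that last point, a sketch of my own based on how the base class behaves in this version of PyTorch:

```python
import torch
from torch.autograd import Variable
import torch.nn as nn

class Empty(nn.Module):
    def __init__(self):
        super(Empty, self).__init__()
    # no forward defined

# Calling a module routes through __call__ to forward; since Empty never
# overrides forward, the base class implementation raises.
try:
    Empty()(Variable(torch.FloatTensor([1.])))
except NotImplementedError:
    print("forward() must be overridden in the subclass")
```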
The arguments of nn.Conv2d are, from the left, the number of input channels, the number of output channels, and the kernel size.

At the time the model is defined, the filters that are the parameters of the convolution layers are already prepared, filled with random values near 0. This can be confirmed as follows.
```python
conv1 = nn.Conv2d(3, 6, 5)
print(conv1.weight)
```
```
Parameter containing:
(0 ,0 ,.,.) =
 -0.0011 -0.1120  0.0351 -0.0488  0.0323
 -0.0529 -0.0126  0.1139 -0.0234 -0.0729
  0.0384 -0.0263 -0.0903  0.1065  0.0702
  0.0087 -0.0492  0.0519  0.0254 -0.0941
  0.0351 -0.0556 -0.0279 -0.0641 -0.0790

(0 ,1 ,.,.) =
 -0.0738  0.0853  0.0817 -0.1121  0.0463
 -0.0266  0.0360  0.0215 -0.0997 -0.0559
  0.0441 -0.0151  0.0309 -0.0026  0.0167
 -0.0534  0.0699 -0.0295 -0.1043 -0.0614
 -0.0820 -0.0549 -0.0654 -0.1144  0.0049
...
[torch.FloatTensor of size 6x3x5x5]
```
Since the kernel size was set to 5, a tensor of shape (output channels, input channels (RGB), kernel size, kernel size) is prepared. By the way,
```python
conv1 = nn.Conv2d(3, 6, (5, 1))
print(conv1.weight)
```
```
Parameter containing:
(0 ,0 ,.,.) =
  0.2339
 -0.0756
  0.0604
 -0.0185
 -0.0975
...
```

so it seems you can use non-square kernels as well.
Also, this parameter

```python
type(conv1.weight)
# torch.nn.parameter.Parameter
```

is a Parameter object, a subclass of Variable. At a glance the two look the same; the display function is defined so that a Variable prints as "Variable containing:" while a Parameter prints as "Parameter containing:", which is about the only visible difference.
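You can confirm the class relationship directly (a quick check of my own):

```python
from torch.autograd import Variable
from torch.nn.parameter import Parameter

print(isinstance(conv1.weight, Parameter))  # True
print(isinstance(conv1.weight, Variable))   # True: Parameter subclasses Variable
```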
nn.MaxPool2d is the max pooling layer. The main arguments are kernel_size, stride, and padding.

So let's run an actual image through the convolution and max pooling to see how it is transformed.
```python
images, labels = dataiter.next()
print(images.size())
print(type(images))

image_plot = images[0][1].numpy()
plt.imshow(image_plot, cmap='Greys', interpolation='nearest')
plt.show()

# Layer definitions
img_input = Variable(images)
conv = nn.Conv2d(3, 1, 3, padding=1)
pool = nn.MaxPool2d(3, padding=1, stride=1)

# forward
conv_output = conv(img_input)
pool_output = pool(conv_output)
print(pool_output.size())

# plot
conv_plot = conv_output[0][0].data.numpy()
plt.imshow(conv_plot, cmap='Greys', interpolation='nearest')
plt.show()

pool_plot = pool_output[0][0].data.numpy()
plt.imshow(pool_plot, cmap='Greys', interpolation='nearest')
plt.show()
```
- Original image: it's a horse.
- Image after convolution: the filter parameters have not been learned yet, so they are random values near 0 and all about the same, which is why there is little visible structure.
- Image after convolution & max pooling: you can see that taking the max over 3x3 regions blurs the image.
Next, let's take a look at this code.
```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```
nn.CrossEntropyLoss() creates the object that will serve as the objective function.

torch.optim defines various optimization algorithms: SGD, Adam, and so on. Of these, we use optim.SGD here.

We pass net.parameters() as an argument to optim.SGD. net.parameters() returns the parameters defined in the model (torch.nn.parameter.Parameter) as a generator.
```python
type(net.parameters())
# generator

type(net.parameters().__next__())
# torch.nn.parameter.Parameter

print(net.parameters().__next__())
```
```
Parameter containing:
(0 ,0 ,.,.) =
 -0.0998  0.0035 -0.0438 -0.1150 -0.0435
  0.0310 -0.0750 -0.0405 -0.0745 -0.1095
 -0.0355  0.0065 -0.0225  0.0729 -0.1114
  0.0708 -0.0170 -0.0253  0.1060  0.0557
  0.1057  0.0873  0.0793 -0.0309 -0.0861
...
```
If you check the objects held by the optimizer,

```python
optimizer.__dict__
```
```
{'param_groups': [{'dampening': 0,
   'lr': 0.001,
   'momentum': 0.9,
   'nesterov': False,
   'params': [Parameter containing:
    (0 ,0 ,.,.) =
      0.0380 -0.1152  0.0761  0.0964 -0.0555
     -0.0325 -0.0455 -0.0755  0.0413 -0.0589
      0.0116  0.1136 -0.0992 -0.1149 -0.0414
     -0.0611  0.0827 -0.0906  0.0631  0.0170
      0.0903 -0.0816 -0.0690  0.0470 -0.0578
    ...
```

you can see that it holds all the parameters of the model, handed over via net.parameters().
Let's take a look at the [training code](http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#train-the-network).
```python
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # The second argument is the start index; since it is 0, this is
        # the same as enumerate(trainloader).
        # https://docs.python.org/3/library/functions.html#enumerate

        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```
As we confirmed earlier, the optimizer holds all of the model's parameters. optimizer.zero_grad() resets the grad of each of these held Variables to zero, so gradients left over from the previous iteration do not get mixed into the current one.
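The reason the reset matters is that backward() accumulates into .grad rather than overwriting it. A minimal sketch of my own:

```python
x = Variable(torch.FloatTensor([3.0]), requires_grad=True)

y = x ** 2
y.backward()
print(x.grad)  # 6: the derivative 2*x at x = 3

y = x ** 2
y.backward()
print(x.grad)  # 12: accumulated on top of the previous 6, not replaced
```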
```python
outputs = net(inputs)
print(type(outputs))
# <class 'torch.autograd.variable.Variable'>
print(outputs.size())
# torch.Size([4, 10])

outputs
```
```
Variable containing:
-2.4825 -4.4286  2.2041  3.4353  2.0734  2.8198  1.9374  0.7751 -2.6798 -3.1932
-1.7512 -4.6657  2.7911  3.9570  0.7931  5.9005 -0.8023  2.9664 -4.3328 -3.2921
 2.4015  2.8962  0.9330 -1.2107 -0.0525 -2.2119 -1.2474 -2.6026 -0.1120  0.4869
-1.3042 -2.7538  1.0985 -0.2462  3.7435  1.1724 -1.4233  6.6892 -3.8201 -2.3132
[torch.FloatTensor of size 4x10]
```
You can see that passing the inputs to net runs the forward pass and returns the network's final output: one row of 10 class scores for each of the 4 images in the batch.
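To go from scores to an actual class prediction you take the argmax over dimension 1, which is also what the evaluation code further down does; a small sketch:

```python
# torch.max along dimension 1 returns (max score, index of max) per row.
max_scores, predicted = torch.max(outputs.data, 1)
print(predicted)                             # one class index per image
print([classes[int(i)] for i in predicted])  # the corresponding class names
```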
The comment says forward + backward + optimize, yet there is no visible call to a forward method. This is because nn.Module (and hence CrossEntropyLoss) invokes forward via __call__, which means that

```python
loss = criterion(outputs, labels)
loss = criterion.forward(outputs, labels)
```

do the same thing. So loss = criterion(outputs, labels) is the forward pass.
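Here is the mechanism in miniature, stripped of everything PyTorch-specific (a sketch of my own):

```python
class Greeter(object):
    def forward(self, name):
        return "hello " + name

    def __call__(self, *args):
        # nn.Module.__call__ does extra bookkeeping (hooks etc.),
        # but at its core it delegates to self.forward like this.
        return self.forward(*args)

g = Greeter()
print(g("world"))          # same as...
print(g.forward("world"))  # ...calling forward directly
```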
loss is a Variable object:

```python
type(loss)
# torch.autograd.variable.Variable
```
Inside Variable.backward(), [torch.autograd.backward()](http://pytorch.org/docs/master/autograd.html#torch.autograd.backward) is called. What .backward() does is compute the derivative of the objective function with respect to each Variable it depends on.
Let's try with a simple example.
```python
x = torch.autograd.Variable(torch.Tensor([3, 4]), requires_grad=True)
# requires_grad=True tells autograd that this Variable is to be differentiated.

print("x.grad : ", x.grad)
# None
# At this point nothing is in it yet.

# Build an objective function, chosen arbitrarily.
y = x[0]**2 + 5*x[1] + x[0]*x[1]
# derivative w.r.t. x[0]: 2*x[0] + x[1]
# its value here:         2*3 + 4 = 10
# derivative w.r.t. x[1]: 5 + x[0]
# its value here:         5 + 3 = 8

y.backward()
# torch.autograd.backward(y) would also work.

print("x.grad : ", x.grad)
# 10
# 8

# Instead of .zero_grad():
x.grad = None
```
What you get is the derivative of the objective function y, evaluated at the input data point. Note that because backward is meant for a loss function, the y you call backward on must be a scalar.
For example, with

```python
y = x
y.backward()
# RuntimeError: grad can be implicitly created only for scalar outputs
```

it complains and tells you to make the output a scalar.
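For completeness: as far as I can tell from the docs, backward on a non-scalar does work if you explicitly supply the gradient to propagate for each output element:

```python
x = Variable(torch.FloatTensor([3, 4]), requires_grad=True)
y = x * 2                              # non-scalar output
y.backward(torch.FloatTensor([1, 1]))  # supply d(loss)/dy ourselves
print(x.grad)                          # 2, 2
```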
.step() then updates the parameters based on the gradients computed by .backward().
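For plain SGD (leaving momentum aside), the update boils down to stepping each parameter against its gradient. The sketch below is my own illustration of the rule, not the actual optim.SGD implementation, and shouldn't be run alongside a real optimizer.step():

```python
lr = 0.001
for p in net.parameters():
    if p.grad is not None:
        p.data.add_(-lr, p.grad.data)  # p <- p - lr * grad
```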
Let's check it out.
```python
optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
print(net.parameters().__next__())
```
```
Parameter containing:
(0 ,0 ,.,.) =
 -0.0839  0.1434 -0.0371 -0.1394 -0.0277
...
```

After running this several times (even though it keeps optimizing on the same data):

```python
print(net.parameters().__next__())
```
```
Parameter containing:
(0 ,0 ,.,.) =
 -0.0834  0.1436 -0.0371 -0.1389 -0.0276
...
```
And so, the parameters are updated little by little.
Finally, let's have the model predict on the test data, using this code.
```python
correct = 0
total = 0
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
for data in testloader:
    images, labels = data
    # print("images type : ", type(images))
    # print("images.shape : ", images.shape)
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    c = (predicted == labels).squeeze()
    for i in range(4):
        label = labels[i]
        class_correct[label] += c[i]
        class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

# Accuracy of plane : 51 %
# Accuracy of   car : 54 %
# Accuracy of  bird : 53 %
# Accuracy of   cat : 33 %
# Accuracy of  deer : 41 %
# Accuracy of   dog : 50 %
# Accuracy of  frog : 54 %
# Accuracy of horse : 65 %
# Accuracy of  ship : 70 %
# Accuracy of truck : 67 %
```
The only thing here I wasn't familiar with is torch.squeeze, which removes the dimensions of size 1 from a tensor's shape. Apparently "squeeze" is meant in the sense of squeezing them out.
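For example (a toy check of my own):

```python
t = torch.zeros(4, 1)
print(t.size())            # torch.Size([4, 1])
print(t.squeeze().size())  # torch.Size([4]) -- the size-1 dimension is gone
```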
~~You can find out what kind of model it is by looking at net.parameters.~~ (Corrected on October 27, 2017.) Since nn.Module's __repr__ is defined to display the model in a readable form, you can get a rough picture of the model simply like this:
```python
In [22]: net
Out[22]:
Net (
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
```
- PyTorch's way of abstracting things is quite instructive.
- It seems easy to use.