[PYTHON] [PyTorch] TRANSFER LEARNING FOR COMPUTER VISION

Introduction

While reading the code in the TRANSFER LEARNING FOR COMPUTER VISION TUTORIAL (1), I came across several lines that made me think "what is this?". I wrote this article to summarize what I found out about them. If I have made a mistake, I would appreciate it if you could leave a comment.

optimizer.zero_grad()

This line sits inconspicuously in the train_model function, but it does an important job: it resets the accumulated gradients. PyTorch adds the result of every `backward()` call to the gradients that are already stored, so without a reset the gradients of earlier mini-batches pile up and training may not converge. The weight $W$ is updated by the steepest descent method:

$$
W = W - \eta \frac{\partial L}{\partial W}
$$

In this formula,

$$
\frac{\partial L}{\partial W}
$$

is the gradient and $\eta$ is the learning rate. Each update should use only the gradient of the current mini-batch, which is why `optimizer.zero_grad()` is required during training.
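The accumulation can be checked with a small experiment. The following is only a minimal sketch, not the tutorial's code: the single weight `w`, the input `x`, the helper `loss_fn` and the learning rate 0.1 are all made up for illustration.

```python
import torch

# A single weight and a fixed input, so the gradient is easy to check by hand.
# (w, x, loss_fn and lr=0.1 are illustrative assumptions, not tutorial code.)
w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([2.0])
optimizer = torch.optim.SGD([w], lr=0.1)


def loss_fn():
    # L = (w * x)^2, so dL/dw = 2 * w * x^2 = 8 when w = 1 and x = 2
    return ((w * x) ** 2).sum()


# Two backward passes WITHOUT zero_grad(): the gradients add up.
loss_fn().backward()
grad_first = w.grad.clone()        # 8
loss_fn().backward()
grad_accumulated = w.grad.clone()  # 8 + 8 = 16

# With zero_grad() before the backward pass, only the fresh gradient remains.
optimizer.zero_grad()
loss_fn().backward()
grad_after_reset = w.grad.clone()  # 8 again

# One steepest-descent step: W = W - eta * dL/dW = 1 - 0.1 * 8
optimizer.step()
w_updated = w.detach().clone()     # approximately 0.2
```

If `zero_grad()` were skipped, the update would use the accumulated value 16 instead of 8, so the step would be twice as large as the formula intends.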

set_grad_enabled()

In the train_model function this is called in a with statement. The calculation runs without it, so I looked into why it is there. According to [Why we need torch.set_grad_enabled(False) here?](https://discuss.pytorch.org/t/why-we-need-torch-set-grad-enabled-false-here/41240) (2), it controls whether the computation graph is created. Training needs both forward propagation and backpropagation, but evaluation does not use backpropagation, so the purpose appears to be to reduce the amount of computation. As for the with statement and memory: while gradient tracking is disabled no graph is recorded, so the intermediate values needed for backpropagation are not kept, and the previous mode is restored when the with block ends.
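The effect on the computation graph can be seen from the output tensor. This is a minimal sketch under assumptions: the `nn.Linear` model and the random `inputs` are stand-ins for the real network and data, and the `results` dictionary exists only for inspection.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # stand-in for the real network (assumption)
inputs = torch.randn(3, 4)   # stand-in for one mini-batch (assumption)

results = {}
for phase in ["train", "val"]:
    # Gradient tracking is on only while phase == "train".
    with torch.set_grad_enabled(phase == "train"):
        outputs = model(inputs)
        # requires_grad / grad_fn show whether a graph was recorded.
        results[phase] = (outputs.requires_grad, outputs.grad_fn is not None)

# The previous mode is restored when the with block exits.
grad_mode_after = torch.is_grad_enabled()
```

In the "train" phase the output carries a `grad_fn`, so `backward()` can be called on a loss computed from it. In the "val" phase it does not, which is where the savings in computation and memory come from.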

running_loss += loss.item() * inputs.size(0)

This is a line in the train_model function that makes you go "?" the first time you see it. The reason for it goes back to how the loss function is defined:

criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, labels)

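If you look at CrossEntropyLoss, it has the argument `reduction='mean'` by default. In other words, the returned loss is already averaged over the samples in the batch. That causes no problem with batch learning, where the whole dataset is one batch. With mini-batch learning, however, each mini-batch's average (mean) has to be turned back into a sum so that the loss accumulated over an epoch can be divided by the total number of samples, and the last mini-batch may be smaller than the others. Therefore,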

running_loss += loss.item() * inputs.size(0)

This multiplies the mini-batch's average loss by the number of samples in that mini-batch, turning it back into the summed loss. By the way, even without this multiplication,

criterion = nn.CrossEntropyLoss(reduction='sum')

returns the same value as `loss.item() * inputs.size(0)` did with the default reduction. So

#running_loss += loss.item() * inputs.size(0)
running_loss += loss.item()

you can write it straightforwardly, without the extra multiplication. This is also discussed in (3).
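The relationship between the two reductions can be checked numerically. The sketch below is an illustration under assumptions, not the tutorial's code: the random `logits` and `labels` stand in for model outputs and targets, and the 4/4/2 split is chosen so that the last mini-batch is smaller than the others.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 10 samples, 3 classes, split into mini-batches of 4, 4 and 2 samples.
# (Random logits stand in for model outputs; this is an assumption.)
logits = torch.randn(10, 3)
labels = torch.randint(0, 3, (10,))
batches = [(logits[i:i + 4], labels[i:i + 4]) for i in range(0, 10, 4)]

mean_criterion = nn.CrossEntropyLoss()                # reduction='mean' (default)
sum_criterion = nn.CrossEntropyLoss(reduction='sum')

running_loss_mean = 0.0  # mean loss * batch size, as in train_model
running_loss_sum = 0.0   # summed loss, no multiplication
naive_total = 0.0        # batch means added up without weighting

for outputs, targets in batches:
    # outputs.size(0) is the batch size, the same value as inputs.size(0)
    running_loss_mean += mean_criterion(outputs, targets).item() * outputs.size(0)
    running_loss_sum += sum_criterion(outputs, targets).item()
    naive_total += mean_criterion(outputs, targets).item()

epoch_loss = running_loss_mean / len(labels)     # average over all 10 samples
naive_epoch_loss = naive_total / len(batches)    # average of the batch averages
full_batch_loss = mean_criterion(logits, labels).item()
```

Here `running_loss_mean` and `running_loss_sum` agree up to floating-point error, and `epoch_loss` matches `full_batch_loss`. `naive_epoch_loss` generally does not match, because it gives the 2-sample batch the same weight as the 4-sample batches.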

Postscript (2020/3/11)

When the loss is computed with `criterion = nn.CrossEntropyLoss(reduction='sum')`, backpropagation results in nan. There seems to be a [bug](https://github.com/pytorch/pytorch/issues/17350) (4).

So it may be better to simply use `running_loss += loss.item() * inputs.size(0)`.

## In closing
I will add to this article when I find more lines that puzzle me.


## Reference material
(1) [TRANSFER LEARNING FOR COMPUTER VISION TUTORIAL](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html?highlight=transfer%20learning)
(2) [Why we need torch.set_grad_enabled(False) here?](https://discuss.pytorch.org/t/why-we-need-torch-set-grad-enabled-false-here/41240)
(3) [Issue about updating training loss #2](https://github.com/udacity/deep-learning-v2-pytorch/issues/2)
(4) [torch.nn.CrossEntropyLoss with "reduction" sum/mean is not deterministic on segmentation outputs / labels #17350](https://github.com/pytorch/pytorch/issues/17350)