[PYTHON] Implement PyTorch + GPU with Docker

Introduction

I recently started using Docker at last. With Docker, you can easily reproduce the same deep learning environment on different PCs.

Environment (host)

OS: Ubuntu 20.04
GPU: NVIDIA GeForce GTX 1080

Install GPU driver

First, set up the host so that it can use the GPU. If $ nvidia-smi already works, the driver is installed and you can skip this step. The commands below are one example of how to install it, for reference.

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt install ubuntu-drivers-common
$ sudo apt dist-upgrade
$ sudo reboot
$ sudo ubuntu-drivers autoinstall
$ sudo reboot

If $ nvidia-smi shows the driver version and memory usage, it's OK!
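If you would rather check this from a script, the following is a minimal sketch that asks nvidia-smi for the GPU name, driver version, and memory usage in CSV form and parses the result. The helper names parse_gpu_line and query_gpus are my own and not part of any library, and the sketch assumes that an empty list is an acceptable answer when nvidia-smi is missing or fails.

```python
import shutil
import subprocess

# nvidia-smi can print selected fields as CSV without a header row.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.used",
    "--format=csv,noheader",
]


def parse_gpu_line(line):
    """Split one CSV line of nvidia-smi output into a small dict."""
    # Split from the right so the GPU name is kept whole.
    name, driver, memory = [field.strip() for field in line.rsplit(",", 2)]
    return {"name": name, "driver_version": driver, "memory_used": memory}


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU found via nvidia-smi")
    for gpu in gpus:
        print(gpu)
```

An empty result means the driver is not usable yet, so it is worth going back over the installation steps before moving on to Docker.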

Install Docker

This follows the official installation guide (https://docs.docker.com/engine/install/ubuntu/) as is.

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io

Confirm that Docker works with $ sudo docker run hello-world

Install NVIDIA Container Toolkit

This is (probably) necessary to use CUDA from Docker.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#
https://github.com/NVIDIA/nvidia-docker/issues/1186

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker

Dockerfile

The Dockerfile describes what the virtual environment should look like. You can change the base environment (Ubuntu and CUDA versions, whether cuDNN is included, and so on) in the FROM instruction on the first line. The available tags are listed on the nvidia/cuda page on Docker Hub (https://hub.docker.com/r/nvidia/cuda/tags). You can also choose which Python libraries to install in the third RUN instruction.

Dockerfile



FROM nvidia/cuda:11.0-devel-ubuntu20.04

RUN apt-get update
RUN apt-get install -y python3 python3-pip
RUN pip3 install torch torchvision

WORKDIR /work

COPY train.py /work/

ENV LIBRARY_PATH=/usr/local/cuda/lib64/stubs

Deep learning script

Place train.py in the same directory as the Dockerfile you just created. train.py trains on MNIST, the dataset often called the "Hello World!" of deep learning. (Source: https://github.com/pytorch/examples/blob/master/mnist/main.py)

train.py


from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        # dropout2 runs on the flattened (N, 128) activations, so plain Dropout is the right layer
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            if args.dry_run:
                break


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    # Training settings
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=14, metavar='N',
                        help='number of epochs to train (default: 14)')
    parser.add_argument('--lr', type=float, default=1.0, metavar='LR',
                        help='learning rate (default: 1.0)')
    parser.add_argument('--gamma', type=float, default=0.7, metavar='M',
                        help='Learning rate step gamma (default: 0.7)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--dry-run', action='store_true', default=False,
                        help='quickly check a single pass')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')
    parser.add_argument('--save-model', action='store_true', default=False,
                        help='For Saving the current Model')
    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()

    torch.manual_seed(args.seed)

    device = torch.device("cuda" if use_cuda else "cpu")

    # Keep separate settings so that --test-batch-size actually takes effect
    train_kwargs = {'batch_size': args.batch_size}
    test_kwargs = {'batch_size': args.test_batch_size}
    if use_cuda:
        cuda_kwargs = {'num_workers': 1,
                       'pin_memory': True,
                       'shuffle': True}
        train_kwargs.update(cuda_kwargs)
        test_kwargs.update(cuda_kwargs)

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
        ])
    dataset1 = datasets.MNIST('../data', train=True, download=True,
                       transform=transform)
    dataset2 = datasets.MNIST('../data', train=False,
                       transform=transform)
    train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)

    model = Net().to(device)
    optimizer = optim.Adadelta(model.parameters(), lr=args.lr)

    scheduler = StepLR(optimizer, step_size=1, gamma=args.gamma)
    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(model, device, test_loader)
        scheduler.step()

    if args.save_model:
        torch.save(model.state_dict(), "mnist_cnn.pt")


if __name__ == '__main__':
    main()
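
One number in the script that may look arbitrary is the 9216 passed to the first Linear layer. As a rough worked example, it can be derived from the layer sizes in Net, given that MNIST images are 28 x 28 pixels. The helper out_size is a name I made up for this illustration. It applies the usual output-size formula for convolution and pooling without dilation.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                  # MNIST images are 28 x 28
side = out_size(side, kernel=3)            # conv1: 28 -> 26
side = out_size(side, kernel=3)            # conv2: 26 -> 24
side = out_size(side, kernel=2, stride=2)  # max_pool2d(x, 2): 24 -> 12

channels = 64                              # output channels of conv2
flat_features = channels * side * side     # what torch.flatten(x, 1) produces per sample

print(flat_features)                       # 9216, the input size of fc1
```

If you change the kernel sizes or the input resolution, the same calculation gives the new value for the first argument of fc1.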

Run it and check that it works

Build an image from the Dockerfile, then start a container from it. While train.py is running, you can check whether the GPU is being used with $ nvidia-smi.

$ sudo docker build -t [Image name] .
$ sudo docker run -it --gpus all [Image name] /bin/bash
---- Inside the container from here ----
$ python3 train.py
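
Before starting a long training run, it can help to confirm from inside the container that PyTorch actually sees the GPU. The following is a small sketch of such a check. The function name describe_device is hypothetical, and the ImportError branch is only there so the snippet also runs on a machine without PyTorch.

```python
def describe_device():
    """Return a short description of the device PyTorch would use."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # Index 0 is the first GPU visible inside the container.
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"


if __name__ == "__main__":
    print(describe_device())
```

If this prints cpu inside a container started with --gpus all, the usual suspects are the host driver or the NVIDIA Container Toolkit installation. train.py falls back to the CPU in that case because it uses the same torch.cuda.is_available() check.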

Finally

This time I built a virtual environment for PyTorch, but by changing the contents of the Dockerfile you should be able to use other deep learning libraries as well. If the training data is huge, you can mount it into the container with docker run options when you start it. All in all, Docker really is convenient.
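
As an illustration of the mounting idea, here is a minimal sketch that assembles the docker run command from this article with an optional bind mount added through the -v option. The function build_run_command, the image name pytorch-gpu, and the host path /home/user/datasets are all placeholders I chose for the example. The container path /data is an assumption based on train.py reading '../data' from the /work directory.

```python
import shlex


def build_run_command(image, host_data_dir=None, container_data_dir="/data"):
    """Build the argument list for an interactive GPU container."""
    cmd = ["sudo", "docker", "run", "-it", "--gpus", "all"]
    if host_data_dir is not None:
        # -v host_path:container_path bind-mounts a host directory into the container
        cmd += ["-v", f"{host_data_dir}:{container_data_dir}"]
    cmd += [image, "/bin/bash"]
    return cmd


if __name__ == "__main__":
    # Placeholder image name and host path, for illustration only
    print(shlex.join(build_run_command("pytorch-gpu", "/home/user/datasets")))
```

Printing the command instead of executing it lets you review it first. Because the data stays on the host, it does not have to be copied into the image every time you rebuild.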
