Try building a GPU container on GCP.

Introduction

I wanted to try creating a GPU container, but buying a GPU was not an option, so I created a GPU instance in the cloud and built the container there. It took more time than I expected, so I am summarizing the procedure and commands here as a memo for myself.

Advance preparation

- Create a Google Cloud account and open the GCP console.
- Upgrade to a paid account. (During the free trial you cannot request GPU quota.)

Environment

[Host OS]
・Ubuntu 20.04 LTS (on GCP)
・nvidia-driver 460
・Docker 19.03.14
・nvidia-docker2 2.5.0

GPU container creation procedure

1. Create GPU instance
2. Host settings
3. Creating a GPU container

1. Create GPU instance

Request GPU quota

The first time you create a VM instance with a GPU, you need to request GPU quota. In the GCP console, select IAM & Admin > Quotas. Two quotas must be requested: [Total GPU quota] and [GPU quota for each region].

**[Total GPU quota]** Enter "GPUs" in the filter and select the matching quota from the results.

Check the Global row and click "Edit Quotas". Enter the new limit and the reason, then proceed to the next screen.

On the next screen, enter your [Name], [Email Address], and [Telephone Number] and send the request. Approval takes about 5 minutes, so create the GPU instance after that.

**[GPU quota for each region]**

Filter as appropriate; this time I chose NVIDIA T4.

See here for pricing details ⇒ https://cloud.google.com/compute/gpus-pricing?hl=ja

Quota is assigned per region, so select the region where you plan to create the GPU instance and then specify the upper limit on the number of GPUs. The value specified here is only an upper limit, so no charge is incurred at this point. This time I requested one, since the goal is just to build a GPU container.
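
Once the request is approved, the regional limit can also be checked from a terminal. The following is only a sketch: it assumes the gcloud CLI is installed and configured, that `gcloud compute regions describe` prints its default YAML with `limit:` listed before `metric:` in each quota entry, and that the T4 metric is named `NVIDIA_T4_GPUS`. The helper name `quota_limit` is made up for this memo.

```shell
# Print the limit of one quota metric from the YAML that
# `gcloud compute regions describe REGION` writes to stdout.
# Assumption: each quota entry lists `limit:` before `metric:`.
quota_limit() {
  awk -v metric="$1" '
    $1 == "-" && $2 == "limit:"     { limit = $3 }    # "- limit: 1.0" opens a list item
    $1 == "limit:"                  { limit = $2 }    # "limit: 1.0" on its own line
    $1 == "metric:" && $2 == metric { print limit }   # emit the limit seen just before
  '
}

# Usage on a machine with the gcloud CLI configured:
#   gcloud compute regions describe us-central1 | quota_limit NVIDIA_T4_GPUS
```

If nothing is printed, the metric name or the output layout probably differs from what is assumed here, so check the raw output of the describe command.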

Create GPU instance

Select Compute Engine > VM instances from the GCP console. Click [Create] and create an instance with the following specifications.

● Region: us-central1 (Iowa)
● Zone: us-central1-a
● Machine configuration
・Machine family: GPU
・Series: N1
・Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
● GPU type
・GPU type: NVIDIA Tesla T4
・Number of GPUs: 1
● Boot disk
・OS: Ubuntu
・Version: Ubuntu 20.04 LTS
・Disk type: Standard persistent disk
・Size: 20 GB
● Firewall
・Allow HTTP traffic
・Allow HTTPS traffic

When I tried selecting a region other than the one I had requested quota for, the GPU tab did not appear.

It took a few minutes to start.
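
For reference, roughly the same instance could be created from the command line instead of the console. This is a sketch, not what I actually ran: the instance name `gpu-container-host` is hypothetical, the image family `ubuntu-2004-lts` in project `ubuntu-os-cloud` is my assumption for "Ubuntu 20.04 LTS", and the two network tags only take effect if matching firewall rules exist in the project. The function prints the command so it can be reviewed before running.

```shell
# Hypothetical instance name; the other values mirror the console settings.
INSTANCE_NAME="${INSTANCE_NAME:-gpu-container-host}"

# Print a `gcloud compute instances create` command for a single T4 instance.
gpu_instance_create_cmd() {
  printf '%s ' gcloud compute instances create "$INSTANCE_NAME" \
    --zone=us-central1-a \
    --machine-type=n1-standard-1 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-type=pd-standard \
    --boot-disk-size=20GB \
    --tags=http-server,https-server
  echo
}

# Print the command for review; pipe the output to `sh` to actually run it.
gpu_instance_create_cmd
```

`--maintenance-policy=TERMINATE` is included because GPU instances cannot be live-migrated during host maintenance.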

**Check server environment**

`OS check`


$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"

`Confirmation of GPU (made by NVIDIA)`


$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4](rev a1)
        Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
        Kernel modules: nvidiafb

2. Host settings

**Disable the nouveau graphics driver**

The nouveau graphics driver that ships with Linux by default apparently needs to be disabled.


$ lsmod | grep -i nouveau

nouveau was not loaded on this VM, so nothing was returned.
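
If nouveau had been loaded, the usual approach is to blacklist the module and rebuild the initramfs. I did not need this on the GCP VM, so the following is an untested sketch; the file name `blacklist-nouveau.conf` and both function names are my own choices.

```shell
# Contents of a modprobe blacklist file for nouveau.
nouveau_blacklist_conf() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Write the file and rebuild the initramfs only when nouveau is loaded.
disable_nouveau_if_loaded() {
  if lsmod | grep -qi '^nouveau'; then
    nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf >/dev/null
    sudo update-initramfs -u
    echo "nouveau blacklisted; reboot to apply"
  else
    echo "nouveau is not loaded; nothing to do"
  fi
}

# Usage on the host:
#   disable_nouveau_if_loaded
```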

**Update the package lists and packages**


$ sudo apt update
$ sudo apt upgrade

Nvidia driver installation

You can check the Driver version of NVIDIA from this site. https://www.nvidia.co.jp/Download/index.aspx?lang=jp

**Add repository and update**


$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update

**Check the recommended driver**


$ sudo apt install ubuntu-drivers-common
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:04.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor   : NVIDIA Corporation
model    : TU104GL [Tesla T4]
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-440-server - distro non-free
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
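
To avoid reading the recommended driver off the list by eye, the output can be filtered. A minimal sketch, assuming the output format shown above (the package name is the third whitespace-separated field and the recommended line ends with "recommended"); `recommended_driver` is a helper name made up here.

```shell
# Read `ubuntu-drivers devices` output on stdin and print the package
# name from the first line marked "recommended".
recommended_driver() {
  awk '!found && $1 == "driver" && /recommended/ { print $3; found = 1 }'
}

# Usage on the host:
#   ubuntu-drivers devices | recommended_driver
```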

**Driver installation**


$ sudo apt install nvidia-driver-460

**Reboot and check that the installation succeeded**

`Confirmation of GPU (made by NVIDIA)`


$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4](rev a1)
        Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nvidia_drm, nvidia

The installation succeeded if the driver information ("Kernel driver in use: nvidia") has been added.

`Checking the operation of the nvidia-smi command`
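
The same check can be scripted, for example in a provisioning script that should stop when the driver is not bound. A small sketch; the function name `nvidia_driver_bound` is made up, and it only looks for the "Kernel driver in use: nvidia" line that appears in the lspci output above.

```shell
# Read `lspci -vv` output on stdin; succeed when some device reports
# the nvidia kernel driver as the one in use.
nvidia_driver_bound() {
  grep 'Kernel driver in use: nvidia' >/dev/null
}

# Usage on the host:
#   lspci -vv | nvidia_driver_bound && echo "driver OK" || echo "driver not bound"
```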


$ nvidia-smi

Install Docker

See step 1 of [Systemctl cannot be used on Ubuntu in Docker container].

The Docker version is docker-ce=5:19.03.14~3-0~ubuntu-focal

Install nvidia-docker2

Install the packages required to create and launch a GPU container.

**Add repository and update**

https://nvidia.github.io/nvidia-docker


#GPG key registration
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

#Add repository
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
 
$ sudo apt update

**Install nvidia-docker2**

`Check the installable package version`


$ apt-cache madison nvidia-docker2

# Install the latest version (what I ran this time)
$ sudo apt -y install nvidia-docker2

# To install a specific version, do it like this
$ sudo apt -y install nvidia-docker2=2.0.3+docker18.09.7-3

# Older versions may require other packages to be installed first (for reference)
$ sudo apt install nvidia-container-runtime-hook
$ sudo apt install nvidia-container-runtime=2.0.0+docker18.09.7-3

Currently, installing nvidia-docker2 also installs nvidia-container-toolkit as a dependency. (nvidia-container-toolkit is the newer package.) If you install only nvidia-container-toolkit, the --runtime=nvidia option and the nvidia-docker command are not available; GPUs are then accessed through Docker's native GPU support instead.

**Reload the Docker daemon settings**


$ sudo pkill -SIGHUP dockerd
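
After the reload, one way to confirm that the nvidia runtime was registered is to look at the "Runtimes:" line of `docker info`. This is a sketch that assumes the human-readable `docker info` output contains a line such as " Runtimes: nvidia runc"; `has_nvidia_runtime` is a made-up helper name.

```shell
# Read `docker info` output on stdin; succeed when the Runtimes line
# lists the nvidia runtime.
has_nvidia_runtime() {
  grep -E '^[[:space:]]*Runtimes:.*nvidia' >/dev/null
}

# Usage on the host:
#   sudo docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"
```

If the runtime is missing, `--runtime=nvidia` will be rejected by Docker, which matches the behavior described above when only nvidia-container-toolkit is installed.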



$ sudo nvidia-docker version
NVIDIA Docker: 2.5.0
Client: Docker Engine - Community
 Version:           20.10.2
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:17:43 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
・ ・ ・ ・
・ ・ ・ ・

3. Creating a GPU container

Get image of GPU container

See here for details: https://hub.docker.com/r/nvidia/cuda/


$ docker image pull nvidia/cuda:11.1-base-ubuntu20.04

Confirmation of GPU container startup

`Latest command (when using the --gpus option, supported since Docker 19.03)`


# When using all GPUs
$ docker container run --gpus all --rm nvidia/cuda nvidia-smi

# If you want to specify the GPUs to use
$ docker container run --gpus '"device=0,1"' --rm nvidia/cuda nvidia-smi

nvidia-smi could not be executed this way. This is probably because the nvidia-docker2 version is too new and options such as the devices (environment variables?) were not sufficiently set.

`Old command (Docker 19.03 or earlier with the nvidia-docker2 package installed)`


$ docker container run --runtime=nvidia --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
# or
$ nvidia-docker run --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi

#If you want to specify the GPU to use
$ docker container run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
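
The split between the two command styles can be expressed as a small helper that picks the flag from the Docker server version. This is only a sketch of the version rule (19.03 or later uses `--gpus`, anything older uses `--runtime=nvidia`); it does not detect which packages are installed, and as noted above the `--gpus` form did not work for me, so treat the result as a starting point. The function name `gpu_flag_for_version` is made up.

```shell
# Print the GPU flag to try for a given Docker server version string
# such as "19.03.14" or "18.09.7".
gpu_flag_for_version() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # second version component, e.g. "03"
  if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }; then
    echo "--gpus all"
  else
    echo "--runtime=nvidia"
  fi
}

# Usage on the host (the flag is left unquoted on purpose so it splits into words):
#   server_version=$(docker version --format '{{.Server.Version}}')
#   docker container run $(gpu_flag_for_version "$server_version") --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
```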
