Try building a GPU container on GCP.

Introduction

I wanted to try creating a GPU container, but buying a GPU was not an option, so I created a GPU instance in the cloud and built the container there. The steps other than the container build itself took more time than expected, so I am summarizing the procedure and commands here as a memo for myself.

Advance preparation

- Create a Google Cloud account and open the GCP console.
- Upgrade the account to a paid one. (While you are on the free trial, you cannot request GPU quota.)

Environment

[Host OS]
・ Ubuntu 20.04 LTS (on GCP)
・ nvidia-driver 460
・ Docker 19.03.14
・ nvidia-docker2 2.5.0

GPU container creation procedure

    1. Create GPU instance
    2. Host settings
    3. Creating a GPU container

1. Create GPU instance

Apply for GPU allocation

The first time you create a VM instance with a GPU, you need to request quota. In the GCP console, select IAM & Admin > Quotas. Two requests are needed: [Total GPU quota (all regions)] and [Per-region quota for the GPU model].

**[Total GPU quota (all regions)]** Enter GPUs in the filter and select the service shown below.

Qiita-no024_img04.jpg

Check the global entry and click "Edit Quotas". Enter the new upper limit and a reason, then proceed to the next screen.

Qiita-no024_img05.jpg

On the next screen, enter your [Name], [Email Address], and [Telephone Number] and send the request. Approval is given in about 5 minutes, after which you can create the GPU instance.

**[Per-region quota for the GPU model]**

Filter as appropriate; this time I chose NVIDIA T4.

Qiita-no024_img01.jpg

Qiita-no024_img02.jpg

Quota is allocated per region, so select the region where you plan to create the GPU instance, and then specify the upper limit on the number of GPUs. The value specified here is only an upper limit, so no charge is incurred at this point. This time I requested one GPU, since the goal is just to build a GPU container.

Qiita-no024_img03.jpg
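Once the request is approved, the quota can also be checked from the command line. The following is a minimal sketch, not part of the original procedure: `quota_limit` is a hypothetical helper, and it assumes that `gcloud` prints each quota entry in its default YAML form with the `limit` line before the `metric` line. The metric names `NVIDIA_T4_GPUS` and `GPUS_ALL_REGIONS` are the ones I would expect for this setup; confirm them against your own output.

```shell
#!/bin/sh
# Sketch: read the limit of one quota metric from gcloud's YAML output.
# Assumption: each quota entry prints "limit:" before "metric:".
quota_limit() {
  # $1 = metric name, stdin = output of a `gcloud compute ... describe` command
  awk -v metric="$1" '
    $1 == "limit:" || $2 == "limit:" { limit = $NF }
    $NF == metric && ($1 == "metric:" || $2 == "metric:") { print limit }
  '
}

# Usage (requires an authenticated gcloud, so it is left commented out):
#   gcloud compute regions describe us-central1 | quota_limit NVIDIA_T4_GPUS
#   gcloud compute project-info describe | quota_limit GPUS_ALL_REGIONS
```

If the function prints nothing, the metric name did not match any entry in the output.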

Create GPU instance

Select Compute Engine > VM instances from the GCP console. Click [Create] and create an instance with the following specifications.

● Region: us-central1 (Iowa)
● Zone: us-central1-a

● Machine configuration
  Machine family: GPU
  Series: N1
  Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
  • If you select a region other than the one you requested quota for, the GPU tab will not appear.

● GPU type
  GPU type: NVIDIA Tesla T4
  Number of GPUs: 1

● Boot disk
  OS: Ubuntu
  Version: Ubuntu 20.04 LTS
  Disk type: Standard persistent disk
  Size: 20 GB

● Firewall
  Allow HTTP traffic
  Allow HTTPS traffic
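For reference, the console settings above can probably be reproduced with the gcloud CLI. This is a sketch under assumptions, not what I actually ran: the instance name `gpu-container-test` is a hypothetical placeholder, the image family `ubuntu-2004-lts` in project `ubuntu-os-cloud` is assumed to still be published, and the `http-server` / `https-server` tags are assumed to correspond to the two firewall checkboxes. The script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch: print a gcloud command equivalent to the console settings above.
# print_create_command is a hypothetical helper; nothing is created here.
print_create_command() {
  name=$1
  zone=$2
  cat <<EOF
gcloud compute instances create $name \\
  --zone=$zone \\
  --machine-type=n1-standard-1 \\
  --accelerator=type=nvidia-tesla-t4,count=1 \\
  --maintenance-policy=TERMINATE \\
  --image-family=ubuntu-2004-lts \\
  --image-project=ubuntu-os-cloud \\
  --boot-disk-size=20GB \\
  --boot-disk-type=pd-standard \\
  --tags=http-server,https-server
EOF
}

print_create_command gpu-container-test us-central1-a
```

`--maintenance-policy=TERMINATE` is included because GPU instances cannot be live-migrated during host maintenance.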

It took a few minutes to start.

**Check server environment**

OS check


$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"

Confirmation of GPU (made by NVIDIA)


$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
        Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
        Kernel modules: nvidiafb

2. Host settings

**Disable the nouveau graphics driver** The nouveau graphics driver that ships with Linux by default apparently needs to be disabled before installing the NVIDIA driver. First, check whether it is loaded.


$ lsmod | grep -i nouveau
  • nouveau was not loaded on this VM, so nothing was returned.
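If `lsmod` had listed nouveau, it would have to be blacklisted. The following is a minimal sketch of the usual approach, which I did not need on this VM. The file name `/etc/modprobe.d/blacklist-nouveau.conf` is a conventional choice and an assumption here, and `nouveau_blacklist_conf` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print a modprobe configuration that blacklists nouveau.
nouveau_blacklist_conf() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Show what would be written.
nouveau_blacklist_conf

# To apply it (requires root, so it is left commented out here):
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
#   sudo reboot
```

After the reboot, `lsmod | grep -i nouveau` should return nothing.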

**Update packages**


$ sudo apt update
$ sudo apt upgrade

NVIDIA driver installation

You can check the NVIDIA driver versions on this site: https://www.nvidia.co.jp/Download/index.aspx?lang=jp

**Add repository and update**


$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update

**Check the recommended driver**


$ sudo apt install ubuntu-drivers-common
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:04.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor   : NVIDIA Corporation
model    : TU104GL [Tesla T4]
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-440-server - distro non-free
driver   : nvidia-driver-450 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
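The entry marked recommended in the output above can also be picked out by a script. This is a minimal sketch that assumes the output keeps the `driver : <package> - ... recommended` layout shown above; `recommended_driver` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the package name on the line marked "recommended"
# from the output of `ubuntu-drivers devices` (read from stdin).
recommended_driver() {
  awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}

# Usage (left commented out because it needs the GPU host):
#   ubuntu-drivers devices | recommended_driver
#   sudo apt install "$(ubuntu-drivers devices | recommended_driver)"
```

`sudo ubuntu-drivers autoinstall` is another way to install the recommended driver without naming it.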

**Driver installation**


$ sudo apt install nvidia-driver-460

**Reboot and check that the installation was successful**

Confirmation of GPU (made by NVIDIA)


$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
        Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nvidia_drm, nvidia
  • It is OK if the "Kernel driver in use" line and the nvidia kernel modules now appear.

Checking the operation of the nvidia-smi command


$ nvidia-smi
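Besides the default table, `nvidia-smi` can print single fields, which is handy for checking the driver version in a script. The sketch below is illustrative only: `driver_at_least` is a hypothetical helper, `460.32.03` is a sample version string rather than output from my VM, and the threshold `455` is my assumption for what CUDA 11.1 images need, so check NVIDIA's compatibility table for the real minimum.

```shell
#!/bin/sh
# Sketch: compare the major part of a driver version with a required minimum.
driver_at_least() {
  # $1 = driver version string such as 460.32.03, $2 = required major version
  major=${1%%.*}
  [ "$major" -ge "$2" ]
}

# Sample value for illustration (not measured output).
if driver_at_least 460.32.03 455; then
  echo "driver is new enough"
fi

# On the GPU host the version could be read like this:
#   version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$version" 455 && echo "driver is new enough"
```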

Install Docker

Refer to step 1 of the article [Systemctl cannot be used on Ubuntu in Docker container].

  • The Docker version is docker-ce=5:19.03.14~3-0~ubuntu-focal

Install nvidia-docker2

Install the packages required to create and launch a GPU container.

**Add repository and update** https://nvidia.github.io/nvidia-docker


#GPG key registration
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

#Add repository
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
 
$ sudo apt update
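The `distribution` variable above is built by sourcing `/etc/os-release` and concatenating `ID` and `VERSION_ID`, which should give `ubuntu20.04` on this host. The sketch below wraps the same idea in a function so that it can be tried against any os-release style file; `distribution_id` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build the "<ID><VERSION_ID>" string used in the repository URL.
distribution_id() {
  # $1 = path to an os-release style file (must contain a slash)
  # Source it in a subshell so its variables do not leak into the caller.
  ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Print the value for the current machine, if the file exists.
if [ -r /etc/os-release ]; then
  distribution_id /etc/os-release
fi
```

If the printed value has no matching directory under https://nvidia.github.io/nvidia-docker, the `curl` for `nvidia-docker.list` will not return a usable list file, so it is worth checking this value first.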

**Install nvidia-docker2**

Check the installable package version


$ apt-cache madison nvidia-docker2



# Install the latest version (this is what I ran)
$ sudo apt -y install nvidia-docker2

# To install a specific version, specify it like this
$ sudo apt -y install nvidia-docker2=2.0.3+docker18.09.7-3

# Older versions may require installing other packages in advance (for reference)
$ sudo apt install nvidia-container-runtime-hook
$ sudo apt install nvidia-container-runtime=2.0.0+docker18.09.7-3
  • Currently, installing nvidia-docker2 also installs nvidia-container-toolkit as a dependency. (nvidia-container-toolkit is the newer package.) Docker now supports GPUs natively through the --gpus option, so if you install only the nvidia-container-toolkit package, the --runtime=nvidia option and the nvidia-docker command are not available.

**Reload the Docker daemon settings**


$ sudo pkill -SIGHUP dockerd
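After the reload, one way to confirm that the nvidia runtime was registered is to look at the runtimes reported by `docker info`. This is a sketch, not something from my original notes: `has_nvidia_runtime` is a hypothetical helper, and it assumes `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name.

```shell
#!/bin/sh
# Sketch: succeed if the JSON on stdin contains a runtime named "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# Usage on the Docker host (left commented out here):
#   if sudo docker info --format '{{json .Runtimes}}' | has_nvidia_runtime; then
#     echo "nvidia runtime is registered"
#   else
#     echo "nvidia runtime not found; check /etc/docker/daemon.json"
#   fi
```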



$ sudo nvidia-docker version
NVIDIA Docker: 2.5.0
Client: Docker Engine - Community
 Version:           20.10.2
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:17:43 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
・ ・ ・ ・
・ ・ ・ ・

3. Creating a GPU container

Get the GPU container image

See here for details: https://hub.docker.com/r/nvidia/cuda/


$ docker image pull nvidia/cuda:11.1-base-ubuntu20.04

Confirmation of GPU container startup

Latest command (when using the --gpus option, supported since Docker 19.03)


#When using all GPUs
$ docker container run --gpus all --rm nvidia/cuda nvidia-smi

#If you want to specify the GPU to use
$ docker container run --gpus '"device=0,1"' --rm nvidia/cuda nvidia-smi
  • In my environment nvidia-smi could not be executed this way, probably because the nvidia-docker2 version is too new and options such as the devices (environment variables?) were not set sufficiently.
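The nested quoting in `'"device=0,1"'` is easy to get wrong: the inner double quotes have to reach Docker literally, because the device list itself contains a comma. The sketch below builds the value in a small function; `gpus_arg` is a hypothetical helper, not part of Docker.

```shell
#!/bin/sh
# Sketch: build the value for docker's --gpus option.
gpus_arg() {
  # $1 = "all", or a comma-separated list of GPU indexes such as 0,1
  if [ "$1" = "all" ]; then
    printf '%s\n' all
  else
    # Keep literal double quotes around the device list.
    printf '"device=%s"\n' "$1"
  fi
}

gpus_arg all
gpus_arg 0,1

# Usage on the GPU host (this VM has a single T4, so only index 0 exists):
#   docker container run --gpus "$(gpus_arg 0)" --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
```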

Old command (Docker 19.03 or earlier, with the nvidia-docker2 package installed)


$ docker container run --runtime=nvidia --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
# or
$ nvidia-docker run --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi

#If you want to specify the GPU to use
$ docker container run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
