[PYTHON] Build a GPU environment with GCP and the official Kaggle image (Docker)

Introduction

To take part in a Kaggle image competition I needed a GPU-equipped machine, so I built a GPU environment on GCP (Google Cloud Platform). Many helpful articles on this already exist, and I followed them. However, I ran into many errors caused by version differences, so I have summarized the procedure again here. I hope it helps people who are new to GCP. If you are familiar with the topic, I would be grateful if you pointed out any mistakes or possible improvements.

Register with GCP

  1. First, sign up for GCP. Just follow the on-screen guidance; there should be no problems.
  2. Upgrade to a paid account. There should be an "ACTIVATE" button in the upper-right corner of the window; click it. Activating does not mean you are charged immediately. During registration you should have received $300 of free credit (like a coupon), and you are billed for actual usage only after that credit is used up. $300 covers roughly one month of running a single GPU instance.
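The "about a month" figure is simple arithmetic: credit ÷ (hourly price × hours per day). The sketch below makes it explicit. The $0.45/hour price is a hypothetical placeholder, not an actual GCP price; use the estimate the deployment page shows for your configuration.

```python
def days_of_credit(credit_usd: float, hourly_usd: float, hours_per_day: float = 24.0) -> float:
    """Return how many days `credit_usd` lasts when the instance runs
    `hours_per_day` hours a day at `hourly_usd` per hour."""
    if hourly_usd <= 0 or not 0 < hours_per_day <= 24:
        raise ValueError("hourly_usd must be > 0 and hours_per_day in (0, 24]")
    return credit_usd / (hourly_usd * hours_per_day)


# Hypothetical price of $0.45/hour (check the real price for your settings!)
print(round(days_of_credit(300, 0.45), 1))     # running around the clock
print(round(days_of_credit(300, 0.45, 8), 1))  # stopped outside an 8-hour workday
```

With that placeholder price, the credit lasts about 28 days around the clock, or about 83 days at 8 hours a day. The sketch ignores the boot disk, which is still billed while the instance is stopped.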

Launch an instance

  1. First, request a GPU quota. Click the menu icon (three horizontal lines) in the upper left → "IAM & Admin" → "Quotas" → "Filter table" → "Quota" (limit name) → "GPUs (all regions)" → select the "Compute Engine API" quota → "Edit quotas" at the top → select it with the checkbox → in the panel on the right, enter 1 (or however many you need) as the new limit. For the request description, something like "to run deep learning models for Kaggle" should be fine. Approval can take anywhere from a few hours to a few days. Reload the page from time to time to see whether the limit has been updated.
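Instead of reloading the console page, you can also check the limit from a terminal. This sketch rests on two assumptions to verify against your own output:

- You have captured the output of `gcloud compute project-info describe --format=json` as a string.
- The GPU quota appears in that output's `quotas` list under the metric `GPUS_ALL_REGIONS`.

The sample JSON is hypothetical and trimmed.

```python
import json
from typing import Optional


def gpu_quota_limit(project_info_json: str) -> Optional[float]:
    """Return the GPUS_ALL_REGIONS limit from project-info JSON, or None if absent
    (the JSON layout is an assumption)."""
    info = json.loads(project_info_json)
    for quota in info.get("quotas", []):
        if quota.get("metric") == "GPUS_ALL_REGIONS":
            return float(quota.get("limit", 0))
    return None


# Hypothetical, trimmed example of the JSON structure
sample = ('{"quotas": [{"metric": "CPUS_ALL_REGIONS", "limit": 32.0, "usage": 0.0}, '
          '{"metric": "GPUS_ALL_REGIONS", "limit": 1.0, "usage": 0.0}]}')
limit = gpu_quota_limit(sample)
print("GPU quota approved" if limit is not None and limit >= 1 else "GPU quota not approved yet")
```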

  2. Next, deploy the instance. Click the menu icon (three horizontal lines) in the upper left → "Marketplace" → search for "Deep Learning VM" → "Launch" → configure the settings as follows → when you are done, click "Deploy".
    - Deployment name: Any name for the environment.
    - Zone: You can leave the default.
    - Machine type: Set the number of CPUs and the amount of memory. What you need depends on the analysis, but Kaggle's kernel notebooks (4 CPUs, 16 GB RAM) are a useful reference.
    - GPUs: NVIDIA Tesla K80 should be fine.
    - Framework: You can leave the default.
    - GPU driver installation and JupyterLab access: Check both boxes.
    - Boot disk: You can leave the disk type as is, but change it if necessary. Increase the boot disk size; I think you need at least 300-500 GB.
    - Networking: You can leave the default.
    - Price: The estimated price for the current settings is shown in the upper-right corner of the page. Check it against your budget and fine-tune the settings above.
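Roughly the same settings can be expressed as a `gcloud compute instances create` command, which is handy for recreating the instance later. This sketch only builds the argument list. Several values are assumptions:

- the machine type `n1-standard-4` (4 vCPUs, 15 GB RAM),
- the image family `tf-latest-gpu`,
- the 300 GB disk size,
- the example zone.

Check the Deep Learning VM documentation for the image families currently offered, and pick a zone that offers your GPU.

```python
import shlex
from typing import List


def dlvm_create_command(name: str,
                        zone: str,
                        machine_type: str = "n1-standard-4",
                        gpu_type: str = "nvidia-tesla-k80",
                        gpu_count: int = 1,
                        image_family: str = "tf-latest-gpu",  # assumption: check current families
                        disk_gb: int = 300) -> List[str]:
    """Build (but do not run) a gcloud command for a Deep Learning VM with a GPU."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--accelerator=type={gpu_type},count={gpu_count}",
        f"--image-family={image_family}",
        "--image-project=deeplearning-platform-release",
        "--maintenance-policy=TERMINATE",          # GPU instances cannot live-migrate
        f"--boot-disk-size={disk_gb}GB",
        "--metadata=install-nvidia-driver=True",   # install the GPU driver on first boot
    ]


# Hypothetical instance name and zone
print(shlex.join(dlvm_create_command("kaggle-gpu", zone="us-west1-b")))
```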

  3. Start the instance. Click the menu icon (three horizontal lines) in the upper left → "Compute Engine" → "VM instances" → click the vertical "⋮" menu at the right end of the instance you just deployed → select "Start". The instance is now running, so you are being charged. When you don't need it, select "Stop" from the same menu.
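You are billed while the instance runs, so it can help to script starting and stopping it. `gcloud compute instances start` and `gcloud compute instances stop` do the same as the console menu. The instance name and zone below are hypothetical placeholders, and `run=True` requires an installed and authenticated `gcloud`.

```python
import subprocess
from typing import List


def instance_command(action: str, name: str, zone: str, run: bool = False) -> List[str]:
    """Build (and optionally run) a gcloud command that starts or stops an instance."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    cmd = ["gcloud", "compute", "instances", action, name, f"--zone={zone}"]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Hypothetical name and zone; pass run=True to actually stop the instance
print(" ".join(instance_command("stop", "kaggle-gpu", "us-west1-b")))
```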

Environment setup

  1. First, open a terminal (the "black screen"). On the same page as above (menu icon in the upper left → "Compute Engine" → "VM instances"), click "▼" next to "SSH" in the "Connect" column near the right end → "Open in browser window". An error may appear in the window that opens. In that case, close it and open the terminal again the same way. You can also connect with gcloud; look into it if you are interested.
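For connecting from your own machine, `gcloud compute ssh` opens the same kind of terminal. The sketch below builds that command. The optional `-L` port forwarding is passed through to ssh after `--`. It is an assumed alternative for reaching Jupyter on port 8888 without opening a firewall port, not the method this article uses. The instance name and zone are hypothetical.

```python
import shlex
from typing import List, Optional


def ssh_command(name: str, zone: str, forward_port: Optional[int] = None) -> List[str]:
    """Build a `gcloud compute ssh` command, optionally forwarding a local port
    to the same port on the instance (arguments after `--` go to ssh itself)."""
    cmd = ["gcloud", "compute", "ssh", name, f"--zone={zone}"]
    if forward_port is not None:
        cmd += ["--", "-L", f"{forward_port}:localhost:{forward_port}"]
    return cmd


print(shlex.join(ssh_command("kaggle-gpu", "us-west1-b", forward_port=8888)))
```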

  2. Build the official Kaggle Docker image. Type the following commands in the terminal (lines starting with # are comments and do not need to be typed).

#Official image
# https://github.com/Kaggle/docker-python
git clone https://github.com/Kaggle/docker-python.git
cd docker-python

#GPU version build
#I think it will take about 30 minutes
./build --gpu

#Check the image
#If kaggle/python-gpu-build is listed, the build succeeded
docker images
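The `docker images` check can also be scripted. This sketch runs `docker images --format '{{.Repository}}:{{.Tag}}'` and looks for the image name. It assumes the build tagged the image `kaggle/python-gpu-build`, as the comment above says. The parsing function is separate so it can be tried without Docker.

```python
import subprocess
from typing import Iterable, List


def has_image(lines: Iterable[str], repository: str = "kaggle/python-gpu-build") -> bool:
    """True if any 'repository:tag' line refers to `repository`."""
    return any(line.strip().rsplit(":", 1)[0] == repository for line in lines)


def local_image_lines() -> List[str]:
    """List local images as 'repository:tag' strings (requires Docker)."""
    out = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


# Example with hypothetical output instead of calling Docker
print(has_image(["ubuntu:20.04", "kaggle/python-gpu-build:latest"]))  # True
```

On the instance itself, `has_image(local_image_lines())` performs the real check.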

  3. Create a container. Type the following command in the terminal.
docker run -itd --runtime=nvidia -p 8888:8888 -v /home:/home --name cont-1 -h host kaggle/python-gpu-build /bin/bash

You can change the following values to suit your setup:

- cont-1: The container name is arbitrary; I used cont-1 here. The same applies wherever cont-1 appears below.
- kaggle/python-gpu-build: This is the image name shown by docker images in the previous step. If you followed the steps above, this example should work as is.
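For reference, here is the same `docker run` command as a Python argument list, with each option annotated. It only builds the list. On setups with Docker 19.03+ and the NVIDIA Container Toolkit, `--gpus all` is the usual replacement for `--runtime=nvidia`. Which one works depends on how Docker is configured on your VM, so treat the `use_gpus_flag` switch as an assumption.

```python
import shlex
from typing import List


def kaggle_container_command(name: str = "cont-1",
                             image: str = "kaggle/python-gpu-build",
                             use_gpus_flag: bool = False) -> List[str]:
    """Build the `docker run` command used in this article."""
    gpu_opts = ["--gpus", "all"] if use_gpus_flag else ["--runtime=nvidia"]
    return [
        "docker", "run",
        "-itd",               # interactive + TTY, detached (runs in the background)
        *gpu_opts,            # expose the host GPU to the container
        "-p", "8888:8888",    # host port 8888 -> container port 8888 (Jupyter)
        "-v", "/home:/home",  # share the host's /home with the container
        "--name", name,       # container name used by later commands
        "-h", "host",         # hostname inside the container
        image, "/bin/bash",   # image to run and the process to start in it
    ]


print(shlex.join(kaggle_container_command()))
```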

  4. Enter the container. Run the following:
#Start the container
docker container start cont-1

#Enter the container
docker container attach cont-1

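Type exit to leave the container. Because the container's main process is the bash shell started by docker run, exiting it also stops the container. This is why the container is started again with docker container start before attaching.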

  5. Check that the GPU works. While inside the container, run the following:
#Set the CUDA library path
export LD_LIBRARY_PATH=/usr/local/cuda/lib64

#Check the GPU
nvidia-smi

If the GPU details are displayed, the container can see the GPU. If you get an error, see the troubleshooting section below.
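To check the GPU from a script rather than by eye, `nvidia-smi` can print selected fields as CSV with `--query-gpu=name,memory.total --format=csv,noheader`. The sketch below runs that query and parses the result. The parser is separate so it can be tested on sample text, and the sample line is illustrative, not output from this setup.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_csv(text: str) -> List[Tuple[str, str]]:
    """Parse lines like 'Tesla K80, 11441 MiB' into (name, memory) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus


def query_gpus() -> List[Tuple[str, str]]:
    """Return the GPUs nvidia-smi reports, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


print(parse_gpu_csv("Tesla K80, 11441 MiB"))  # [('Tesla K80', '11441 MiB')]
print(query_gpus() or "No GPU visible (or nvidia-smi not found)")
```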

  6. Check the GPU from Python. Inside the container, first type `python` to start the Python interpreter. Once it starts, run the following:
from tensorflow.python.client import device_lib
#Lists every device TensorFlow can see; a GPU entry has device_type "GPU"
device_lib.list_local_devices()

Again, if the GPU is listed without errors, the GPU setup is complete. If you get errors, see the troubleshooting section below.

Troubleshooting GPU usage

  1. First, check whether the GPU can be used from the instance itself (outside the container).
nvidia-smi

If no GPU is listed, the GPU was probably not selected in the instance settings. In GCP, open "VM instances" → click the instance → search for "GPU" with Ctrl+F on the instance details page. If there is none, start over from the "Launch an instance" section.
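You can make the same check from a terminal with `gcloud compute instances describe INSTANCE --zone ZONE --format=json`. Attached GPUs appear in its `guestAccelerators` field. The sketch assumes that layout, with each entry having `acceleratorType` and `acceleratorCount`. The sample JSON is hypothetical and trimmed.

```python
import json
from typing import Dict


def attached_gpus(instance_json: str) -> Dict[str, int]:
    """Map GPU type (last path segment of acceleratorType) to its count."""
    info = json.loads(instance_json)
    gpus: Dict[str, int] = {}
    for acc in info.get("guestAccelerators", []):
        gpu_type = acc["acceleratorType"].rsplit("/", 1)[-1]
        gpus[gpu_type] = gpus.get(gpu_type, 0) + int(acc.get("acceleratorCount", 0))
    return gpus


sample = json.dumps({"name": "kaggle-gpu", "guestAccelerators": [{
    "acceleratorCount": 1,
    "acceleratorType": "https://www.googleapis.com/compute/v1/projects/"
                       "my-project/zones/us-west1-b/acceleratorTypes/nvidia-tesla-k80"}]})
print(attached_gpus(sample) or "No GPU attached: recreate the instance with a GPU")
```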

  2. Try making the container recognize the GPU in a different way and see whether it starts. If the following command does not work either, you may need to update your CUDA version.
docker run --runtime nvidia -itd -p 8887:8887 -v /home:/home --name gpu-test -h host

For more information, please see the comments here.
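Before updating CUDA, first look at the CUDA version the installed driver supports. On reasonably recent drivers, plain `nvidia-smi` prints it in its header as `CUDA Version: X.Y`. This sketch extracts it with a regular expression and compares it with a required version. The sample header is illustrative. The version you actually need depends on the image you built, which the article does not specify.

```python
import re
from typing import Optional, Tuple


def driver_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract the 'CUDA Version: X.Y' value from plain nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def supports(driver: Optional[Tuple[int, int]], required: Tuple[int, int]) -> bool:
    """True if the driver's CUDA version is at least `required` (tuples compare lexicographically)."""
    return driver is not None and driver >= required


header = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |"
version = driver_cuda_version(header)
print(version, supports(version, (11, 0)))  # (11, 2) True
```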

Launch Jupyter notebook

  1. First, start Jupyter Notebook. Run the following inside the container:
jupyter notebook --port 8888 --ip=0.0.0.0 --allow-root
  2. Next, configure access to the running notebook from your browser. This article explains the settings in detail, so I will leave the explanation to it.

Note that connecting over https may not work; in that case, try http instead.
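If the browser cannot connect, first confirm that Jupyter is actually listening. Run this sketch on the instance, outside the container. It checks whether anything accepts TCP connections on port 8888, the port published with `-p 8888:8888` above. It does not check firewall rules or https versus http.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unreachable host, ...
        return False


print("Jupyter port open" if port_is_open() else "Nothing is listening on port 8888")
```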

Setting up the Kaggle API

This setup is required because Kaggle datasets are downloaded and uploaded through the API.

  1. Install the kaggle package with pip.
pip install kaggle
  2. Get your Kaggle API key. On the Kaggle website, go to My Account → API → Create New API Token. A kaggle.json file will be downloaded.

  3. Save the API key on your GCP instance. First create a directory on the instance, then save the key there, and finally restrict the file's permissions so that only you can read it.

#Create the directory
mkdir -p ~/.kaggle

#Create the file
touch ~/.kaggle/kaggle.json

#Write the key to the file
#The easiest way is to copy and paste the contents of the kaggle.json downloaded from Kaggle
vim ~/.kaggle/kaggle.json

#Make the key readable only by you (the Kaggle API warns if other users can read it)
chmod 600 ~/.kaggle/kaggle.json
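Instead of editing the file in vim, you can also write the key from Python. This sketch assumes kaggle.json holds the `username` and `key` fields, as the downloaded file does. It creates the file with owner-only permissions (0o600). The `directory` argument exists only so the function can be tried somewhere other than `~/.kaggle`.

```python
import json
import os
from pathlib import Path
from typing import Optional


def save_kaggle_key(username: str, key: str, directory: Optional[Path] = None) -> Path:
    """Write kaggle.json with permissions 600 (read/write for the owner only)."""
    directory = directory or Path.home() / ".kaggle"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "kaggle.json"
    # Create the file with mode 600 from the start so the key is never world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, 0o600)  # also tighten permissions if the file already existed
    return path


# Usage (placeholders, not real credentials):
# save_kaggle_key("your-username", "your-api-key")
```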

  4. Use the Kaggle API. See the official documentation for details.
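Typical downloads go through the `kaggle` command-line tool installed above: `kaggle competitions download -c <competition>` or `kaggle datasets download -d <owner>/<dataset>`, with `-p` choosing the target directory. The sketch wraps those two commands. The competition name and path are hypothetical; the official documentation covers the other subcommands.

```python
import subprocess
from typing import List


def kaggle_download_command(slug: str, dest: str = ".", dataset: bool = False) -> List[str]:
    """Build a kaggle CLI download command for a competition or an 'owner/name' dataset."""
    if dataset:
        return ["kaggle", "datasets", "download", "-d", slug, "-p", dest]
    return ["kaggle", "competitions", "download", "-c", slug, "-p", dest]


def download(slug: str, dest: str = ".", dataset: bool = False) -> None:
    """Run the download (requires the kaggle package and ~/.kaggle/kaggle.json)."""
    subprocess.run(kaggle_download_command(slug, dest, dataset), check=True)


# Hypothetical competition; you must accept its rules on the website before downloading
print(" ".join(kaggle_download_command("some-competition", "/home/data")))
```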

Other things worth knowing

With the steps above, the development environment is basically ready, but each future experiment will probably bring its own errors. Here are some things worth knowing that make it easier to solve problems yourself and to search for answers.

- File operations: It is worth learning the Linux commands for creating, deleting, and moving directories and files. Here is a concise summary of the important commands.
- Checking disk usage: Errors caused by insufficient disk space are common, so it is a good idea to learn the df and du commands. Here is a brief summary.
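The `df`/`du` checks can be approximated in Python with the standard library:

- `shutil.disk_usage` reports the total, used, and free space of the filesystem containing a path, like `df`.
- Walking a directory tree and summing file sizes gives a rough `du`.

The total counts apparent file sizes (similar to `du -s --apparent-size`), so it can differ from plain `du`, which counts allocated blocks.

```python
import os
import shutil


def free_gib(path: str = "/") -> float:
    """Free space (GiB) on the filesystem containing `path`, like `df`."""
    return shutil.disk_usage(path).free / 1024**3


def dir_size_bytes(path: str) -> int:
    """Total apparent size of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


print(f"Free on /: {free_gib():.1f} GiB")
# e.g. dir_size_bytes("/home") / 1024**3 for the size of /home in GiB
```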

References

- Building a calculation environment for Kaggle with GCP and Docker
- Procedure for building a Kaggle environment with GCP and Docker
- [Install Jupyter Notebook on Compute Engine instance using Anaconda](https://qiita.com/tk_01/items/307716a680460f8dbe17#vm%E3%82%A4%E3%83%B3%E3%82%B9%E3%82%BF%E3%83%B3%E3%82%B9%E3%81%AE%E3%83%95%E3%82%A1%E3%82%A4%E3%82%A2%E3%83%BC%E3%82%A6%E3%82%A9%E3%83%BC%E3%83%AB%E3%81%AE%E8%A8%AD%E5%AE%9A)
