Since a GPU-equipped machine was needed to participate in Kaggle image competitions, I built a GPU environment on GCP. There are already a number of very helpful articles, and I built mine by referring to them, but I ran into many errors caused by version differences, so I am summarizing the process again here. I hope it helps those who are new to GCP. If you are familiar with the subject, I would appreciate any corrections or suggestions for improvement.
First, apply for GPU quota. Click the hamburger menu icon in the upper left → "IAM & Admin" → "Quotas" → "Filter table" → "Quota name" → "GPUs (all regions)" → check the box for the "Compute Engine API" quota row → "Edit quotas" at the top → enter 1 (or however many you need) as the new limit in the panel on the right → for the request description, something like "to run deep learning models for Kaggle" should be fine.
--This request may take anywhere from hours to days to be approved. Reload the page occasionally to see whether the limit has been updated.
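The approval status can also be checked from the command line instead of reloading the console page. A sketch, assuming the gcloud CLI is installed locally and YOUR_PROJECT_ID is replaced with your actual project ID:

```shell
# List project-wide quotas and filter for the GPU entries;
# look for the GPUS_ALL_REGIONS metric and its "limit" value
gcloud compute project-info describe --project YOUR_PROJECT_ID \
  --flatten="quotas" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)" | grep GPU
```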
Next, deploy the instance. Click the hamburger menu icon in the upper left → "Marketplace" → search for "Deep Learning VM" → "Launch" → configure as follows → after finishing the settings, click "Deploy".
--Deployment name: any environment name you like.
--Zone: can be left as is.
--Machine type: set the number of CPUs and the memory. The required amount depends on the analysis, but Kaggle's kernel notebooks offer 4 CPUs and 16 GB of RAM, which is a useful baseline.
--GPUs: NVIDIA Tesla K80 should be fine.
--Framework: fine as is.
--GPU, Access to Jupyter Lab: check both.
--Boot disk: the disk type can stay as is, but increase the boot disk size as needed. I think you need at least 300-500 GB.
--Networking: can be left as is.
--**Price**: the estimated price for the current settings is shown in the upper right of the page. Adjust the settings above to fit your budget.
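For reference, roughly the same deployment can be scripted with the gcloud CLI instead of the Marketplace form. This is only a sketch: the instance name and zone are placeholders, and the image family was current when Deep Learning VM images used `tf-latest-gpu`, so check what is available before running it:

```shell
# List currently available Deep Learning VM image families
gcloud compute images list --project deeplearning-platform-release

# Create a GPU instance roughly matching the settings above
# (name, zone, and image family are placeholders)
gcloud compute instances create kaggle-gpu-1 \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-k80,count=1 \
  --image-family=tf-latest-gpu \
  --image-project=deeplearning-platform-release \
  --boot-disk-size=300GB \
  --maintenance-policy=TERMINATE \
  --metadata="install-nvidia-driver=True"
```

GPU instances require `--maintenance-policy=TERMINATE` because they cannot be live-migrated during host maintenance.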
Start the instance. Click the hamburger menu icon in the upper left → "Compute Engine" → "VM instances" → click the "⋮" (vertical dots) at the right end of the instance you just deployed → select "Start".
--Once the instance is running, billing begins. When you do not need it, select "Stop" from the same menu.
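Starting and stopping can also be done from the CLI, which is handy for remembering to shut the instance down. Instance name and zone are placeholders matching whatever you deployed:

```shell
# Start the instance (billing begins)
gcloud compute instances start kaggle-gpu-1 --zone=us-central1-a
# Stop it when you are done (disk storage is still billed, compute is not)
gcloud compute instances stop kaggle-gpu-1 --zone=us-central1-a
```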
Next, open a terminal (an SSH session in the browser). From the same page as above (hamburger menu icon in the upper left → "Compute Engine" → "VM instances"), click the "▼" next to "SSH" in the "Connect" column on the far right → "Open in browser window".
--An error may appear in the window that opens. In that case, close it and open a new terminal window by the same steps.
--You can also connect with the gcloud CLI instead; look it up if you are interested.
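As a sketch of the gcloud route just mentioned (instance name and zone are placeholders; this runs on your local machine with the gcloud CLI installed and authenticated):

```shell
# SSH into the instance
gcloud compute ssh kaggle-gpu-1 --zone=us-central1-a

# Or forward Jupyter's port 8888 to localhost at the same time,
# so the notebook is reachable at http://localhost:8888
gcloud compute ssh kaggle-gpu-1 --zone=us-central1-a -- -L 8888:localhost:8888
```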
Build the official Kaggle Docker image. Type the following commands in the terminal (you do not need to type the lines starting with #):
# Official image
# https://github.com/Kaggle/docker-python
git clone https://github.com/Kaggle/docker-python.git
cd docker-python
# Build the GPU version
# This takes roughly 30 minutes
./build --gpu
# Check the image
# If kaggle/python-gpu-build is listed, the build succeeded
docker images
docker run -itd --runtime=nvidia -p 8888:8888 -v /home:/home --name cont-1 -h host kaggle/python-gpu-build /bin/bash
Note that the following parts are up to you:
--cont-1: the container name is arbitrary; this is just what I used. The same applies to every cont-1 that appears below.
--kaggle/python-gpu-build: the image name shown by "docker images" in the previous step. If you followed the steps above, this example should match.
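One caveat: on newer Docker versions (19.03+), the `--runtime=nvidia` flag has been superseded by `--gpus`, provided by the NVIDIA Container Toolkit. If the command above fails with an unknown-runtime error, this variant (same container and image names as in the text) may work instead:

```shell
# Docker 19.03+ with the NVIDIA Container Toolkit installed:
# expose all GPUs to the container via --gpus instead of --runtime=nvidia
docker run -itd --gpus all -p 8888:8888 -v /home:/home \
  --name cont-1 -h host kaggle/python-gpu-build /bin/bash
```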
# Start the container
docker container start cont-1
# Enter the container
docker container attach cont-1
Type exit to leave the container.
# Set the CUDA library path
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
# Check that the GPU is visible
nvidia-smi
If the GPU details are displayed, the container recognizes the GPU. If you get an error, troubleshoot as described below. Next, start python inside the container and check that TensorFlow can see the GPU:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
Again, if the GPU is listed here without errors, the GPU setup is complete. If there are errors, troubleshoot as described below.
nvidia-smi
If the GPU is not shown, the instance may have been created without a GPU. In the GCP console, open "VM instances" → click the instance → search for "GPU" with Ctrl+F on the instance details screen. If none is found, start over from the instance deployment section.
docker run --runtime=nvidia -itd -p 8887:8887 -v /home:/home --name gpu-test -h host kaggle/python-gpu-build
For more information, please see Comments here.
jupyter notebook --port 8888 --ip=0.0.0.0 --allow-root
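If the browser still cannot reach Jupyter from outside, the VPC firewall may be blocking the port. A hedged sketch of opening it with gcloud (the rule name is arbitrary; note that a 0.0.0.0/0 source range exposes the port to the whole internet, so the token in the Jupyter URL is your only protection):

```shell
# Allow inbound TCP 8888 from anywhere (rule name is arbitrary)
gcloud compute firewall-rules create allow-jupyter-8888 \
  --allow=tcp:8888 \
  --source-ranges=0.0.0.0/0
```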
Note that you may not be able to connect over https; in that case, try http as well.
This setup is required because Kaggle datasets are downloaded, and submissions uploaded, through the API.
pip install kaggle
Get a Kaggle API key. On the Kaggle site, go to My Account → API → "Create New API Token". A json file will be downloaded.
Save the API key on your GCP instance. The procedure is: create a directory on the instance, save the key there, and set the file permissions.
# Create a directory
mkdir ~/.kaggle
# Create the file
touch ~/.kaggle/kaggle.json
# Write the key to the file
# (easiest: copy and paste the contents of the json downloaded from Kaggle)
vim ~/.kaggle/kaggle.json
# Restrict the file permissions
# (the kaggle CLI warns if the key is readable by other users)
chmod 600 ~/.kaggle/kaggle.json
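Once the key is in place, the CLI can be exercised like this (the competition name is just an example; any submission file name is hypothetical):

```shell
# Verify the API key works by listing competitions
kaggle competitions list
# Download a competition's data into the current directory
kaggle competitions download -c titanic
# Submit a prediction file with a message
kaggle competitions submit -c titanic -f submission.csv -m "first try"
```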
With the above, the development environment is basically set up, but different errors will come up depending on what you run later. Here are a few things worth knowing at that point; they will make it easier to solve problems yourself and to search effectively.
--File operations: it is worth memorizing basic Linux commands for creating, deleting, and moving directories and files.
--Checking disk usage: errors caused by running out of disk space are common. The df and du commands are worth remembering.
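As a quick illustration of the disk-usage commands mentioned above:

```shell
# Free space per filesystem, human-readable
df -h
# Total size of the current directory
du -sh .
# Largest items directly under /home, sorted descending
du -h --max-depth=1 /home | sort -hr | head
```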
-Building a computation environment for Kaggle with GCP and Docker
-Procedure for building a Kaggle environment with GCP and Docker
-[Install Jupyter Notebook on a Compute Engine instance using Anaconda](https://qiita.com/tk_01/items/307716a680460f8dbe17)