Since a GPU-equipped machine was needed to participate in Kaggle image competitions, I built a GPU environment on GCP. There are already a number of very helpful articles, and I built mine by referring to them, but I ran into many errors caused by version differences, so I am summarizing the process again here. I hope it helps those who are new to GCP. If you are familiar with the subject, I would appreciate any corrections or suggestions for improvement.
First, apply for GPU quota. Click the hamburger menu icon in the upper left → "IAM & Admin" → "Quotas" → "Filter table" → "Quota name" → "GPUs (all regions)" → check the box for the "Compute Engine API" quota row → "Edit quotas" at the top → enter 1 (or however many you need) as the new limit in the panel on the right → for the request description, something like "to run deep learning models for Kaggle" should be fine.
--This request may take anywhere from hours to days to be approved. Reload the page occasionally to see whether the limit has been updated.
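The approval status can also be checked from the command line instead of reloading the console page. A sketch, assuming the gcloud CLI is installed locally and YOUR_PROJECT_ID is replaced with your actual project ID:

```shell
# List project-wide quotas and filter for the GPU entries;
# look for the GPUS_ALL_REGIONS metric and its "limit" value
gcloud compute project-info describe --project YOUR_PROJECT_ID \
  --flatten="quotas" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)" | grep GPU
```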
Next, deploy the instance. Click the hamburger menu icon in the upper left → "Marketplace" → search for "Deep Learning VM" → "Launch" → configure as follows → after finishing the settings, click "Deploy".
--Deployment name: any environment name you like.
--Zone: can be left as is.
--Machine type: set the number of CPUs and the memory. The required amount depends on the analysis, but Kaggle's kernel notebooks offer 4 CPUs and 16 GB of RAM, which is a useful baseline.
--GPUs: NVIDIA Tesla K80 should be fine.
--Framework: fine as is.
--GPU, Access to Jupyter Lab: check both.
--Boot disk: the disk type can stay as is, but increase the boot disk size as needed. I think you need at least 300-500 GB.
--Networking: can be left as is.
--**Price**: the estimated price for the current settings is shown in the upper right of the page. Adjust the settings above to fit your budget.
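For reference, roughly the same deployment can be scripted with the gcloud CLI instead of the Marketplace form. This is only a sketch: the instance name and zone are placeholders, and the image family was current when Deep Learning VM images used `tf-latest-gpu`, so check what is available before running it:

```shell
# List currently available Deep Learning VM image families
gcloud compute images list --project deeplearning-platform-release

# Create a GPU instance roughly matching the settings above
# (name, zone, and image family are placeholders)
gcloud compute instances create kaggle-gpu-1 \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-k80,count=1 \
  --image-family=tf-latest-gpu \
  --image-project=deeplearning-platform-release \
  --boot-disk-size=300GB \
  --maintenance-policy=TERMINATE \
  --metadata="install-nvidia-driver=True"
```

GPU instances require `--maintenance-policy=TERMINATE` because they cannot be live-migrated during host maintenance.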
Start the instance. Click the hamburger menu icon in the upper left → "Compute Engine" → "VM instances" → click the "⋮" (vertical dots) at the right end of the instance you just deployed → select "Start".
--Once the instance is running, billing begins. When you do not need it, select "Stop" from the same menu.
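Starting and stopping can also be done from the CLI, which is handy for remembering to shut the instance down. Instance name and zone are placeholders matching whatever you deployed:

```shell
# Start the instance (billing begins)
gcloud compute instances start kaggle-gpu-1 --zone=us-central1-a
# Stop it when you are done (disk storage is still billed, compute is not)
gcloud compute instances stop kaggle-gpu-1 --zone=us-central1-a
```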
Next, open a terminal (an SSH session in the browser). From the same page as above (hamburger menu icon in the upper left → "Compute Engine" → "VM instances"), click the "▼" next to "SSH" in the "Connect" column on the far right → "Open in browser window".
--An error may appear in the window that opens. In that case, close it and open a new terminal window by the same steps.
--You can also connect with the gcloud CLI instead; look it up if you are interested.
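As a sketch of the gcloud route just mentioned (instance name and zone are placeholders; this runs on your local machine with the gcloud CLI installed and authenticated):

```shell
# SSH into the instance
gcloud compute ssh kaggle-gpu-1 --zone=us-central1-a

# Or forward Jupyter's port 8888 to localhost at the same time,
# so the notebook is reachable at http://localhost:8888
gcloud compute ssh kaggle-gpu-1 --zone=us-central1-a -- -L 8888:localhost:8888
```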
Build the official Kaggle Docker image. Type the following commands in the terminal (you do not need to type the lines starting with #):
# Official image
# https://github.com/Kaggle/docker-python
git clone https://github.com/Kaggle/docker-python.git
cd docker-python
# Build the GPU version
# This takes roughly 30 minutes
./build --gpu
# Check the image
# If kaggle/python-gpu-build is listed, the build succeeded
docker images
docker run -itd --runtime=nvidia -p 8888:8888 -v /home:/home --name cont-1 -h host kaggle/python-gpu-build /bin/bash
Note that the following parts are up to you:
--cont-1: the container name is arbitrary; this is just what I used. The same applies to every cont-1 that appears below.
--kaggle/python-gpu-build: the image name shown by "docker images" in the previous step. If you followed the steps above, this example should match.
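One caveat: on newer Docker versions (19.03+), the `--runtime=nvidia` flag has been superseded by `--gpus`, provided by the NVIDIA Container Toolkit. If the command above fails with an unknown-runtime error, this variant (same container and image names as in the text) may work instead:

```shell
# Docker 19.03+ with the NVIDIA Container Toolkit installed:
# expose all GPUs to the container via --gpus instead of --runtime=nvidia
docker run -itd --gpus all -p 8888:8888 -v /home:/home \
  --name cont-1 -h host kaggle/python-gpu-build /bin/bash
```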
# Start the container
docker container start cont-1
# Enter the container
docker container attach cont-1
Type exit to leave the container.
# Set the CUDA library path
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
# Check that the GPU is visible
nvidia-smi
If the GPU details are displayed, the container recognizes the GPU. If you get an error, troubleshoot as described below. Next, start python inside the container and check that TensorFlow can see the GPU:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
Again, if the GPU is listed here without errors, the GPU setup is complete. If there are errors, troubleshoot as described below.
nvidia-smi
If the GPU is not shown, the instance may have been created without a GPU. In the GCP console, open "VM instances" → click the instance → search for "GPU" with Ctrl+F on the instance details screen. If none is found, start over from the instance deployment section.
docker run --runtime=nvidia -itd -p 8887:8887 -v /home:/home --name gpu-test -h host kaggle/python-gpu-build
For more information, please see Comments here.
jupyter notebook --port 8888 --ip=0.0.0.0 --allow-root
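If the browser still cannot reach Jupyter from outside, the VPC firewall may be blocking the port. A hedged sketch of opening it with gcloud (the rule name is arbitrary; note that a 0.0.0.0/0 source range exposes the port to the whole internet, so the token in the Jupyter URL is your only protection):

```shell
# Allow inbound TCP 8888 from anywhere (rule name is arbitrary)
gcloud compute firewall-rules create allow-jupyter-8888 \
  --allow=tcp:8888 \
  --source-ranges=0.0.0.0/0
```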
Note that you may not be able to connect over https; in that case, try http as well.
This setup is required because Kaggle datasets are downloaded, and submissions uploaded, through the API.
pip install kaggle
Get a Kaggle API key. On the Kaggle site, go to My Account → API → "Create New API Token". A json file will be downloaded.
Save the API key on your GCP instance. The procedure is: create a directory on the instance, save the key there, and set the file permissions.
# Create a directory
mkdir ~/.kaggle
# Create the file
touch ~/.kaggle/kaggle.json
# Write the key to the file
# (easiest: copy and paste the contents of the json downloaded from Kaggle)
vim ~/.kaggle/kaggle.json
# Restrict the file permissions
# (the kaggle CLI warns if the key is readable by other users)
chmod 600 ~/.kaggle/kaggle.json
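Once the key is in place, the CLI can be exercised like this (the competition name is just an example; any submission file name is hypothetical):

```shell
# Verify the API key works by listing competitions
kaggle competitions list
# Download a competition's data into the current directory
kaggle competitions download -c titanic
# Submit a prediction file with a message
kaggle competitions submit -c titanic -f submission.csv -m "first try"
```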
With the above, the development environment is basically set up, but different errors will come up depending on what you run later. Here are a few things worth knowing at that point; they will make it easier to solve problems yourself and to search effectively.
--File operations: it is worth memorizing basic Linux commands for creating, deleting, and moving directories and files.
--Checking disk usage: errors caused by running out of disk space are common. The df and du commands are worth remembering.
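As a quick illustration of the disk-usage commands mentioned above:

```shell
# Free space per filesystem, human-readable
df -h
# Total size of the current directory
du -sh .
# Largest items directly under /home, sorted descending
du -h --max-depth=1 /home | sort -hr | head
```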
-Building a computation environment for Kaggle with GCP and Docker
-Procedure for building a Kaggle environment with GCP and Docker
-[Install Jupyter Notebook on a Compute Engine instance using Anaconda](https://qiita.com/tk_01/items/307716a680460f8dbe17)