Subsystem B (ITO-B) of the supercomputer ITO (ac.jp/scp/system/ITO/01_intro.html) is equipped with GPUs. This post shows how to run TensorFlow on ITO-B. The GPUs can only be used through batch processing, so the work must run as a Python script rather than in Jupyter. The procedure for running TensorFlow on the front end of the supercomputer ITO was introduced in this article. Much of that procedure carries over, but batch processing on ITO-B is easier.
**All the following steps are performed on the login node of the supercomputer ITO.**
It is assumed that the base Python environment has been built with Miniconda. Create a virtual environment from the `anaconda` channel and install the `tensorflow-gpu` package from the same `anaconda` channel. Please refer to this article for the background.
Using the Miniconda installation prepared in this article, create a new virtual environment named `tf` and install the package into it.
$ conda create -c anaconda -n tf
$ conda activate tf
$ conda install -c anaconda tensorflow-gpu
Since the GPU cannot be used on the login node, the installation cannot be verified at this stage.
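Once the batch job described below is available, GPU visibility can be checked from inside the job. The following is a minimal sketch, not part of the original procedure: the file name `check_gpu.py` and the function `visible_gpus` are hypothetical, and it assumes a TensorFlow release that provides `tf.config.list_physical_devices` (2.1 or later) or the older `tf.config.experimental.list_physical_devices`.

```python
# check_gpu.py (hypothetical file name): report the GPUs TensorFlow can see.
import importlib


def visible_gpus():
    """Return the names of GPUs visible to TensorFlow ([] if none, or if TensorFlow is missing)."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is not installed in the active environment
        return []
    # Newer releases expose list_physical_devices directly under tf.config;
    # older ones only under tf.config.experimental (assumption: one of the two exists)
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

Running this script in place of the real Python code in a short batch job should print at least one device if the CUDA module and the `tf` environment were loaded correctly. On the login node it is expected to report none.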
Please refer to the Official Site for details on batch processing. Create a bash script that prepares the environment, including loading the GPU (CUDA) module, and then runs the Python code. The following is the bash script `ito_b.sh` for a job that uses one GPU. Everything from the arrow `←` to the end of a line is an explanatory note, so delete it in the actual script. Choose the resource group by referring to the Official Site.
ito_b.sh
#!/bin/bash
#PJM -L "rscunit=ito-b" ← Specify ITO-B
#PJM -L "rscgrp=ito-g-1" ← Specify the resource group
#PJM -L "vnode=1" ← Specify the number of nodes to use
#PJM -L "vnode-core=9" ← Specify the number of cores per node
#PJM -L "elapse=12:00:00" ← Specify the maximum run time (12 hours here)
#PJM -X ← Inherit the login node's environment variables in the batch job
source ~/.bashrc #← Load .bashrc, where Miniconda wrote its Python settings
module load cuda/10.1 #← Load CUDA 10.1 because the GPU is used
module list #← Confirm that CUDA is loaded
conda activate tf #← Enter the virtual environment tf
conda info -e #← Confirm that the virtual environment tf is active
python ann_experiments.py #← Run the Python code
conda deactivate #← Leave the virtual environment
Since batch processing does not inherit the environment of the login node (other than environment variables), the script itself must load the CUDA module and set up the Python environment. There is no need to enter the `tf` virtual environment on the login node before submitting. This example assumes that the Python code and `ito_b.sh` are in the same directory.
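The `←` notes in the listing have to be removed before the script is submitted. As a convenience, this can be done mechanically. The following is a minimal sketch, assuming the notes always run from `←` (optionally preceded by ` #`) to the end of the line; the function name `strip_arrow_notes` is hypothetical.

```python
def strip_arrow_notes(script_text):
    """Remove each '←' note (and a ' #' that only introduced it) from a script."""
    cleaned = []
    for line in script_text.splitlines():
        head, arrow, _ = line.partition("←")
        if arrow:
            head = head.rstrip()
            # 'command #← note' leaves a dangling ' #'; drop it as well
            if head.endswith(" #"):
                head = head[:-1].rstrip()
        cleaned.append(head)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    annotated = (
        "#!/bin/bash\n"
        '#PJM -L "vnode=1" ← Specify the number of nodes to use\n'
        "module list #← Confirm that CUDA is loaded\n"
    )
    print(strip_arrow_notes(annotated), end="")
```

Lines without an arrow, such as `#!/bin/bash`, pass through unchanged, and the `#PJM` directives keep their leading `#`.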
Submit the batch script `ito_b.sh` to the batch system as a batch job.
$ pjsub ito_b.sh
After that, all you have to do is wait for the job to finish, so you can log out. To check the status of the job, run the following.
$ pjstat
When the batch job ends, standard output and standard error are written to files named like `ito_b.sh.o0000000` and `ito_b.sh.e0000000`, respectively. Each file name consists of the batch script name followed by `.o` or `.e` and a 7-digit number. Be careful not to let the standard output file grow too large.
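To keep an eye on the size of these files, the naming rule above can be turned into a small lookup. The following is a minimal sketch using only the Python standard library; the function name `job_output_files` is hypothetical, and it assumes the output files are written to the directory being scanned.

```python
import re
from pathlib import Path


def job_output_files(directory, script_name="ito_b.sh"):
    """Return (path, size_in_bytes) pairs for <script_name>.o/.e<7 digits> files."""
    # Script name, then '.o' or '.e', then exactly seven digits
    pattern = re.compile(re.escape(script_name) + r"\.[oe]\d{7}")
    found = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and pattern.fullmatch(path.name):
            found.append((path, path.stat().st_size))
    return found


if __name__ == "__main__":
    for path, size in job_output_files("."):
        print(f"{path.name}: {size} bytes")
```

Run from the directory that holds `ito_b.sh`, this lists each output file with its size, which makes an unexpectedly large standard output file easy to spot.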