Client
Item | Version |
---|---|
OS | Ubuntu 18.04 |
OpenSSH | 7.6p1 |
Tellus
Item | Spec |
---|---|
OS | Ubuntu 18.04 (64-bit) |
GPU | NVIDIA Tesla V100 (32GB) ×1 |
CPU | Xeon 4-core 3.7GHz ×1 |
Disk | MLC SSD 480GB ×2 |
Memory | 64GB |
Basically, follow the CUDA Toolkit / GPU card driver installation procedure (see the Tellus FAQ in the reference at the end).
The connection information below can be found under the Tellus account dashboard → development environment.
Item | Where to find it |
---|---|
Server IP | Environment host name / IP |
Login ID | Emailed by the Tellus operator |
Initial password | Token information / SSHPW information |
Add the following settings to ~/.ssh/config.
~/.ssh/config
Host tellus
HostName [Environment host name / IP]
User [Login ID]
IdentityFile ~/.ssh/id_rsa
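The IdentityFile line assumes an RSA key pair already exists at ~/.ssh/id_rsa on the client. If it does not, one way to create and register it (my addition, not part of the original procedure) is:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
# once password login works, the public key can be registered on the server:
ssh-copy-id -i ~/.ssh/id_rsa.pub tellus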
ssh tellus
Enter the command above in the terminal and you will be asked for a password. Enter the initial password to complete the connection.
Preparation before installing the GPU driver
sudo apt update
sudo apt upgrade
sudo apt install build-essential
sudo apt install dkms
CUDA Toolkit
Download the installer and run it with the options below. Only the toolkit and samples are installed here; the GPU driver is installed separately in the next step, and the OpenGL libraries are skipped.
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
chmod +x cuda_10.2.89_440.33.01_linux.run
sudo ./cuda_10.2.89_440.33.01_linux.run --toolkit --samples --samplespath=/usr/local/cuda-samples --no-opengl-libs
Set the CUDA-related environment variables. For bash, create the following file.
/etc/profile.d/cuda.sh
export CUDA_HOME="/usr/local/cuda"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/lib:$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export CPATH="/usr/local/include:$CUDA_HOME/include:$CPATH"
export INCLUDE_PATH="$CUDA_HOME/include"
For csh, create the equivalent file using setenv.
/etc/profile.d/cuda.csh
setenv CUDA_HOME "/usr/local/cuda"
setenv PATH "${CUDA_HOME}/bin:${PATH}"
setenv LD_LIBRARY_PATH "/usr/local/lib:${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
setenv CPATH "/usr/local/include:${CUDA_HOME}/include:${CPATH}"
setenv INCLUDE_PATH "${CUDA_HOME}/include"
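As a quick sanity check (my addition, not part of the original procedure), confirm that the toolkit and the new environment variables are picked up:
source /etc/profile.d/cuda.sh
nvcc --version   # should report release 10.2
which nvcc       # should point into /usr/local/cuda/bin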
CUDA Driver
wget https://us.download.nvidia.com/tesla/440.95.01/NVIDIA-Linux-x86_64-440.95.01.run
chmod +x NVIDIA-Linux-x86_64-440.95.01.run
sudo ./NVIDIA-Linux-x86_64-440.95.01.run --no-opengl-files --no-libglx-indirect --dkms
cuDNN
On the client, copy the cuDNN archive (downloaded from the NVIDIA site beforehand) to the server:
scp cudnn-10.2-linux-x64-v8.0.3.33.tgz tellus:~/
On the server, extract the archive and place the files under the CUDA directory:
tar xvzf cudnn-10.2-linux-x64-v8.0.3.33.tgz
# cuDNN 8 splits its headers across several cudnn_*.h files, so copy all of them
sudo mv cuda/include/cudnn*.h /usr/local/cuda/include/
sudo mv cuda/lib64/* /usr/local/cuda/lib64/
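As a simple check (my addition, not in the original), confirm that the cuDNN files are now visible under the CUDA directory:
ls /usr/local/cuda/include/cudnn*.h
ls /usr/local/cuda/lib64/ | grep libcudnn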
Finally, run nvidia-smi to confirm that the driver recognizes the GPU.
nvidia-smi
Anaconda
wget https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
bash Anaconda3-2020.07-Linux-x86_64.sh
conda update -n base conda
Add the following line to .bashrc (py38 is the environment name used here).
.bashrc
export PYTHONPATH="/home/[Login ID]/anaconda3/envs/py38/lib/python3.8:/home/[Login ID]/anaconda3/envs/py38/lib/python3.8/site-packages:$PYTHONPATH"
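The py38 environment itself is assumed to exist; if it has not been created yet, one way to do so (my addition, not in the original) is:
conda create -n py38 python=3.8
conda activate py38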
PyTorch
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
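To confirm that PyTorch sees the GPU, a quick check (my addition) can be run on the server:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# expected to print the version and True on the GPU server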
MLFlow
conda install -c conda-forge mlflow
Entering mlflow ui launches the UI at localhost:5000, and you can check the experiment results in a browser.
Add a LocalForward setting to ~/.ssh/config so that the UI launched on the server side can be viewed in the client-side browser.
~/.ssh/config
Host tellus
HostName [Environment host name / IP]
User [Login ID]
IdentityFile ~/.ssh/id_rsa
LocalForward [Client side port number] localhost:5000
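With that in place, reconnect and the server-side UI becomes reachable from the client (this usage note is my addition):
ssh tellus
# while mlflow ui is running on the server, open http://localhost:[Client side port number] in the client browser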
QGIS
conda install -c conda-forge qgis=3.10.8
You can check GPU usage with nvidia-smi while the code below is running. For a CPU-only comparison, change the code to device = torch.device("cpu").
cifar10.py
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from tqdm import tqdm

batch = 1024
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def dataloader(is_train: bool, transform: transforms.Compose) -> DataLoader:
    dataset = CIFAR10(root='./data', train=is_train, download=True, transform=transform)
    return DataLoader(dataset, batch_size=batch, shuffle=is_train, num_workers=os.cpu_count())


def model() -> nn.Module:
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(512, 10)
    return model.to(device)


def training(net: nn.Module, trainloader: DataLoader, epochs: int) -> None:
    # loss function & optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        bar = tqdm(trainloader, desc="training model [epoch:{:02d}]".format(epoch), total=len(trainloader))
        for data in bar:
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data[0].to(device), data[1].to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            bar.set_postfix(device=device, batch=batch, loss=(running_loss / len(trainloader)))

    print('Finished Training')


transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainloader = dataloader(True, transform)
net = model()
training(net, trainloader, 3)
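One way to run the script while checking GPU usage (the watch invocation is my addition, not from the original):
python cifar10.py
# in a second terminal on the server:
watch -n 1 nvidia-smi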
CPU results
GPU results
MLFlow
record_sin.py
from math import pi, sin

import mlflow

mlflow.set_experiment('test')
amplitude = 2.0

with mlflow.start_run() as _:
    mlflow.log_param('amplitude', amplitude)
    for i in range(360):
        sin_val = amplitude * sin(i * pi / 180.)
        mlflow.log_metric('sin wave', sin_val, step=i)
Place the script under ~/test_code/, run it, and then start the UI from the same directory.
python record_sin.py
mlflow ui
Result image
QGIS
Connect with X forwarding enabled and launch QGIS on the server.
ssh -X tellus
qgis
To run scripts such as cifar10.py and record_sin.py from a Jupyter kernel as well, install ipykernel:
conda install -c conda-forge ipykernel
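If you go this route, the environment can be registered as a kernel, for example (command assumed, not in the original):
python -m ipykernel install --user --name py38 --display-name "Python 3.8 (py38)"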
References: Tellus FAQ; Bamboo shoot blog - Building a PyTorch environment on the Tellus GPU server