Item | Version |
---|---|
OS | Ubuntu 18.04 |
OpenSSH | 7.6p1 |
Tellus
Item | Spec |
---|---|
OS | Ubuntu 18.04(64bit) |
GPU | NVIDIA Tesla V100 (32GB) ×1 |
CPU | Xeon 4Core 3.7GHz 1CPU |
Disk | MLC SSD 480GB ×2 |
Memory | 64GB |
In general, follow the instructions in the official installation procedure for the CUDA Toolkit and the GPU card driver.
The items below can be found in the Tellus account dashboard (see the development environment page).
Item | Where to find it |
---|---|
Server IP | Environment hostname / IP |
Login ID | Sent by email from the operator |
Initial password | Token information / SSHPW information |
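A minimal sketch of the initial key setup (assumed here, not part of the original guide): generate a key pair on the client and register it on the server using the initial password, so that the key-based config below works.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa   # skip if a key pair already exists
ssh-copy-id -i ~/.ssh/id_rsa.pub [login ID]@[environment hostname / IP]   # authenticate with the initial password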
Add the following to ~/.ssh/config:
~/.ssh/config
Host tellus
HostName [environment hostname / IP]
User [login ID]
IdentityFile ~/.ssh/id_rsa
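With this config in place, a quick connectivity check (assuming the key from the sketch above has been registered) is simply:
ssh tellus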
Preparation before installing the GPU driver
sudo apt update
sudo apt upgrade
sudo apt install build-essential
sudo apt install dkms
CUDA Toolkit
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
chmod +x cuda_10.2.89_440.33.01_linux.run
sudo ./cuda_10.2.89_440.33.01_linux.run --toolkit --samples --samplespath=/usr/local/cuda-samples --no-opengl-libs
/etc/profile.d/cuda.sh
export CUDA_HOME="/usr/local/cuda"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/lib:$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export CPATH="/usr/local/include:$CUDA_HOME/include:$CPATH"
export INCLUDE_PATH="$CUDA_HOME/include"
/etc/profile.d/cuda.csh
setenv CUDA_HOME "/usr/local/cuda"
setenv PATH "${CUDA_HOME}/bin:${PATH}"
setenv LD_LIBRARY_PATH "/usr/local/lib:${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
setenv CPATH "/usr/local/include:${CUDA_HOME}/include:${CPATH}"
setenv INCLUDE_PATH "${CUDA_HOME}/include"
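A quick sanity check after sourcing the profile script (or re-logging in); the paths should point at the 10.2 installation:
source /etc/profile.d/cuda.sh
echo $CUDA_HOME      # should print /usr/local/cuda
nvcc --version       # should report release 10.2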
CUDA Driver
wget https://us.download.nvidia.com/tesla/440.95.01/NVIDIA-Linux-x86_64-440.95.01.run
chmod +x NVIDIA-Linux-x86_64-440.95.01.run
sudo ./NVIDIA-Linux-x86_64-440.95.01.run --no-opengl-files --no-libglx-indirect --dkms
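Since the installer was run with --dkms, one way to confirm the kernel module is registered (an optional check, not from the original guide):
dkms status          # an nvidia, 440.95.01 entry should be listed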
cuDNN
client
scp -r cudnn-10.2-linux-x64-v8.0.3.33.tgz tellus:~/
server
tar xvzf cudnn-10.2-linux-x64-v8.0.3.33.tgz
sudo mv cuda/include/cudnn.h /usr/local/cuda/include/
sudo mv cuda/lib64/* /usr/local/cuda/lib64/
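Optionally make the copied files readable and confirm they are in place (a minimal check; cuDNN 8 also ships additional headers such as cudnn_version.h, which can be copied the same way):
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
ls /usr/local/cuda/lib64/ | grep cudnn   # libcudnn.so.8 etc. should appear
sudo ldconfig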
nvidia-smi
Anaconda
wget https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
sudo bash Anaconda3-2020.07-Linux-x86_64.sh
conda update -n base conda
Add the following to .bashrc (py38 is the environment name).
.bashrc
export PYTHONPATH="/home/[login ID]/anaconda3/envs/py38/lib/python3.8:/home/[login ID]/anaconda3/envs/py38/lib/python3.8/site-packages:$PYTHONPATH"
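The py38 environment referenced above is assumed to exist already; if not, it can be created and activated, for example, with:
conda create -n py38 python=3.8
conda activate py38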
PyTorch
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
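A one-line check (inside the py38 environment) that PyTorch was built against CUDA and sees the GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # should print the version and True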
MLFlow
conda install -c conda-forge mlflow
Running `mlflow ui` starts the UI on `localhost:5000`, and you can check the experiment results in a browser.
Add a LocalForward setting to ~/.ssh/config so that the UI started on the server can be viewed in a browser on the client side.
~/.ssh/config
Host tellus
HostName [environment hostname / IP]
User [login ID]
IdentityFile ~/.ssh/id_rsa
LocalForward [client-side port number] localhost:5000
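With the forwarding in place, the rough workflow looks like this (the port number is whatever was chosen above):
ssh tellus      # client: opens the tunnel
mlflow ui       # server: starts the UI on localhost:5000
# client: open http://localhost:[client-side port number] in a browser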
QGIS
conda install -c conda-forge qgis=3.10.8
nvidia-smi while running the code below
cifar10.py
import os

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from tqdm import tqdm

batch = 1024
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def dataloader(is_train: bool, transform: transforms.Compose) -> DataLoader:
    dataset = CIFAR10(root='./data', train=is_train, download=True, transform=transform)
    return DataLoader(dataset, batch_size=batch, shuffle=is_train, num_workers=os.cpu_count())


def model() -> nn.Module:
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(512, 10)
    return model.to(device)


def training(net: nn.Module, trainloader: DataLoader, epochs: int) -> None:
    # loss function & optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        bar = tqdm(trainloader, desc="training model [epoch:{:02d}]".format(epoch), total=len(trainloader))
        for data in bar:
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data[0].to(device), data[1].to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            bar.set_postfix(device=device, batch=batch, loss=(running_loss / len(trainloader)))

    print('Finished Training')


transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainloader = dataloader(True, transform)
net = model()
training(net, trainloader, 3)
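To reproduce the comparison below, the script can be run while watching the GPU from a second terminal, for example:
python cifar10.py        # run the training script
watch -n 1 nvidia-smi    # in another terminal: refresh GPU usage every second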
CPU result
GPU result
MLFlow
record_sin.py
from math import pi, sin
import mlflow
mlflow.set_experiment('test')
amplitude = 2.0
with mlflow.start_run() as _:
    mlflow.log_param('amplitude', amplitude)
    for i in range(360):
        sin_val = amplitude * sin(i * pi / 180.)
        mlflow.log_metric('sin wave', sin_val, step=i)
In ~/test_code/, run:
python record_sin.py
mlflow ui
Result image
QGIS
ssh -X tellus
qgis
If you also want to run cifar10.py and record_sin.py from a Jupyter notebook, install ipykernel as well:
conda install -c conda-forge ipykernel
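To make the py38 environment selectable as a notebook kernel, it can be registered with ipykernel (the display name is arbitrary):
python -m ipykernel install --user --name py38 --display-name py38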
References
- Tellus FAQ
- Takenoko Blog: Building a PyTorch environment on Tellus' GPU server