Introduction

Deep learning naturally needs a lot of computing resources, and my PC's specs are not up to it. So I think it's a good idea to quickly spin up a GPU instance on EC2.

The framework is Keras, with the GPU version of TensorFlow as the backend, all set up inside a pyenv virtual environment. TensorFlow 1.0 was released the other day and apparently includes tf.keras, but I haven't tried it yet, so I'll stick with regular Keras.

For the environment setup I referred to this article: Run TensorFlow on an AWS GPU instance

Creating the instance

EC2 instance: Ubuntu Server 16.04 LTS (HVM), SSD Volume Type, from Quick Start
Type: g2.2xlarge
Storage etc.: default configuration (memory 15 GB, storage 8 GB)

Installation

First, log in via SSH

ssh -i ~/[ec2key].pem ubuntu@[Instance IP]

Symlink the ephemeral storage to use it as the working directory

CUDA makes the installation quite large, and the root volume runs out of free space partway through the work. Create a symbolic link to /mnt/tmp/ so the ephemeral storage serves as your work area.

sudo mkdir /mnt/tmp
sudo chmod 1777 /mnt/tmp  # world-writable plus the sticky bit, the usual mode for /tmp
sudo rm -rf /tmp
sudo ln -s /mnt/tmp /tmp
cd /tmp
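To confirm the link took effect, here is a small Python 3 sketch. It is my own check and not part of the referenced article, and the name check_tmp_link is made up. It resolves /tmp, compares the result with /mnt/tmp, and also reports whether the sticky bit is set. A regular /tmp has mode 1777, and the sticky bit prevents users from deleting each other's files. Ubuntu 16.04 should already have python3, so you can run this before the Python setup below.

```python
import os
import stat


def check_tmp_link(link="/tmp", expected="/mnt/tmp"):
    """Return (resolves_to_expected, sticky_bit_set) for the given link."""
    # realpath follows symlinks, so /tmp -> /mnt/tmp resolves to /mnt/tmp
    target = os.path.realpath(link)
    resolves = target == os.path.realpath(expected)
    # S_ISVTX is the sticky bit (the leading 1 in mode 1777)
    sticky = bool(os.stat(target).st_mode & stat.S_ISVTX)
    return resolves, sticky


if __name__ == "__main__":
    print(check_tmp_link())
```

Once the link is in place, the first value should be True.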

Update Ubuntu to the latest packages

sudo apt-get update
sudo apt-get upgrade -y

Set the locale

After the upgrade, a locale warning keeps appearing, which is annoying.

sudo apt-get install language-pack-ja
sudo update-locale LANG=ja_JP.UTF-8

Install the packages required for the setup

sudo apt-get install python
sudo apt-get install -y build-essential python-pip python-dev git python-numpy swig default-jdk zip zlib1g-dev ipython

Blacklist nouveau to avoid conflicts with the NVIDIA driver

echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot
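After the reboot, `lsmod | grep nouveau` should print nothing. The following Python 3 sketch does the same check by reading /proc/modules, which is the file lsmod reads. The first field of each line in that file is a module name. The function names are illustrative.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def nouveau_loaded(path="/proc/modules"):
    """True if the nouveau module is currently loaded."""
    with open(path) as f:
        return "nouveau" in loaded_modules(f.read())
```

On the instance, `nouveau_loaded()` should return False once the blacklist is active.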

After the reboot, log in again and install linux-image-extra-virtual.

ssh -i ~/[ec2key].pem ubuntu@[Instance IP]
sudo apt-get install -y linux-image-extra-virtual
sudo reboot

After the reboot, log in again and install the Linux headers.

ssh -i ~/[ec2key].pem ubuntu@[Instance IP]
sudo apt-get install -y linux-source linux-headers-`uname -r`
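The NVIDIA installer builds a kernel module, so it needs headers that match the running kernel. As far as I know, on Ubuntu the headers package exposes them under /lib/modules/<release>/build, where <release> is what `uname -r` prints. That location is my understanding of the Ubuntu layout, not something from the original article. Here is a Python 3 sketch to check it (headers_build_dir is an illustrative name):

```python
import os
import platform


def headers_build_dir(release=None):
    """Directory where headers for the given kernel release are expected."""
    if release is None:
        # platform.release() returns the same string as `uname -r`
        release = platform.release()
    return os.path.join("/lib/modules", release, "build")


if __name__ == "__main__":
    path = headers_build_dir()
    print(path, "found" if os.path.isdir(path) else "MISSING")
```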

Set up CUDA Toolkit v8

The latest version at the moment is 8.0. Download and install it.

cd /tmp
wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda_8.0.44_linux-run
chmod +x cuda_8.0.44_linux-run
./cuda_8.0.44_linux-run  -extract=`pwd`/nvidia_installers
cd nvidia_installers/
sudo ./NVIDIA-Linux-x86_64-367.48.run
#Select Accept. The kernel setup keeps running for a while after the progress bar reaches 100%
#Select OK
#Select OK
#Select Yes
#Select OK

sudo modprobe nvidia
sudo ./cuda-linux64-rel-8.0.44-21122537.run
#The README is displayed; press q to exit it
#Type accept
# install Path: default
# shortcut?: default
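Once both installers finish, `nvidia-smi` (installed along with the driver) should list the GPU. For the toolkit itself, CUDA 8 writes a version.txt into its install directory. The sketch below assumes the default path /usr/local/cuda and a line of the form `CUDA Version 8.0.44`, and reads the version back in Python 3 (cuda_version is an illustrative name):

```python
import re


def cuda_version(text):
    """Parse 'CUDA Version X.Y.Z' into a tuple of ints, or return None."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    try:
        with open("/usr/local/cuda/version.txt") as f:
            print(cuda_version(f.read()))
    except IOError:
        print("version.txt not found under /usr/local/cuda")
```

With the installer used above, I would expect (8, 0, 44).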

Set up cuDNN

First download cuDNN to your local machine from https://developer.nvidia.com/cudnn. cuDNN cannot be downloaded without logging in with an NVIDIA developer account, so there is no way around this step.

From your local terminal, transfer the downloaded cuDNN archive to the EC2 instance

scp -i [ec2key].pem cudnn-8.0-linux-x64-v5.1.tgz ubuntu@[Instance IP]:/tmp

Back on the EC2 instance

cd /tmp
tar -xzf cudnn-8.0-linux-x64-v5.1.tgz
sudo mv ./cuda/lib64/* /usr/local/cuda/lib64/
sudo mv ./cuda/include/* /usr/local/cuda/include/

Add the following to ~/.bashrc

`.bashrc`


# cuDNN
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda

source ~/.bashrc
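To confirm that the cuDNN files landed in the right place, you can read the version macros from cudnn.h. The shell equivalent is roughly `grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda/include/cudnn.h`. The Python 3 sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers. For the v5.1 archive used here, the first two values should come out as 5 and 1.

```python
import re

HEADER = "/usr/local/cuda/include/cudnn.h"


def cudnn_version(header_text):
    """Return (major, minor, patchlevel) from cudnn.h, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Matches lines such as "#define CUDNN_MAJOR      5"
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    try:
        with open(HEADER) as f:
            print(cudnn_version(f.read()))
    except IOError:
        print(HEADER, "not found")
```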

Creating a Python virtual environment

python -V
Python 2.7.12

sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv

Create ~/.bash_profile with the following contents

`.bash_profile`


# pyenv
export PYENV_ROOT=$HOME/.pyenv
export PATH=$PYENV_ROOT/bin:$PATH
eval "$(pyenv init -)"
# virtualenv
eval "$(pyenv virtualenv-init -)"
export PYENV_VIRTUALENV_DISABLE_PROMPT=1

source ~/.bash_profile

Create a virtual environment for Keras

pyenv install 3.5.3
pyenv virtualenv 3.5.3 keras
pyenv activate keras
python -V
Python 3.5.3
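`python -V` only shows the version. To also confirm that the interpreter is the `keras` virtualenv rather than the base 3.5.3 install, a small sketch can compare sys.prefix with sys.base_prefix. This relies on the standard venv behavior. Classic virtualenv releases set sys.real_prefix instead, so the sketch checks both. in_virtualenv is an illustrative name.

```python
import sys


def in_virtualenv():
    """True when running inside a venv/virtualenv rather than the base interpreter."""
    # venv points sys.prefix at the environment and keeps the base install in sys.base_prefix
    base = getattr(sys, "base_prefix", sys.prefix)
    # older virtualenv releases record the base install in sys.real_prefix instead
    return sys.prefix != base or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print(sys.version_info[:3], "virtualenv" if in_virtualenv() else "base interpreter")
```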

Install TensorFlow

pip install tensorflow-gpu

Check the TensorFlow version. If the output looks like the following, the GPU is available.

python -c 'import tensorflow as tf; print(tf.__version__)'
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
1.0.0
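Instead of eyeballing those lines, you can script the check. Here is a minimal sketch that scans a captured log for the five libraries shown above. It assumes the log text has been saved to a file, for example with `python -c 'import tensorflow' 2> tf.log` (I believe TensorFlow writes these messages to stderr). The function name and the script name in the usage line are illustrative.

```python
import re
import sys

# Matches TensorFlow 1.0 lines such as
# "... dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally"
_DSO_RE = re.compile(r"successfully opened CUDA library (\S+) locally")

REQUIRED = ("libcublas", "libcudnn", "libcufft", "libcuda", "libcurand")


def missing_cuda_libraries(log_text, required=REQUIRED):
    """Return the required libraries that the log does not report as opened."""
    # "libcudnn.so.5" -> "libcudnn", "libcuda.so.1" -> "libcuda"
    opened = {name.split(".so")[0] for name in _DSO_RE.findall(log_text)}
    return [lib for lib in required if lib not in opened]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        print("missing:", missing_cuda_libraries(f.read()))
```

Usage: `python check_tf_log.py tf.log`. An empty list means all five libraries were opened.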

Install Keras

pip install pillow
pip install h5py
pip install matplotlib
pip install keras
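Before running the examples, it can help to confirm that the four packages are importable in the `keras` environment. The sketch below uses importlib. Note that the pillow package is imported under the name PIL. The mapping and function name are my own.

```python
import importlib.util

# pip package name -> module name to import; pillow is imported as PIL
REQUIRED = {"pillow": "PIL", "h5py": "h5py", "matplotlib": "matplotlib", "keras": "keras"}


def missing_packages(required=REQUIRED):
    """Return pip names whose top-level module this interpreter cannot find."""
    # find_spec locates a top-level module without importing it
    return sorted(pip_name for pip_name, module in required.items()
                  if importlib.util.find_spec(module) is None)


if __name__ == "__main__":
    print("missing:", missing_packages())
```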

Try running the Keras examples

cd /tmp
git clone https://github.com/fchollet/keras.git
cd keras/examples

For a start, run the example that solves MNIST with a CNN

python mnist_cnn.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
60000/60000 [==============================] - 13s - loss: 0.3770 - acc: 0.8839 - val_loss: 0.0932 - val_acc: 0.9709
Epoch 2/12
60000/60000 [==============================] - 11s - loss: 0.1363 - acc: 0.9603 - val_loss: 0.0632 - val_acc: 0.9801
Epoch 3/12
60000/60000 [==============================] - 11s - loss: 0.1064 - acc: 0.9687 - val_loss: 0.0509 - val_acc: 0.9835
Epoch 4/12
60000/60000 [==============================] - 11s - loss: 0.0900 - acc: 0.9736 - val_loss: 0.0443 - val_acc: 0.9857
Epoch 5/12
60000/60000 [==============================] - 11s - loss: 0.0769 - acc: 0.9775 - val_loss: 0.0405 - val_acc: 0.9865
Epoch 6/12
60000/60000 [==============================] - 11s - loss: 0.0689 - acc: 0.9795 - val_loss: 0.0371 - val_acc: 0.9870
Epoch 7/12
60000/60000 [==============================] - 11s - loss: 0.0649 - acc: 0.9803 - val_loss: 0.0361 - val_acc: 0.9881
Epoch 8/12
60000/60000 [==============================] - 11s - loss: 0.0594 - acc: 0.9823 - val_loss: 0.0356 - val_acc: 0.9886
Epoch 9/12
60000/60000 [==============================] - 11s - loss: 0.0547 - acc: 0.9841 - val_loss: 0.0321 - val_acc: 0.9889
Epoch 10/12
60000/60000 [==============================] - 11s - loss: 0.0525 - acc: 0.9841 - val_loss: 0.0320 - val_acc: 0.9889
Epoch 11/12
60000/60000 [==============================] - 11s - loss: 0.0506 - acc: 0.9850 - val_loss: 0.0323 - val_acc: 0.9892
Epoch 12/12
60000/60000 [==============================] - 11s - loss: 0.0471 - acc: 0.9856 - val_loss: 0.0314 - val_acc: 0.9897
Test score: 0.0314083654978
Test accuracy: 0.9897

Execution time: 2:23 (excluding the data download). On my MacBook Air it took about 35 minutes, so this is more than ten times faster, at roughly 11 seconds per epoch.
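To put a number on "more than ten times", here is the arithmetic as a small Python sketch. It uses the 2:23 wall-clock time, the roughly 35 minutes on the MacBook Air, and the per-epoch seconds Keras printed above (13 s for the first epoch, then 11 s for each of the remaining 11). The helper names are illustrative.

```python
def to_seconds(m_ss):
    """Convert an 'M:SS' duration such as '2:23' into seconds."""
    minutes, seconds = m_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(slow_seconds, fast_seconds):
    return slow_seconds / fast_seconds


gpu_total = to_seconds("2:23")     # g2.2xlarge, excluding the data download
mba_total = 35 * 60                # MacBook Air, about 35 minutes
epoch_seconds = [13] + [11] * 11   # per-epoch times from the Keras log

print(round(speedup(mba_total, gpu_total), 1))  # 14.7
print(sum(epoch_seconds))                       # 134
```

So the GPU instance is roughly 15 times faster, and the training epochs account for about 134 of the 143 seconds.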

I also tried the IMDB CNN-LSTM example, which looks heavy.

python imdb_cnn_lstm.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 100)
X_test shape: (25000, 100)
Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/2
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
   30/25000 [..............................] - ETA: 1397s - loss: 0.6936 - acc: 0.4333I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3811 get requests, put_count=2890 evicted_count=1000 eviction_rate=0.346021 and unsatisfied allocation rate=0.530307
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
  360/25000 [..............................] - ETA: 160s - loss: 0.6935 - acc: 0.4833I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2156 get requests, put_count=2374 evicted_count=1000 eviction_rate=0.42123 and unsatisfied allocation rate=0.373377
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
  870/25000 [>.............................] - ETA: 94s - loss: 0.6925 - acc: 0.5287I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4249 get requests, put_count=4491 evicted_count=1000 eviction_rate=0.222668 and unsatisfied allocation rate=0.192281
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 655 to 720
25000/25000 [==============================] - 63s - loss: 0.3815 - acc: 0.8210 - val_loss: 0.3519 - val_acc: 0.8456
Epoch 2/2
25000/25000 [==============================] - 60s - loss: 0.1970 - acc: 0.9238 - val_loss: 0.3471 - val_acc: 0.8534
24990/25000 [============================>.] - ETA: 0sTest score: 0.347144101623
Test accuracy: 0.853440059948

Execution time: 2:25 (excluding the data download). That is an overwhelming difference, since the MacBook Air took more than 40 minutes.
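When comparing runs like this, it can be easier to extract the per-epoch time and metrics from the Keras output than to read them off the screen. The sketch below uses a regular expression and assumes the Keras 1.x progress-bar format shown above. It parses only completed-epoch lines, where the bar is all `=`. The function name is illustrative.

```python
import re

# e.g. "25000/25000 [=====...=====] - 63s - loss: 0.3815 - acc: 0.8210 - val_loss: ..."
_EPOCH_RE = re.compile(r"^\s*(\d+)/(\d+) \[=+\] - (\d+)s - (.*)$")


def parse_epoch_lines(log_text):
    """Return (seconds, metrics dict) for each completed-epoch line in the log."""
    results = []
    for line in log_text.splitlines():
        m = _EPOCH_RE.match(line)
        # Keep only lines where the sample counter reached the total
        if m and m.group(1) == m.group(2):
            metrics = {}
            for part in m.group(4).split(" - "):
                key, value = part.strip().split(": ")
                metrics[key] = float(value)
            results.append((int(m.group(3)), metrics))
    return results
```

For the IMDB log above, this yields epoch times of 63 and 60 seconds, which account for 123 of the 145 seconds in total.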

Let's go further and try something that looks even heavier. mnist_acgan.py apparently solves MNIST with something called ACGAN (Auxiliary Classifier Generative Adversarial Network). Is it a relative of DCGAN? The details seem to be described here, but they are difficult, so I'll put that off for now. Since it's a GAN, it will probably be heavy. Let's see how it does.

python mnist_acgan.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Epoch 1 of 50
  0/600 [..............................] - ETA: 0sI tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 3.74GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
599/600 [============================>.] - ETA: 1s   
Testing for epoch 1:
component              | loss | generation_loss | auxiliary_loss
-----------------------------------------------------------------
generator (train)      | 3.88 | 1.49            | 2.39 
generator (test)       | 3.36 | 1.04            | 2.32 
discriminator (train)  | 2.11 | 0.53            | 1.58 
discriminator (test)   | 2.12 | 0.70            | 1.43 
Epoch 2 of 50
 44/600 [=>............................] - ETA: 732s

One epoch took about 15 minutes, with 48 more epochs to go. It was endless and costly, so I stopped it partway. If you leave it running for half a day, you should get results, which is still great.
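The "half a day" figure follows from simple arithmetic. At about 15 minutes per epoch, the 48 remaining epochs need around 12 hours, and a full 50-epoch run needs about 12.5 hours. Multiply the hours by your instance's hourly price to estimate the cost. As a sketch (the function name is illustrative):

```python
def remaining_hours(minutes_per_epoch, total_epochs, done_epochs):
    """Rough wall-clock estimate, in hours, for the epochs still to run."""
    return minutes_per_epoch * (total_epochs - done_epochs) / 60.0


print(remaining_hours(15, 50, 2))  # 12.0 (the 48 epochs still left)
print(remaining_hours(15, 50, 0))  # 12.5 (a full 50-epoch run)
```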

Free disk space after the work

df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             7679880       0   7679880   0% /dev
tmpfs            1539900    8800   1531100   1% /run
/dev/xvda1       8117828 6149060   1533492  81% /
tmpfs            7699496       0   7699496   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            7699496       0   7699496   0% /sys/fs/cgroup
/dev/xvdb       66946696 3017852  60521484   5% /mnt
tmpfs            1539904       0   1539904   0% /run/user/1000

The system disk is 81% used. The ephemeral storage is wiped when the instance is stopped, so if the default 8 GB root volume worries you, you should expand the storage.
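Why 81% when 6149060 of 8117828 KB is only about 76%? As far as I understand GNU df, Use% is computed as Used / (Used + Available) and rounded up, and blocks reserved for root are not counted as Available. The sketch below reproduces the column from the numbers above (df_use_percent is an illustrative name):

```python
def df_use_percent(used_kb, available_kb):
    """Use% as GNU df reports it: used / (used + available), rounded up."""
    total = used_kb + available_kb
    if total == 0:
        return 0
    # Integer ceiling division avoids floating-point rounding surprises
    return -(-100 * used_kb // total)


print(df_use_percent(6149060, 1533492))   # 81 (/)
print(df_use_percent(3017852, 60521484))  # 5  (/mnt)
```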

Error when creating a g2.2xlarge instance from an AMI of this instance

I took a snapshot of the working instance and created a g2.2xlarge instance from the resulting AMI. Let's check that it works.

pyenv activate keras
python -V
 Python 3.5.3
python -c 'import tensorflow as tf; print(tf.__version__)'
Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

TensorFlow fails to import. The ImportError says that libcudart.so.8.0, the CUDA runtime library, cannot be found, so the problem seems to be CUDA-related, but I haven't pinned down the cause. I may need to check later whether this setup can be turned into an AMI at all. If you only use the instance intermittently, you can simply stop it and keep it.
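One hypothesis worth checking, which I have not verified on this AMI, concerns where LD_LIBRARY_PATH is set. It is exported in ~/.bashrc. However, once ~/.bash_profile exists, a bash login shell (such as an SSH session) reads that file instead of ~/.profile, and on Ubuntu it is ~/.profile that sources ~/.bashrc. In the original session, ~/.bashrc had been sourced by hand, which would hide the problem. If this turns out to be the cause, sourcing ~/.bashrc from ~/.bash_profile is one way to fix it. The Python sketch below shows whether the library is reachable through LD_LIBRARY_PATH in the current shell (find_in_library_path is an illustrative name):

```python
import os


def find_in_library_path(filename, ld_library_path=None):
    """Return the LD_LIBRARY_PATH entries that contain filename."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Empty entries (e.g. from a leading ':') are skipped for simplicity
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH"))
    print("libcudart.so.8.0 found in:", find_in_library_path("libcudart.so.8.0"))
    print("/usr/local/cuda/lib64 exists:", os.path.isdir("/usr/local/cuda/lib64"))
```

Run it inside the activated keras environment on the new instance. If the list is empty while /usr/local/cuda/lib64 exists, the shell environment is the likely culprit rather than the CUDA installation.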

Summary

Building the Keras environment above takes less than an hour. Stopping the instance and keeping it is fine, but rebuilding it from scratch when you need it isn't too hard either. Decide based on how often you use it and your situation. Either way, it's much easier than building a TensorFlow GPU environment used to be.

[PYTHON] Building a Keras environment on an AWS EC2 G2 instance, February 2017