I installed TensorFlow (GPU version) on Ubuntu

Overview

A record of installing TensorFlow, the open-source machine learning library released by Google, on Ubuntu, from setup through running CIFAR-10 training with CUDA enabled.

Official: http://www.tensorflow.org
Git: https://tensorflow.googlesource.com/tensorflow

Machine configuration

Setup procedure

The steps basically follow the official documentation, recorded here in the order I actually performed them.

(1). Get the source tree from Git

# Only needed if Git is not already installed
$ sudo apt-get install git

$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow

(2). Installation of Cuda Toolkit 7.0
CUDA 7.5 got stuck in a later step, so install 7.0 instead.
Download and install the Ubuntu 14.04 DEB (10KB) network installer package (cuda-repo-ubuntu1404-7-0-local_7.0-28_amd64.deb) from the following:
https://developer.nvidia.com/cuda-toolkit-70

$ sudo dpkg -i cuda-repo-ubuntu1404-7-0-local_7.0-28_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda-7-0 

(3). Installation of cuDNN 6.5
Downloading cuDNN requires registering on the NVIDIA site (and as I recall, it took a few days for the registration to be approved).
Download and install the cuDNN v2 Library for Linux (cudnn-6.5-linux-x64-v2.tgz) from the following:
https://developer.nvidia.com/rdp/cudnn-archive

$ tar xvzf cudnn-6.5-linux-x64-v2.tgz 
$ sudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda/include
$ sudo cp cudnn-6.5-linux-x64-v2/libcudnn* /usr/local/cuda/lib64

** Reboot here **

(4). Installing VirtualEnv and creating a container

# Installation
$ sudo apt-get install python-pip python-dev python-virtualenv

# Create the container
$ virtualenv --system-site-packages ~/tensorflow-GPU

Edit ~/tensorflow-GPU/bin/activate and add the following two lines at the end:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
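As a quick sanity check, both variables should be visible from Python once the container is activated. The helper below is purely illustrative (not part of the official setup):

```python
import os

# Illustrative helper: True when both exports added to bin/activate
# are visible in the given environment mapping (e.g. os.environ).
def cuda_env_ok(env):
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    return ("/usr/local/cuda/lib64" in lib_dirs
            and env.get("CUDA_HOME") == "/usr/local/cuda")

# In an activated shell, cuda_env_ok(os.environ) should be True.
print(cuda_env_ok({"LD_LIBRARY_PATH": "/usr/local/cuda/lib64",
                   "CUDA_HOME": "/usr/local/cuda"}))  # True
```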

(5). TensorFlow installation
Run the following once; it only needs to be re-run if the CUDA library paths change.

# Move to the source tree obtained in step (1)
$ cd ~/tensorflow

# Make the script executable
$ chmod +x ./configure

$ ./configure
Do you wish to build TensorFlow with GPU support? [y/n] y
GPU support will be enabled for TensorFlow

Please specify the location where CUDA 7.0 toolkit is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Please specify the location where CUDNN 6.5 V2 library is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished

Activating the container: whenever you open a new terminal and want to work in the tensorflow-GPU container, first activate it as follows:

$ cd ~/tensorflow-GPU
$ source bin/activate

Install TensorFlow for GPU

(tensorflow-GPU) $ pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

Trying it out

(1). MNIST

Running MNIST as-is produced an error (as of November 15, 2015), so make the following changes:

(tensorflow-GPU) $ cd ~/tensorflow/tensorflow/g3doc/tutorials/mnist/
# Rename the file to be replaced
(tensorflow-GPU) $ mv mnist.py mnist_org.py
#Get the old version from the repository
(tensorflow-GPU) $ wget https://raw.githubusercontent.com/tensorflow/tensorflow/1d76583411038767f673a0c96174c80eaf9ff42f/tensorflow/g3doc/tutorials/mnist/mnist.py

Change lines 23 and 24 of fully_connected_feed.py to the following:

#from tensorflow.g3doc.tutorials.mnist import input_data
#from tensorflow.g3doc.tutorials.mnist import mnist
import input_data
import mnist

Run it:

(tensorflow-GPU) $ python fully_connected_feed.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:888] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties: 
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.22GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 3144105984
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
Step 0: loss = 2.34 (0.300 sec)
Step 100: loss = 2.13 (0.002 sec)
Step 200: loss = 1.90 (0.002 sec)
Step 300: loss = 1.52 (0.002 sec)
Step 400: loss = 1.22 (0.002 sec)
Step 500: loss = 0.84 (0.002 sec)
Step 600: loss = 0.82 (0.002 sec)
Step 700: loss = 0.68 (0.002 sec)
Step 800: loss = 0.71 (0.002 sec)
Step 900: loss = 0.51 (0.002 sec)
Training Data Eval:
  Num examples: 55000  Num correct: 47651  Precision @ 1: 0.8664
Validation Data Eval:
  Num examples: 5000  Num correct: 4363  Precision @ 1: 0.8726
Test Data Eval:
  Num examples: 10000  Num correct: 8745  Precision @ 1: 0.8745
Step 1000: loss = 0.46 (0.002 sec)
Step 1100: loss = 0.44 (0.038 sec)
Step 1200: loss = 0.52 (0.002 sec)
Step 1300: loss = 0.43 (0.002 sec)
Step 1400: loss = 0.64 (0.002 sec)
Step 1500: loss = 0.34 (0.002 sec)
Step 1600: loss = 0.41 (0.002 sec)
Step 1700: loss = 0.34 (0.002 sec)
Step 1800: loss = 0.30 (0.002 sec)
Step 1900: loss = 0.35 (0.002 sec)
Training Data Eval:
  Num examples: 55000  Num correct: 49286  Precision @ 1: 0.8961
Validation Data Eval:
  Num examples: 5000  Num correct: 4529  Precision @ 1: 0.9058
Test Data Eval:
  Num examples: 10000  Num correct: 9012  Precision @ 1: 0.9012
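The Precision @ 1 figures in the evaluation output are simply the fraction of correct top-1 predictions, which is easy to verify against the reported counts:

```python
# Precision @ 1 = correct predictions / total examples,
# matching the evaluation lines printed above.
def precision_at_1(num_correct, num_examples):
    return float(num_correct) / num_examples

print(round(precision_at_1(9012, 10000), 4))   # 0.9012
print(round(precision_at_1(47651, 55000), 4))  # 0.8664
```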

(2). CIFAR-10

Run it:

(tensorflow-GPU) $ cd ~/tensorflow/tensorflow/models/image/cifar10/
(tensorflow-GPU) $ python cifar10_train.py
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 8
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:888] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:88] Found device 0 with properties: 
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.20GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:112] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:122] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:643] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc:47] Setting region size to 3120906240
I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op parallelism threads: 8
2015-11-17 02:14:46.611756: step 0, loss = 4.68 (6.9 examples/sec; 18.481 sec/batch)
2015-11-17 02:14:49.068440: step 10, loss = 4.65 (562.6 examples/sec; 0.228 sec/batch)
2015-11-17 02:14:51.224980: step 20, loss = 4.65 (617.0 examples/sec; 0.207 sec/batch)
2015-11-17 02:14:53.375918: step 30, loss = 4.62 (664.1 examples/sec; 0.193 sec/batch)
2015-11-17 02:14:55.513463: step 40, loss = 4.60 (610.3 examples/sec; 0.210 sec/batch)
2015-11-17 02:14:57.696431: step 50, loss = 4.58 (615.1 examples/sec; 0.208 sec/batch)
2015-11-17 02:14:59.877955: step 60, loss = 4.57 (567.3 examples/sec; 0.226 sec/batch)
2015-11-17 02:15:02.101614: step 70, loss = 4.55 (621.1 examples/sec; 0.206 sec/batch)
2015-11-17 02:15:04.593141: step 80, loss = 4.52 (490.3 examples/sec; 0.261 sec/batch)
2015-11-17 02:15:06.983452: step 90, loss = 4.52 (641.4 examples/sec; 0.200 sec/batch)
2015-11-17 02:15:09.232584: step 100, loss = 4.50 (563.8 examples/sec; 0.227 sec/batch)
2015-11-17 02:15:11.783752: step 110, loss = 4.48 (538.0 examples/sec; 0.238 sec/batch)
2015-11-17 02:15:13.997070: step 120, loss = 4.46 (589.4 examples/sec; 0.217 sec/batch)
2015-11-17 02:15:16.458028: step 130, loss = 4.45 (467.8 examples/sec; 0.274 sec/batch)
2015-11-17 02:15:19.128071: step 140, loss = 4.42 (581.1 examples/sec; 0.220 sec/batch)
2015-11-17 02:15:21.491835: step 150, loss = 4.40 (568.2 examples/sec; 0.225 sec/batch)
2015-11-17 02:15:23.962043: step 160, loss = 4.39 (635.4 examples/sec; 0.201 sec/batch)
...
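The two throughput figures in each log line are related by the batch size: examples/sec is roughly batch_size divided by sec/batch. Assuming the default batch size of 128 used by cifar10_train.py (an assumption; check your flags), the numbers above line up to within the rounding of the printed sec/batch value:

```python
# examples/sec ~= batch_size / sec_per_batch
# 128 is assumed here as the cifar10_train.py default batch size.
BATCH_SIZE = 128

def examples_per_sec(sec_per_batch):
    return BATCH_SIZE / sec_per_batch

print(round(examples_per_sec(0.228), 1))  # 561.4
```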

For comparison, the CPU build takes roughly twice as long per batch, so the GPU does appear to be providing the speedup.

Build tutorials_example_trainer for GPU

Install Bazel from a regular shell, not from inside the container environment.

# Install required packages
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip

# Download the Bazel installer
$ wget https://github.com/bazelbuild/bazel/releases/download/0.1.1/bazel-0.1.1-installer-linux-x86_64.sh

#Installation
$ chmod +x bazel-0.1.1-installer-linux-x86_64.sh 
$ ./bazel-0.1.1-installer-linux-x86_64.sh --user

Edit ~/.bashrc and add the following at the end:

export PATH="$PATH:$HOME/bin"

Create a file at ~/tensorflow/third_party/gpus/cuda/cuda.config with the following contents:

CUDA_TOOLKIT_PATH="/usr/local/cuda"
CUDNN_INSTALL_PATH="/usr/local/cuda"

With a default installation there should be a symbolic link /usr/local/cuda (=> /usr/local/cuda-7.0). If you changed the installation path, adjust these values accordingly.
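A quick way to confirm the symlink resolves where the config expects is to compare resolved paths. The helper below is hypothetical, just to illustrate the check:

```python
import os

# Hypothetical helper: does `link` resolve to the same
# real directory as `target`?
def symlink_points_to(link, target):
    return os.path.realpath(link) == os.path.realpath(target)

# Expected on a default install:
# symlink_points_to("/usr/local/cuda", "/usr/local/cuda-7.0") -> True
```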

Run ./configure again before building:

$ cd ~/tensorflow-GPU
$ source bin/activate
(tensorflow-GPU) $ cd ~/tensorflow
(tensorflow-GPU) $ ./configure
Do you wish to build TensorFlow with GPU support? [y/n] y
GPU support will be enabled for TensorFlow

Please specify the location where CUDA 7.0 toolkit is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Please specify the location where CUDNN 6.5 V2 library is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda

Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished

Build

(tensorflow-GPU) $ bazel build -c opt --config=cuda tensorflow/cc:tutorials_example_trainer

The build took about 10 minutes to complete.
