Dynamic GPU usage monitoring (CUDA)

To monitor NVIDIA GPU usage dynamically, here are two methods:

Method 1: use nvidia-smi

In your terminal, issue the following command:

$ watch -n 1 nvidia-smi

It will continually update the GPU usage info every second (change the 1 to 2, or to whatever interval in seconds you want the info refreshed).

Method 2: use the open-source monitoring program glances with its GPU monitoring plugin

In your terminal, issue the following command to install glances with its GPU monitoring plugin:

$ sudo pip install glances[gpu]

To launch it, issue the following command in your terminal:

 $ sudo glances

You should then see your GPU usage, among other things. glances also monitors the CPU, disk I/O, disk space, network, and a few other things.
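If you prefer to log GPU usage from a script (for example, to keep a record while a long training job runs), here is a minimal Python sketch that simply polls nvidia-smi; the queried fields are just a reasonable default and can be adjusted.

import subprocess
import time

# Poll nvidia-smi once per second and print utilization and memory usage.
# Assumes nvidia-smi is on the PATH; stop with Ctrl+C.
QUERY = ["nvidia-smi",
         "--query-gpu=timestamp,name,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"]

while True:
    print(subprocess.check_output(QUERY).decode().strip())
    time.sleep(1)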

For more commonly used Linux commands, check my other posts here and here.

Install Keras with GPU TensorFlow as backend on Ubuntu 16.04

This post introduces how to install Keras with TensorFlow as its backend on Ubuntu Server 16.04 LTS with CUDA 8 and an NVIDIA TITAN X (Pascal) GPU, but the steps should also work for Ubuntu Desktop 16.04 LTS.

We gratefully acknowledge the support of NVIDIA Corporation with the award of one Titan X Pascal GPU used for our machine learning and deep learning based research.

Keras is a great choice for learning machine learning and deep learning. It has an easy syntax and can use Google's TensorFlow, Microsoft's CNTK, or Theano as its backend. Keras is essentially a wrapper around more complex numerical computation engines such as TensorFlow and Theano.

Keras abstracts away much of the complexity of building a deep neural network, leaving us with a very simple, nice, and easy to use interface to rapidly build, test, and deploy deep learning architectures.
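To give a feel for that interface, here is a minimal sketch of a small fully connected network in Keras; the layer sizes and the random training data are placeholders for illustration only.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy data: 1000 samples with 20 features and binary labels (illustration only).
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))

# Two-layer network: one hidden layer, one sigmoid output for binary classification.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)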

TensorFlow is extremely flexible, allowing you to deploy network computation to multiple CPUs, GPUs, servers, or even mobile systems without having to change a single line of code.

This makes TensorFlow an excellent choice for training distributed deep learning networks in an architecture agnostic way.
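As a rough illustration, the small program below runs unchanged on a CPU-only machine or a GPU machine; asking TensorFlow to log device placement (TensorFlow 1.x API, as used throughout this post) shows where each op actually ran.

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
c = tf.matmul(a, b)

# log_device_placement prints which device (e.g., /cpu:0 or /gpu:0) each op
# is assigned to; the script itself is identical on CPU-only and GPU machines.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))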

Now let’s start the installation of Keras with TensorFlow as its backend.

1: Set up a Python virtual environment

Check my other post for more details about how to set up a Python virtual environment and why it is better to install Python libraries inside one.

  • Install pip and Virtualenv for Python 2 and Python 3:
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev
  • Create a Virtualenv environment in your home directory for Python 2 and Python 3:
#for python 2
virtualenv --system-site-packages -p python ~/keras-tf-venv

# for python 3 
virtualenv --system-site-packages -p python3 ~/keras-tf-venv3

(Note: To delete a virtual environment, just delete its folder. For example, in our case it would be rm -rf ~/keras-tf-venv or rm -rf ~/keras-tf-venv3.)
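Once a virtual environment is activated, a quick way to confirm that you are running its Python interpreter (rather than the system one) is the short check below; the exact paths will depend on your username and which environment you activated.

import sys

# Inside an activated virtualenv these should point into ~/keras-tf-venv
# (or ~/keras-tf-venv3), not into /usr.
print(sys.executable)
print(sys.prefix)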

2: Update & Install NVIDIA Drivers (skip this if you do not need the GPU version of TensorFlow)

Check another post I wrote (steps 1-4 in that post) for detailed instructions on how to update and install the NVIDIA driver, CUDA 8.0, and cuDNN, as required for the GPU TensorFlow installation.

Notes: If you have an old version of the NVIDIA driver installed, use the following to remove it before installing the new driver.

Step 1: Remove the older version of the NVIDIA driver
sudo apt-get purge nvidia*

Step 2: Reboot the system

Test whether it was removed:

$ sudo nvidia-smi 
sudo: nvidia-smi: command not found  # this means the old driver was uninstalled.

(Note: If you have an older version of CUDA and cuDNN installed, check the post How to uninstall CUDA Toolkit and cuDNN under Linux? (02/16/2017) (pdf) for uninstallation. The CUDA installer itself also tells you how to uninstall an older version when you install; see the Install CUDA 8.0 output below.)

(Note: I tried to install the latest NVIDIA driver with the latest CUDA and cuDNN (i.e., v6.0), but that combination did not work for me when I installed TensorFlow. After some testing, I found that NVIDIA driver 375.82, cuda_8.0.61_375.26_linux.run, and cudnn-8.0-linux-x64-v5.1.tgz work together.)

Install CUDA 8.0 (the installer summary output should look like the following):

Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /home/liping, but missing recommended libraries

Please make sure that
– PATH includes /usr/local/cuda-8.0/bin
– LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_3813.log

 

3: Install TensorFlow

Before installing TensorFlow and Keras, be sure to activate your python virtual environment first.

# for python 2
$ source ~/keras-tf-venv/bin/activate  # If using bash
(keras-tf-venv)$  # Your prompt should change

# for python 3
$ source ~/keras-tf-venv3/bin/activate  # If using bash
(keras-tf-venv3)$  # Your prompt should change

 (keras-tf-venv)$ pip install --upgrade tensorflow   # Python 2.7; CPU support (no GPU support)
 (keras-tf-venv3)$ pip3 install --upgrade tensorflow   # Python 3.n; CPU support (no GPU support)
 (keras-tf-venv)$ pip install --upgrade tensorflow-gpu  # Python 2.7;  GPU support
 (keras-tf-venv3)$ pip3 install --upgrade tensorflow-gpu # Python 3.n; GPU support

Note: If the commands for installing TensorFlow given above fail (typically because you invoked a pip version lower than 8.1), install TensorFlow in the active virtualenv environment by issuing a command of the following format:

 (keras-tf-venv)$ pip install --upgrade TF_PYTHON_URL   # Python 2.7
 (keras-tf-venv3)$ pip3 install --upgrade TF_PYTHON_URL  # Python 3.N

where TF_PYTHON_URL identifies the URL of the TensorFlow Python package. The appropriate value of TF_PYTHON_URL depends on the operating system, Python version, and GPU support. Find the appropriate value for TF_PYTHON_URL for your system here. For example, if you are installing TensorFlow for Linux, Python 2.7, and CPU-only support, issue a command like the ones below to install TensorFlow in the active virtualenv environment. (Check here to get the latest version for your system.)

#for python 2.7 -- CPU only
(keras-tf-venv)$ pip install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl

#for python 2.7 -- GPU support
(keras-tf-venv)$ pip install --upgrade \
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl

# for python 3.5 -- CPU only
(keras-tf-venv3)$ pip3 install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp35-cp35m-linux_x86_64.whl

# for python 3.5 -- GPU support
(keras-tf-venv3)$ pip3 install --upgrade \
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl
  • Validate your TensorFlow installation. (I installed GPU TensorFlow, so if you install CPU TensorFlow, the output will be slightly different.)

#For Python 2.7

(keras-tf-venv) :~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-08-01 14:28:31.257054: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 14:28:31.257090: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 14:28:31.257103: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 14:28:31.257114: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 14:28:31.257128: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 14:28:32.253475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:03:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
2017-08-01 14:28:32.253512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-01 14:28:32.253519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y 
2017-08-01 14:28:32.253533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0)
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> exit()
(keras-tf-venv) :~$ 

#for python 3

(keras-tf-venv3) :~$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-08-01 13:54:30.458376: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 13:54:30.458413: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 13:54:30.458425: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 13:54:30.458436: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 13:54:30.458448: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 13:54:31.420661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:03:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
2017-08-01 13:54:31.420692: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-01 13:54:31.420699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y 
2017-08-01 13:54:31.420712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
>>> exit() 
(keras-tf-venv3) :~$

If you see output like the following, your TensorFlow was installed correctly.

Hello, TensorFlow!

4: Install Keras

(Note: Be sure that you activated your python virtual environment before you install Keras.)

Installing Keras is even easier than installing TensorFlow.

First, let’s install a few dependencies:

#for python 2
$ pip install numpy scipy
$ pip install scikit-learn
$ pip install pillow
$ pip install h5py

#for python 3
$ pip3 install numpy scipy
$ pip3 install scikit-learn
$ pip3 install pillow
$ pip3 install h5py

Then install Keras itself (with your virtual environment still activated):

#for python 2
$ pip install keras

#for python 3
$ pip3 install keras

5: Verify that your keras.json file is configured correctly

Let’s now check the contents of our keras.json configuration file. You can find this file at ~/.keras/keras.json.

Use nano to open and edit the file:

$ nano ~/.keras/keras.json

The default values should be something like this:

{
 "epsilon": 1e-07,
 "backend": "tensorflow",
 "floatx": "float32",
 "image_data_format": "channels_last"
}
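If you prefer to check these settings from Python rather than reading the file, the Keras backend module exposes them; a quick sketch (the printed values should match what is in keras.json):

from keras import backend as K

print(K.backend())            # 'tensorflow'
print(K.floatx())             # 'float32'
print(K.epsilon())            # 1e-07
print(K.image_data_format())  # 'channels_last'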

Can’t find your keras.json file?

On most systems the keras.json file (and associated subdirectories) will not be created until you open up a Python shell and directly import the keras package itself.

If you find that the ~/.keras/keras.json file does not exist on your system, simply open up a shell, (optionally) activate your Python virtual environment (if you are using virtual environments), and then import Keras:

#for python 2
$ python
>>> import keras
>>> quit()

#for python 3
$ python3
>>> import keras
>>> quit()

From there, you should see that your keras.json file now exists on your local disk.

If you see any errors when importing keras, go back to the top of this step and ensure your keras.json configuration file has been properly updated.

6: Test Keras + TensorFlow installation

To verify that Keras + TensorFlow have been installed, simply activate the keras-tf-venv (or keras-tf-venv3) environment as shown earlier, open up a Python shell, and import keras:
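For example, with the Python 2 environment (use keras-tf-venv3 and python3 for Python 3), a successful test looks roughly like this:

$ source ~/keras-tf-venv/bin/activate
(keras-tf-venv)$ python
>>> import keras
Using TensorFlow backend.
>>> quit()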

Specifically, you should see the text Using TensorFlow backend displayed when importing Keras, which demonstrates that Keras has been installed with the TensorFlow backend.

Note: each time you would like to use Keras, you need to activate the virtual environment into which it was installed, and when you are done using Keras, deactivate the environment.

# for python 2
(keras-tf-venv)$ deactivate
$  # Your prompt should change back

#for python 3
(keras-tf-venv3)$ deactivate
$  # Your prompt should change back

Note: To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf keras-tf-venv or rm -rf keras-tf-venv3.)

References:

Installing Keras with TensorFlow backend (November 14, 2016, in Deep Learning, Libraries, Tutorials)

Installing keras makes tensorflow can’t find GPU

Installing Nvidia, Cuda, CuDNN, TensorFlow and Keras

https://www.tensorflow.org/install/install_linux

Keras as a simplified interface to TensorFlow: tutorial

I: Calling Keras layers on TensorFlow tensors

II: Using Keras models with TensorFlow

III: Multi-GPU and distributed training

IV: Exporting a model with TensorFlow-serving

How to install a graphics card

This post introduces how to install a graphics card and related resources.

Notes: When installing or removing a graphics card, handle it only by its edges, not by its connectors or components.

Before starting with graphics card installation, check out the following two things.

Step 1: make sure your computer has the proper hardware to support your new card.

The most common problem that people run into is an inadequate power supply:

  • Either it can’t supply enough wattage,
  • or it doesn’t have enough available PCI-E power connectors.

As a rule of thumb, your power supply should be rated at double the power consumption of your graphics card. For example, if you purchased an R9 290X (a video card that draws 300 watts), you should have a power supply that can provide at least 600 watts of power and has both 8-pin and 6-pin PCI-E power connectors.
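If you want to sanity-check a planned build, the rule of thumb is trivial to compute; a rough sketch (the wattages are whatever your cards' spec sheets say):

# Rule of thumb from above: PSU wattage >= 2 x total graphics card power draw.
gpu_watts = [300]              # e.g., one card rated at 300 W; add entries for multi-GPU
recommended_psu = 2 * sum(gpu_watts)
print(recommended_psu)         # 600 (watts) for this single-card example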

To find out how much wattage your power supply pumps out, open your case and look for the standard identification sticker all power supplies have, which lists their basic info. While you’re there you can also identify how many 6-pin and 8-pin PCI-E connectors are available.

Picking the right power supply is even more important if you’re upgrading to a multi-card configuration, because you’ll likely need to buy a power supply rated for one or more kilowatts. For more on multi-GPU SLI and CrossFireX setups, check out PCWorld’s guide to tricking out your PC with multiple graphics cards.

Step 2: is there enough room inside of your case to fit your new graphics card? 

Some high-end graphics cards can be over a foot long, and two or even three expansion slots wide. You can find the physical dimensions of a graphics card on its product page or on the manufacturer’s website.

With all of those questions resolved, it’s time to install the new card.

Installing a graphics card

Installing a graphics card requires three things:

  • a new graphics card,
  • your computer, and
  • a Phillips-head screwdriver.

Notes: Be sure to turn off your PC and unplug it from the wall before you begin.

Start by removing the side of your computer’s case, then locate and remove your current graphics card. If your PC does not have a graphics card installed, instead locate the PCI-E x16 slot closest to the heat sink of your processor. This will be either the first or second expansion slot on your motherboard.

You install a graphics card into a PCI-E x16 slot on your computer’s motherboard (the long slots shown in the picture below).

[Image: gpu install (1 of 7)]

Make sure that there are no loose wires blocking your access to this slot. If you’re replacing an existing graphics card, unplug any cables connected to it, remove the screw from its retention bracket, and then remove the card. Most motherboards also have a small plastic latch on the end of the PCI-E slot that locks the graphics card in place. Make sure you toggle this latch to unlock your old graphics card so you can remove it.

You can now install your new graphics card into the open and unobstructed PCI-E x16 slot. Firmly insert the card into the slot, then push down the plastic lock on the end of the PCI-E slot to hold it in place. Next, use a screw to secure the graphics card’s metal retention bracket to your PC’s case. You can reuse the same screw(s) that held the cover bracket or your former graphics card in place.

Notes: Don’t forget to lock the latch at the end of the PCI-E slot after firmly inserting your graphics card! (This is very important, because your expensive graphics card needs this latch to stay securely seated. You should hear a click when the latch locks. It should feel firm and steady; if it feels loose, the latch is not locked properly.)

Most gaming-level graphics cards require additional power connectors. If yours does, make sure you connect those PCI-E power cables. Your graphics card will not function correctly without properly supplied power. In fact, if you don’t connect those PCI-E power cables your PC may be unable to boot.

 

Wrapping up

With your graphics card secured and powered up, finish the job by sliding your case’s side panel back into position and plugging your display cable into your new graphics card. Turn on your computer.

Now it’s time to take care of the software side of upgrading your graphics card.

If your new graphics card is the same brand as your old card, this process is simple. Just go to the manufacturer’s website and download the latest driver package for your operating system. Keep in mind that graphics drivers are quite large, generally about 300MB in size, and may take some time to download depending on the speed of your Internet connection. Install the driver, then restart your computer.

If you’re switching manufacturers (from Intel to AMD, from AMD to Nvidia, or vice-versa), uninstall your old graphics driver and restart your computer before installing the driver for your new graphics card. If you don’t uninstall the old driver it may conflict with the new driver.

Tips

  • If you intend to use more than one monitor with your video cards, be aware that while SLI is enabled, only one monitor is supported. A workaround for this involves installing additional hardware.
  • With Nvidia’s SLI, at least for the time being, you must connect two video cards with the same chipset. For example, a BFG 7600 GT and an EVGA 7600 GT can be connected.

Warnings

  • Make sure to ground yourself before handling hardware, as ESD (electrostatic discharge) can fry your components. Static electricity remains a threat to all computer components. It is recommended that you wear clothing that does not produce static charges, keep in near-constant contact with the computer’s chassis, and avoid touching the metallic traces of circuit boards inside or outside the computer.
  • Always unplug your system before installing any hardware.

Choose proper GeForce GPU(s) according to your machine

This post introduces how to choose proper NVIDIA GeForce GPU(s) according to your desktop or workstation.

We gratefully acknowledge the support of NVIDIA Corporation with the donation of one Titan X Pascal GPU used for our machine learning and deep learning based research.

It is very important to choose GPUs according to your desktop or workstation (in particular, the power specs of the machine that will house the GPU(s)), according to overall computational performance and efficiency, including the GPU engine specs (especially the number of NVIDIA CUDA cores) and memory specs (e.g., memory speed, standard memory config, and memory bandwidth in GB/sec), and also according to financial cost.

  • Full Specifications (The compute capability of the four GPU graphics cards listed below is 6.1 for all of them.)

NVIDIA TITAN Xp (VIEW FULL SPECS)

NVIDIA TITAN X (VIEW FULL SPECS)

GeForce GTX 1080 Ti

GeForce GTX 1080 (VIEW FULL SPECS)

  • Price

NVIDIA TITAN Xp

NVIDIA TITAN X

GeForce GTX 1080 Ti

GeForce GTX 1080

When you choose GeForce GPU(s) for your machine, be sure to consider both the power specs of your machine and also the GPU Engine Specs (esp. how many NVIDIA CUDA Cores) and Memory Specs (e.g., Memory Speed, Standard Memory Config, Memory Bandwidth (GB/sec)).

For example, if your machine has one 8-pin and two 6-pin PCIe power cables and you have a budget of around $1,200, I would recommend going for two GeForce GTX 1080 cards. In this case, purchasing two GeForce GTX 1080 cards will cost a little less and, more importantly, will give you much more computation power than a single NVIDIA TITAN Xp.

(Note that two 6-pin PCIe power cables can be used as one 8-pin PCIe power cable.)

If your machine has one 8-pin and one 6-pin connector and you have a $700 budget, go for a GeForce GTX 1080 Ti.

If you have two 6-pin connectors, one 8-pin, or one 8-pin and one 6-pin, and your budget is around $600, the best choice would be one GeForce GTX 1080.

In this post I only compared GPU cards at or above the GeForce GTX 1080. For more options and combinations, check the table below to find the configuration best suited to your machine and budget.

(Thanks to Scott and Bob for their help with this.)

References:

GPU Compute Capability
NVIDIA TITAN Xp 6.1
NVIDIA TITAN X 6.1
GeForce GTX 1080 Ti 6.1
GeForce GTX 1080 6.1
GeForce GTX 1070 6.1
GeForce GTX 1060 6.1
GeForce GTX 1050 6.1
GeForce GTX TITAN X 5.2
GeForce GTX TITAN Z 3.5
GeForce GTX TITAN Black 3.5
GeForce GTX TITAN 3.5
GeForce GTX 980 Ti 5.2
GeForce GTX 980 5.2
GeForce GTX 970 5.2
GeForce GTX 960 5.2
GeForce GTX 950 5.2
GeForce GTX 780 Ti 3.5
GeForce GTX 780 3.5
GeForce GTX 770 3.0
GeForce GTX 760 3.0
GeForce GTX 750 Ti 5.0
GeForce GTX 750 5.0
GeForce GTX 690 3.0
GeForce GTX 680 3.0
GeForce GTX 670 3.0
GeForce GTX 660 Ti 3.0
GeForce GTX 660 3.0
GeForce GTX 650 Ti BOOST 3.0
GeForce GTX 650 Ti 3.0
GeForce GTX 650 3.0
GeForce GTX 560 Ti 2.1
GeForce GTX 550 Ti 2.1
GeForce GTX 460 2.1
GeForce GTS 450 2.1
GeForce GTS 450* 2.1
GeForce GTX 590 2.0
GeForce GTX 580 2.0
GeForce GTX 570 2.0
GeForce GTX 480 2.0
GeForce GTX 470 2.0
GeForce GTX 465 2.0
GeForce GT 740 3.0
GeForce GT 730 3.5
GeForce GT 730 DDR3,128bit 2.1
GeForce GT 720 3.5
GeForce GT 705* 3.5
GeForce GT 640 (GDDR5) 3.5
GeForce GT 640 (GDDR3) 2.1
GeForce GT 630 2.1
GeForce GT 620 2.1
GeForce GT 610 2.1
GeForce GT 520 2.1
GeForce GT 440 2.1
GeForce GT 440* 2.1
GeForce GT 430 2.1
GeForce GT 430* 2.1

NVIDIA TITAN X Pascal vs GTX 1080

This post introduces NVIDIA TITAN X Pascal, GTX 1080, and the comparisons between them.

In order to use TensorFlow with GPU support you must have an NVIDIA graphics card with a minimum compute capability of 3.0.
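Once TensorFlow with GPU support is installed, you can check what it sees on your machine, including the compute capability reported for each GPU, with the TensorFlow 1.x device_lib helper; a short sketch:

from tensorflow.python.client import device_lib

# Lists CPU and GPU devices; for GPUs the description string includes the card
# name, PCI bus id, and compute capability (e.g., "compute capability: 6.1").
for d in device_lib.list_local_devices():
    print(d.name, d.physical_device_desc)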

A single NVIDIA TITAN X Pascal is much more powerful than a GTX 1080 graphics card if we do not consider cost. But two GTX 1080 cards will outperform a single NVIDIA TITAN X Pascal, and from the financial perspective, two GTX 1080s will also save you some money compared to purchasing a single NVIDIA TITAN X. See my post, choose proper GeForce GPU(s) according to your machine, for some detailed explanations.

References:

  • NVIDIA Titan X – The fastest accelerator for deep neural network training on a desktop PC based on the revolutionary NVIDIA Pascal architecture

Quadro vs GeForce GPUs for training neural networks

If you’re choosing between Quadro and GeForce, definitely pick GeForce. If you’re choosing between Tesla and GeForce, pick GeForce, unless you have a lot of money and could really use the extra RAM.

Quadro GPUs aren’t for scientific computation; Tesla GPUs are. Quadro cards are designed for accelerating CAD, so they won’t help you train neural nets. They can probably be used for that purpose just fine, but it’s a waste of money.

Tesla cards are for scientific computation, but they tend to be pretty expensive. The good news is that many of the features offered by Tesla cards over GeForce cards are not necessary to train neural networks.

For example, Tesla cards usually have ECC memory, which is nice to have but not a requirement. They also have much better support for double precision computations, but single precision is plenty for neural network training, and they perform about the same as GeForce cards for that.

One useful feature of Tesla cards is that they tend to have a lot more RAM than comparable GeForce cards. More RAM is always welcome if you’re planning to train bigger models (or use RAM-intensive computations like FFT-based convolutions).

See here for the list of CUDA GPUs on the NVIDIA website.


Install GPU TensorFlow from Source on Ubuntu Server 16.04 LTS

I installed GPU TensorFlow from source on Ubuntu Server 16.04 LTS with CUDA 8 and a GeForce GTX 1080 GPU, but it should work for Ubuntu Desktop 16.04 LTS.

In this tutorial I will go through the process of building the latest TensorFlow from source for Ubuntu Server 16.04. TensorFlow now supports CUDA 8.0 and cuDNN 5.1, so you can instead use the pip packages from the TensorFlow website for a much easier install.

In order to use TensorFlow with GPU support you must have an NVIDIA graphics card with a minimum compute capability of 3.0.

Getting started, I am going to assume you know some of the basics of using a terminal in Linux. (Check this post for commonly used Linux commands.)

1: Install Required Packages

Open a terminal by pressing Ctrl + Alt + T.

(Because this is Ubuntu Server 16.04, you need to install the required packages below; if you are on Ubuntu Desktop 16.04, most of the libraries below already come with the OS installation.)

Paste each line one at a time (without the $) using Shift + Ctrl + V

$ sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev

2: Update & Install NVIDIA Drivers

Note that if you have a monitor connected to your server, be sure to disconnect it before you start to install the NVIDIA drivers. Otherwise, it may cause trouble when you reboot your server after you install your NVIDIA drivers. You can reconnect your monitor after you successfully install the NVIDIA drivers.

You must also have the 367 (or later) NVIDIA drivers installed; this can easily be done from Ubuntu’s built-in Additional Drivers tool after you update your driver packages. (You can check the latest driver version for your GPU on the NVIDIA downloads page; for example, mine is 375.)

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt-get install nvidia-375  

(Note: use the following command if you encounter this error “sudo: add-apt-repository: command not found”)

$ sudo apt-get install software-properties-common

Once the driver is installed, restart your computer. You can use the command below to reboot the server from the command line.

$ sudo reboot -h now

If you experience any trouble booting Linux or logging in, try disabling fast boot and secure boot in your BIOS and modifying your GRUB boot options to enable nomodeset.

You can use the following command to get various diagnostics of the GTX 1080.

$ sudo nvidia-smi

 

3: Install NVIDIA CUDA Toolkit 8.0 

Skip if not installing with GPU support

(Note: If you have older version of CUDA and cuDNN installed, check the post for uninstallation.  How to uninstall CUDA Toolkit and cuDNN under Linux? (02/16/2017) (pdf))

(If you need to use the command line to transfer files from your client computer to your server, refer to the following scp commands.)

File Transfer: getting files to/from  your Ubuntu server

To copy a file:

scp -p file_name username@yourserver_hostname:destination/directory

For a full directory tree:

scp -pr dir_name username@yourserver_hostname:destination/directory

 

To install the NVIDIA toolkit, download the base installation .run file from the NVIDIA website (download the .run file, NOT the .deb file!).

 

$ cd ~/Downloads # or directory to where you downloaded file
$ sudo sh cuda_8.0.44_linux.run  # hold s to skip

This will install cuda into: /usr/local/cuda-8.0

MAKE SURE YOU SAY NO TO INSTALLING NVIDIA DRIVERS! (Very important: if you answer yes, the GTX 1080’s 375 driver will be overwritten.)

Also make sure you select yes to creating a symbolic link to your cuda directory.

(FYI, the following are the questions you will be asked.)

The following contains specific license terms and conditions
for four separate NVIDIA products. By accepting this
agreement, you agree to comply with all the terms and
conditions applicable to the specific product(s) included
herein.

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 361.62?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/liping ]:

Installing the CUDA Toolkit in /usr/local/cuda-8.0 …
Installing the CUDA Samples in /home/liping …
Copying samples to /home/liping/NVIDIA_CUDA-8.0_Samples now…
Finished copying samples.

 

= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-8.0
Samples:  Installed in /home/liping, but missing recommended libraries

Please make sure that
 –   PATH includes /usr/local/cuda-8.0/bin
 –   LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_7169.log

 

4: Install NVIDIA cuDNN

Once the CUDA Toolkit is installed, download cuDNN v5.1 for Cuda 8.0 from NVIDIA website (Note that you will be asked to register an NVIDIA developer account in order to download) and extract into /usr/local/cuda via:

$ sudo tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Then update your bash file:

$ nano ~/.bashrc

This will open your bash file in a text editor which you will scroll to the bottom and add these lines:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda

Once you save and close the text file you can return to your original terminal and type this command to reload your .bashrc file:

$ source ~/.bashrc

5: Install Bazel

Instructions are also available on the Bazel website.

$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install bazel
$ sudo apt-get upgrade bazel

6: Clone TensorFlow

$ cd ~
$ git clone https://github.com/tensorflow/tensorflow

7: Configure TensorFlow Installation

$ cd ~/tensorflow
$ ./configure

Use defaults by pressing enter for all except:

Please specify the location of python. [Default is /usr/bin/python]:

For Python 2 use the default, or if you wish to build for Python 3 enter:

$ /usr/bin/python3.5

Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]:

For Python 2 use the default, or if you wish to build for Python 3 enter:

$ /usr/local/lib/python3.5/dist-packages

Unless you have a Radeon graphics card, you can say no to OpenCL support. (Has anyone tested this? Ping me if so!)

Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:

$ 8.0

Please specify the Cudnn version you want to use. [Leave empty to use system default]:

$ 5

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: “3.5,5.2”]: 5.2,6.1
……….
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
………
INFO: All external dependencies fetched successfully.
Configuration finished

If all was done correctly you should see:

INFO: All external dependencies fetched successfully.
Configuration finished.

8: Build TensorFlow

Warning: this step is resource intensive. I recommend having at least 8GB of computer memory.

(Note that your current path in the terminal should be ~/tensorflow.)

If you want to build TensorFlow with GPU support enter (Note that the command should be one line):

$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

For CPU only enter:

$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package

9: Build & Install Pip Package

(Note that your current path in the terminal should be ~/tensorflow.)

This will build the pip package required for installing TensorFlow and place it in ~/tensorflow_pkg (you can change this directory to any location you like):

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg

Remember that, at any time, you can manually force the project to be reconfigured (run the ./configure file in step 7 above to reconfigure) and built from scratch by emptying the directory ~/tensorflow_pkg  with:

rm -rf ./*

Now you can cd into the directory where you built your TensorFlow pip package (in my case, ~/tensorflow_pkg),

then issue the following command depending on whether you are using Python 2 or Python 3.

To Install Using Python 3 (remove sudo if using a virtualenv)

$ sudo pip3 install tensorflow-0.12.1-cp35-cp35m-linux_x86_64.whl

# tip: after you type tensorflow, you can hit Tab on your keyboard to autofill the name of the .whl file you just built

For Python 2 (remove sudo if using a virtualenv)

$ sudo pip install tensorflow-0.12.1-cp27-cp27mu-linux_x86_64.whl

# tip: after you type tensorflow, you can hit Tab on your keyboard to autofill the name of the .whl file you just built

Note that if you encounter this error:

The directory ‘/home/youraccountname/.cache/pip/http’ or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo’s -H flag. 

Change the command above to

 sudo -H pip install tensorflow-0.12.1-cp27-cp27mu-linux_x86_64.whl

If you see this warning:

You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the ‘pip install --upgrade pip’ command.

I would suggest just ignoring this; sometimes upgrading pip causes trouble because of dependencies.

10: Test Your Installation

Finally, time to test our installation.

To test the installation, open an interactive Python shell and import the TensorFlow module:

$ cd # this will return you to your home directory ~
$ python  # or python3
… 
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally

With the TensorFlow module imported, the next step to test the installation is to create a TensorFlow Session, which will initialize the available computing devices and provide a means of executing computation graphs:

>>> sess = tf.Session()
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
…

To manually control which devices are visible to TensorFlow, set the CUDA_VISIBLE_DEVICES environment variable when launching Python. For example, to force the use of only GPU 0:

$ CUDA_VISIBLE_DEVICES=0 python
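The same restriction can also be applied from inside a script, as long as the environment variable is set before TensorFlow is imported; a small sketch:

import os

# Must be set before importing tensorflow; "0" keeps only GPU 0 visible,
# and an empty string hides all GPUs (forcing CPU execution).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
sess = tf.Session()  # the device creation log should now mention only /gpu:0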

You should now be able to run a Hello World application:

>>> hello_world = tf.constant("Hello, TensorFlow!") 
>>> print sess.run(hello_world) 
Hello, TensorFlow! 
>>> print sess.run(tf.constant(12)*tf.constant(3)) 
36 

TensorFlow also has instructions on how to do a basic test and a list of common installation problems.

You should now have TensorFlow installed on your computer. This tutorial was tested on a fresh install of Ubuntu Server 16.04 with a GeForce GTX 1080.

 

Referenced posts (See this page for more TensorFlow setup links I collected):