GPUs on QUEST

Information about GPUs on QUEST

What GPUs are available on QUEST?

There are 18 GPU nodes available to the Quest General Access allocations. These nodes run CUDA version 10.1 and Driver Version 418.39:

  • 8 12GB Tesla K40 (two GPUs on each node) (CPU info: 24 cores per node, 128 GB RAM)
  • 40 12GB Tesla K80 (four GPUs on each node) (CPU info: 28 cores per node, 128 GB RAM)
  • 2 16GB Tesla P100 (two GPUs on each node) (CPU info: 28 cores per node, 128 GB RAM)
  • 2 16GB Tesla V100 (two GPUs on each node) (CPU info: 28 cores per node, 192 GB RAM)

There are 2 GPU nodes in the Genomics Compute Cluster (b1042). These nodes run CUDA version 11.2 and Driver Version 460.32.03:
  • 2 40GB Tesla A100 (four GPUs on each node) (CPU info: 52 cores per node, 192 GB RAM)
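
If you want to check the GPU types and counts that Slurm reports for a partition yourself, you can query it with sinfo. The command below is a sketch using the General Access GPU partition (gengpu, described in the next section); the exact output format may differ:

sinfo -p gengpu -o "%N %G"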

Using General Access GPUs

The maximum run time for a job on these nodes is 48 hours. To submit jobs to the general access GPU nodes, set gengpu as the partition and specify the number of GPUs in your job submission command or script. You can also request a specific type of GPU. For instance, to request one P100 GPU, add the following lines to your job submission script:

#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:p100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

Note that the memory you request here is CPU memory. You are automatically given access to the entire memory of the GPU, but you will also need CPU memory because you will be copying data between CPU memory and GPU memory.
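
These #SBATCH directives go at the top of an ordinary batch script, followed by the commands you want to run; submit the script with sbatch. Below is a minimal sketch - the script name gpu_job.sh is just an illustration, and the nvidia-smi line is an optional check that prints information about the GPU assigned to your job:

#!/bin/bash
#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:p100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

module purge all
nvidia-smi
# load your software and run your GPU application here

Submit the job from a login node with:

sbatch gpu_job.sh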

To schedule another type of GPU, e.g. V100, you should change the p100 designation to the other GPU type, e.g. v100.
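For example, to request one V100:

#SBATCH --gres=gpu:v100:1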

Using Genomics Compute Cluster GPUs

The maximum run time for a job on these nodes is 48 hours. Feinberg members of the Genomics Compute Cluster should use the partition genomics-gpu, while non-Feinberg members should use genomicsguest-gpu. To submit a job to these GPUs, include the appropriate partition name and specify the type and number of GPUs:

#SBATCH -A b1042
#SBATCH -p genomics-gpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

As with the General Access example above, the memory you request here is CPU memory; you are automatically given access to the entire memory of the GPU, but you still need CPU memory for the data you copy to and from the GPU.

Interactive GPU jobs

If you want to start an interactive session on a GPU instead of submitting a batch job, you can use a command similar to one of the examples below; both request a single P100:

srun -A pXXXXX -p gengpu --mem=XX --gres=gpu:p100:1 -N 1 -n 1 -t 1:00:00 --pty bash -l

salloc -A pXXXXX -p gengpu --mem=XX --gres=gpu:p100:1 -N 1 -n 1 -t 1:00:00
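
Once you have a shell on the GPU node (for example via the srun command above), you can confirm that a GPU has been assigned to your session by running:

nvidia-smi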

What GPU software is available on QUEST?

CUDA

To see which versions of CUDA are available on Quest, run the command:

module spider cuda

We recommend loading the CUDA module that matches the CUDA version listed for your GPU's nodes at the top of this page.
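
After choosing a version from the module spider output, load it by name. In the sketch below, <version> is a placeholder for one of the versions that module spider actually lists:

module load cuda/<version>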

Anaconda

We strongly encourage using anaconda to create virtual environments for software that utilizes GPUs, especially when using Python. Please see Using Python on QUEST for more information on anaconda virtual environments. The instructions below create a local anaconda virtual environment and install the TensorFlow-gpu package into it; a sketch of how to use the resulting environment in a GPU job follows step 4. Please run the commands that come after the $ (do not type the $ itself).

1. Load anaconda on QUEST

$ module load python-anaconda3

2. Create a virtual environment and install TensorFlow and Python into it. We are going to name our environment tensorflow-py37. On QUEST, by default, all anaconda environments go into a folder in your HOME directory called ~/.conda/envs/. Therefore, once these steps are completed, all of the necessary packages will live in ~/.conda/envs/tensorflow-py37.

$ conda create --name tensorflow-py37 -c anaconda tensorflow-gpu python=3.7 --yes

3. Install keras into the virtual environment

$ conda install --name tensorflow-py37 -c conda-forge keras --yes

4. Activate the virtual environment

$ source activate tensorflow-py37

On newer versions of anaconda, you can use conda activate tensorflow-py37 instead.
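
To use this environment in a batch or interactive GPU job, load anaconda and activate the environment before running your code. The commands below are a sketch; the final line is just an illustrative check (assuming a TensorFlow 2.x build) that TensorFlow can see the GPU:

module purge all
module load python-anaconda3
source activate tensorflow-py37
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"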

Singularity Container

Note that these containers only work for P100 GPUs or newer and are not suitable for K40 or K80 GPUs.

NVIDIA provides a host of GPU containers suitable for different applications. Docker images cannot be used directly on Quest due to security risks, but they can be pulled to generate Singularity containers. Below we provide an example of using Singularity to pull the NVIDIA TensorFlow Docker image. For most NVIDIA containers, there are many different versions, each shipping with specific versions of the relevant libraries and packages. See NVIDIA's TensorFlow documentation for further information about the version of TensorFlow that ships with each version of the TensorFlow Docker container.

module purge all
module load singularity
singularity pull docker://nvcr.io/nvidia/tensorflow:21.02-tf2-py3

We can then use the command below to call this NVIDIA GPU container to run a TensorFlow training script (training.py here stands for your own script):

singularity exec --nv -B /projects:/projects tensorflow_21.02-tf2-py3.sif python training.py

A key difference between calling a non-GPU container and a GPU container is passing the --nv argument to the exec command. A reminder that -B /projects:/projects mounts the projects folder into the Singularity container; by default, /projects is not mounted or discoverable by the container. Please see using containers on Quest for more information on containers in general.
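
To run the container in a batch job, combine the GPU submission directives from earlier on this page with the Singularity commands above. A minimal sketch (using the same placeholders as before):

#!/bin/bash
#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:p100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

module purge all
module load singularity
singularity exec --nv -B /projects:/projects tensorflow_21.02-tf2-py3.sif python training.py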



