GPUs on QUEST
Information about GPUs on QUEST
What GPUs are available on QUEST?
There are 16 GPU nodes that can be used with general access allocations. These nodes have the following GPUs:
- 8 × 12 GB Tesla K40 (two GPUs per node; 24 CPU cores, 128 GB RAM per node)
- 40 × 12 GB Tesla K80 (four GPUs per node; 28 CPU cores, 128 GB RAM per node)
- 2 × 16 GB Tesla P100 (two GPUs per node; 28 CPU cores, 128 GB RAM per node)
- 2 × 16 GB Tesla V100 (two GPUs per node; 28 CPU cores, 192 GB RAM per node)
The maximum run time for a job on these nodes is 48 hours. To submit a job to the general access GPU nodes, set gengpu as the partition and specify the number of GPUs in your job submission command or script. You can also request a specific type of GPU in your job submission. For instance, to request one K40 GPU, add the following lines to your job submission script:
#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:k40:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG
NOTE: The memory you request here is CPU memory. You are automatically given access to the entire memory of the GPU, but you will also need CPU memory because you will be copying data between the CPU and the GPU.
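Putting the pieces together, a complete batch script might look like the sketch below. This is illustrative only: the 16G memory request and my_script.py are placeholders for your own values, and the tensorflow-py37 environment is the one created in the TensorFlow section further down this page.

#!/bin/bash
#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:k40:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=16G

module purge all
module load python-anaconda3
source activate tensorflow-py37

python my_script.py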
To schedule another type of GPU (K80, P100, or V100), change the k40 designation to k80, p100, or v100. If you want to start an interactive session on a K40 GPU instead of making a batch submission, you can use an srun command similar to the one below:
srun -A pXXXXX -p gengpu --mem=XX --gres=gpu:k40:1 -N 1 -n 1 -t 1:00:00 --pty bash -l
Alternatively, you can request the same resources with salloc:
salloc -A pXXXXX -p gengpu --mem=XX --gres=gpu:k40:1 -N 1 -n 1 -t 1:00:00
What GPU software is available on QUEST?
CUDA
If you are simply interested in which versions of CUDA are available on QUEST, you can run this command:
module spider cuda
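To make a particular version available in your session, load it with module load and then verify that the toolkit is on your PATH. The version string below is a placeholder for one of the versions reported by module spider:

$ module load cuda/<version>
$ nvcc --version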
TensorFlow
We strongly encourage people to use anaconda to create virtual environments in order to use software that utilizes GPUs, especially when using Python. Please see Using Python on QUEST for more information on anaconda virtual environments. The instructions below create a local anaconda virtual environment that lets users use the tensorflow-gpu backend with Keras. Please run the commands that come after the $.
1. Load anaconda on QUEST
$ module load python-anaconda3
2. Create a virtual environment and install TensorFlow and Python into it. We are going to name our environment tensorflow-py37. On QUEST, by default, all anaconda environments go into a folder in your HOME directory called ~/.conda/envs/. Therefore, once these steps are completed, all of the necessary packages will live in the folder ~/.conda/envs/tensorflow-py37.
$ conda create --name tensorflow-py37 -c anaconda tensorflow-gpu python=3.7 --yes
3. Install keras into the virtual environment
$ conda install --name tensorflow-py37 -c conda-forge keras --yes
4. Activate the virtual environment
$ source activate tensorflow-py37
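After activating the environment, you can check that TensorFlow detects a GPU from within a job running on a GPU node (on a login node the list will be empty). This is a minimal check, assuming the environment installed a TensorFlow 2.1+ build; on older versions, tf.test.is_gpu_available() serves the same purpose.

$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"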
Docker/Singularity Containers
Please note that these containers only work on P100 GPUs or newer; they are not suitable for the K40s or K80s.
NVIDIA provides a whole host of GPU containers; you can find them all here: https://ngc.nvidia.com/catalog/containers/
Below we demonstrate how you can use Singularity to pull one of these containers and then use it to run TensorFlow. A list of NVIDIA TensorFlow containers can be found in the NGC catalog linked above.
module purge all
module load singularity
singularity pull docker://nvcr.io/nvidia/tensorflow:20.12-tf2-py3
The pull command downloads the container image to a file named tensorflow_20.12-tf2-py3.sif in the current directory. We can then call this NVIDIA GPU container with the command below to run a simple TensorFlow training example; the --nv flag exposes the host GPU driver inside the container, and -B bind-mounts /projects so the container can see it.
singularity exec --nv -B /projects:/projects tensorflow_20.12-tf2-py3.sif python training.py
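Here training.py stands for whatever TensorFlow program you want to run; it is not included in the container. Purely as an illustration, a minimal training.py along the following lines would exercise the GPU:

# training.py - illustrative example only; substitute your own TensorFlow program.
# Trains a tiny Keras model on random data to confirm the GPU setup works.
import numpy as np
import tensorflow as tf

# Random inputs and binary labels, just enough to drive a short training run.
x = np.random.rand(1024, 32).astype("float32")
y = np.random.randint(0, 2, size=(1024, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=2, batch_size=128)

# Print the GPUs TensorFlow detected; an empty list means it ran on CPU only.
print(tf.config.list_physical_devices("GPU"))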