This page contains frequently asked questions about the Quest high performance computing cluster at Northwestern.
Allocations And Accounts
How do I get an account on Quest?
To access Quest, you will need to have an active research allocation. There are two ways to obtain an allocation:
i- Submit a new allocation request form for either a Research I or Research II allocation.
ii- Join an existing research allocation.
The application forms can be found here: Request Research Allocation Forms
A Research I allocation is suitable for projects requiring 35,000 compute hours or less and provides a 500 GB project directory. The resources provided by Research I will fit the needs of the majority of our users. A Research II allocation is suitable for projects requiring up to 500,000 compute hours and provides 2 TB of project storage. Research II allocations are for projects with a large computational need, and they require a detailed proposal. Both Research I and II allocations are available free of charge, but we do request that a chartstring be provided for your research if possible. This chartstring is used for internal tracking purposes so that we can see the impact Quest is having on research done on campus.
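As a rough illustration of the thresholds above, the sketch below maps an estimated number of compute hours to an allocation type. The function name suggest_allocation is hypothetical and is not a Quest command; only the 35,000 and 500,000 hour limits come from this page.

```shell
#!/bin/bash
# Hypothetical helper (not a Quest tool): suggest an allocation type
# from an estimated number of compute hours, using the limits on this page.
suggest_allocation() {
    local hours="$1"
    # Accept only a non-negative whole number of hours.
    if ! [[ "$hours" =~ ^[0-9]+$ ]]; then
        echo "usage: suggest_allocation <compute-hours>" >&2
        return 2
    fi
    # 10# forces base 10 so that leading zeros are not read as octal.
    if (( 10#$hours <= 35000 )); then
        echo "Research I"
    elif (( 10#$hours <= 500000 )); then
        echo "Research II"
    else
        echo "exceeds Research II"
    fi
}

suggest_allocation 20000
suggest_allocation 120000
```

An estimate above 500,000 hours falls outside both allocation types described here.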
How do I use GPUs on Quest?
General access allocations (i.e. Research I, Research II and Education) have access to 14 compute nodes with GPUs. There are 8 NVIDIA Tesla K40 GPUs (two on each node) and 40 Tesla K80 GPUs (four on each node). The maximum run time for a job on these nodes is 48 hours. To submit jobs to GPU nodes, set gengpu as the partition and state the number of GPUs in your job submission command or script. You can also specify the type of GPU you want. For instance, to request one K40 GPU, add the following lines to your job submission script:
#SBATCH -p gengpu
#SBATCH --gres=gpu:k40:1
If you want to start an interactive session on the same GPU instead of a batch submission, you can use an srun command similar to the one below:
srun -A <allocationID> -p gengpu -N 1 -n 1 -t 1:00:00 --gres=gpu:k40:1 --pty bash -l
Once you have submitted the form and have been approved, you can specify gengpu as the allocation ID and buyin for the queue type in your submission script to run jobs on the GPU nodes. When defining the computing resources needed, you will also need to add the gpus flag, as in nodes=1:ppn=1:gpus=1
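The Slurm directives described above can be generated with a small helper. This is a minimal sketch: the function name gpu_directives is hypothetical, and the per-type limits (two K40s or four K80s) are taken from the per-node GPU counts given on this page, on the assumption that a single-node request cannot exceed them.

```shell
#!/bin/bash
# Hypothetical helper: print the #SBATCH lines for a gengpu request.
# Usage: gpu_directives <k40|k80> [count]
gpu_directives() {
    local gpu_type="$1" count="${2:-1}" max
    # Per-node GPU counts as stated on this page.
    case "$gpu_type" in
        k40) max=2 ;;
        k80) max=4 ;;
        *)
            echo "unknown GPU type: $gpu_type (expected k40 or k80)" >&2
            return 1
            ;;
    esac
    # The count must be a positive integer no larger than the per-node maximum.
    if ! [[ "$count" =~ ^[1-9][0-9]*$ ]] || (( count > max )); then
        echo "GPU count must be between 1 and $max for $gpu_type" >&2
        return 1
    fi
    printf '#SBATCH -p gengpu\n#SBATCH --gres=gpu:%s:%s\n' "$gpu_type" "$count"
}

gpu_directives k80 2
```

The printed lines can be pasted into the header of a submission script.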
How do I join an existing allocation on Quest?
In order to get an account on Quest, you must first fill out an application
form. After this application form is filled out, we will contact the CI of
that allocation, and they will either approve or deny your membership in
that allocation. If you would like to join multiple allocations, you will
need to submit separate applications for each allocation.
The link to join an existing allocation can be found here: Request Research Allocation Forms
After you have filled out this application, we will contact the CI for this allocation, generally within one business day after the receipt of the form.
How do I transfer files to and from Box on Quest?
The best way to transfer data between Quest storage and Box is to connect to Quest with the FastX client and start a Gnome desktop session or a Gnome terminal session. Then launch a terminal (or use the terminal that is already open, if you picked a Gnome terminal session) and type
module load firefox
firefox
to launch the Firefox browser. In that browser you can log into Box and transfer files.
Documentation on how to use FastX to connect to Quest is available here: Connecting to Quest with FastX. You can also use FTP if you would like to automate your transfers. The details are available here: Transferring Files to and from Quest.
How do I use Amazon AWS on Quest?
The Amazon CLI is a Python package and is simple to install on Quest. To install it, type the following commands:
export PATH=$HOME/.local/bin:$PATH
module load python/anaconda3.6
pip install awscli --upgrade --user
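The export line above matters because pip install --user places executables in $HOME/.local/bin. If you keep that line in a startup file, a guard like the one sketched below avoids adding the directory to PATH more than once. The function name prepend_path_once is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prepend a directory to PATH only if it is not
# already present, so repeated sourcing does not grow PATH.
prepend_path_once() {
    local dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;              # already on PATH, nothing to do
        *) PATH="$dir:$PATH" ;;
    esac
}

prepend_path_once "$HOME/.local/bin"
export PATH
```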
How do I use the Google Cloud SDK on Quest?
The Google Cloud SDK is not installed system-wide on Quest, but it is
possible for you to install it into your own home directory. You will need
to follow the instructions on this page: Google Cloud SDK Quickstart Linux.
Namely, use the following command to download the SDK inside your home directory:
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-241.0.0-linux-x86_64.tar.gz
Then unpack the archive:
tar xvfz google-cloud-sdk-241.0.0-linux-x86_64.tar.gz
And then run the install script, which will update your $PATH variable to include the new directory you have just unpacked:
./google-cloud-sdk/install.sh
From there you can configure the Google Cloud SDK with your credentials, etc.
How do I get access to the Globus endpoint for RDSS (RESFILES/FSMRESFILES)?
To be able to use Globus to transfer data to and from your RDSS (also known as RESFILES or FSMRESFILES), open a service request by emailing email@example.com.
How do I transfer files to and from RDSS (RESFILES/FSMRESFILES) via Globus?
If you lose the ability to connect to a previously mounted resfiles or fsmresfiles share via Globus, you can take the following steps to re-establish a connection:
- SSH to qglobus02.it.northwestern.edu using your NetID as your username. This operation will automatically mount your RDSS share. There is no need to keep the SSH connection open, so you may exit at any time.
- Open https://app.globus.org/file-manager in the browser and log in as a Northwestern user.
- The Quest endpoint name is "Northwestern Quest". The RDSS endpoint (for RESFILES and FSMRESFILES shares) is "Northwestern Quest RDSS".
More information regarding Globus transfers is available here: Globus Transfer FAQ.
Scheduler and Job Submissions
I get the error "Unable to allocate resources: Requested time limit is invalid (missing or exceeds some limit)" when trying to submit a job.
This error indicates that you have specified your job to run for longer
than a given queue will allow. To allow this job to run, you will need to
either reduce the amount of walltime for the job to be within the selected
queue's limits, or define a larger queue with a higher walltime.
You can find a list of all queues and their walltime limits at Quest Partitions/Queues.
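To check a requested walltime against a partition limit before submitting, a comparison along the following lines may help. This is only a sketch: the function names are hypothetical, it handles just the H:MM:SS and D-HH:MM:SS forms, and the 48 hour limit used in the example is the gengpu limit quoted elsewhere on this page.

```shell
#!/bin/bash
# Hypothetical helpers: compare a requested walltime with a limit.
# Supported forms (an assumption of this sketch): H:MM:SS and D-HH:MM:SS.
walltime_to_seconds() {
    local re='^(([0-9]+)-)?([0-9]+):([0-9]{1,2}):([0-9]{1,2})$'
    if ! [[ "$1" =~ $re ]]; then
        echo "unsupported walltime format: $1" >&2
        return 2
    fi
    local d="${BASH_REMATCH[2]:-0}" h="${BASH_REMATCH[3]}"
    local m="${BASH_REMATCH[4]}" s="${BASH_REMATCH[5]}"
    # 10# forces base 10 so values such as 08 are not read as octal.
    echo $(( 10#$d * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Succeeds (exit status 0) when the request does not exceed the limit.
walltime_within_limit() {
    local req lim
    req=$(walltime_to_seconds "$1") || return 2
    lim=$(walltime_to_seconds "$2") || return 2
    (( req <= lim ))
}

if walltime_within_limit "72:00:00" "48:00:00"; then
    echo "request fits the partition limit"
else
    echo "request exceeds the partition limit"
fi
```

A request that fails this check would need a shorter walltime or a partition with a higher limit.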
My job was killed on a login node.
From time to time, we encounter errors on the login nodes that require
killing all running jobs on that node to prevent the node from crashing.
Unfortunately, your job may have been one of those that were killed.
It is recommended that users submit interactive or batch jobs to the compute nodes to avoid such job cancellations. Login nodes are shared resources intended as entry points to Quest for all users. Running CPU- or memory-heavy jobs on them will affect everyone trying to access Quest.
I get the error "sbatch: error: Batch job submission failed: Invalid qos specification" when trying to submit a job.
This error is commonly observed if your allocation has expired. Slurm does not
allow job submission if you are using an expired allocation. You can run the
command checkproject <allocation ID> to see
the expiration date of your allocation. If your research project continues and
you want to continue using the same allocation, you will need to renew it by
following the link below.
Request Research Allocation Forms
The software I want to use is not available on Quest.
If the software you require for your research is not available on Quest,
there are a few options you can try. The first is to perform a local
software installation to your home directory or project directory following
instructions specific to the software you are trying to install.
If a local install does not work, you can fill out the Software Installation Request Form. We will then assist you with the software installation.
How can I install and use Tensorflow with GPUs on Quest?
While most Quest nodes do not have a GPU, there is a partition of Quest called gengpu that does have GPU nodes available for general access. We recommend you create your own conda environment for use on these nodes, with the specific Python machine learning tools you want to use installed.
The GPU nodes in this partition currently have NVIDIA drivers installed that are compatible with CUDA versions up to 9.2.
Please follow these steps to create a conda environment with tensorflow-gpu installed:
- Submit an interactive job to the gengpu partition:
srun -A <allocationID> -p gengpu -t 01:00:00 -N 1 --ntasks-per-node=1 --mem-per-cpu=3G --gres=gpu:1 --pty bash -l
- Once the session starts on the compute node, verify that the gpu is recognized using the command below:
- Load the following modules:
module load anaconda3/2018.12
module load cuda/cuda-9.2
- Create the conda environment and install the tensorflow-gpu package at the same time:
conda create -n tensorflow-gpu-env tensorflow-gpu=1.12.0
- Activate the environment before testing your tensorflow:
source activate tensorflow-gpu-env
- Once you have verified that everything is working and you have installed any other packages you need with conda, deactivate the environment:
- Exit the interactive session
#!/bin/bash
#SBATCH -p gengpu
#SBATCH -A <allocationID>
#SBATCH --gres=gpu:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 01:00:00 # 1 hour in this example. gengpu has a max walltime of 48 hours.
#SBATCH --job-name="test"
module purge all
module load cuda/cuda-9.2
module load anaconda3/2018.12
source activate tensorflow-gpu-env
python <yourpythonscript>
Note: to reserve a specific GPU architecture (e.g. K40 or K80), use a --gres directive like this: --gres=gpu:k40:1 or --gres=gpu:k80:1.