Quest FAQ

This page contains frequently asked questions about the Quest high performance computing cluster at Northwestern.

Allocations And Accounts

How do I get an account on Quest?

To join Quest, you will need to submit an allocation request form for either a Research I or a Research II allocation. The forms are linked below.

Request Research Allocation Forms

A Research I allocation provides 35,000 computational hours and 500 GB of storage, which fits the needs of the majority of our users. A Research II allocation provides 500,000 computational hours and 2 TB of storage. Research II allocations are for projects with very large computational needs and require a lengthier allocation request. Both allocation types are free of charge, but we ask that you provide a chartstring for your research if possible. The chartstring is used only for internal tracking, so that we can see the impact Quest has on research done on campus. Alternatively, you can join an existing allocation by filling out the Join Existing Quest Allocation form.



How do I use GPUs on Quest?

Four Quest5 nodes, each equipped with two NVIDIA K40 cards, are available for general access use. To use these general access GPU nodes, you will need to apply to join the allocation "gengpu" via the form linked below.

Join GPU Allocation Form

Once your request has been approved, specify gengpu as the allocation ID and buyin as the queue type in your submission script to run jobs on the GPU nodes. When defining the computing resources you need, also add the gpus flag, as in nodes=1:ppn=1:gpus=1.
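As a rough sketch, a submission script using these settings might look like the one generated below. Only the allocation ID (gengpu), the queue type (buyin), and the resource string come from the text above. The Moab-style #MSUB directive prefix, the walltime, the job name, and the program name my_gpu_program are assumptions made for illustration.

```shell
# Write a hypothetical GPU submission script to gpu_job.sh.
# Assumptions: the #MSUB directive prefix, the walltime, the job name,
# and the program ./my_gpu_program are illustrative only.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
# Allocation ID and queue type, as described in the FAQ.
#MSUB -A gengpu
#MSUB -q buyin
# One node, one core, one GPU.
#MSUB -l nodes=1:ppn=1:gpus=1
# Hypothetical one-hour walltime and job name.
#MSUB -l walltime=01:00:00
#MSUB -N gpu_test

./my_gpu_program
EOF

echo "Wrote gpu_job.sh"
```

The generated file would then be submitted with the submission command of the scheduler in use.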



How do I join an existing allocation on Quest?

To join an existing allocation on Quest, you must first fill out an application form. We will then contact the CI of that allocation, who will either approve or deny your membership. If you would like to join multiple allocations, submit a separate application for each one.

The link to join an existing allocation can be found here: Join Existing Allocation Form

We generally contact the CI of the allocation within one business day of receiving the form.

Data Transfer

How do I transfer files to and from Box on Quest?

The best way to transfer data between Quest storage and Box is to connect to Quest with the FastX client and start either a Gnome desktop session or a Gnome terminal session. Then open a terminal (or use the one that is already open, if you picked a Gnome terminal session) and run the following commands to launch the Firefox browser:

module load firefox
firefox

In that browser, you can log in to Box and transfer files.

Documentation on how to use FastX to connect to Quest is available here: Connecting to Quest with FastX.



How do I use Amazon AWS on Quest?

The AWS CLI is a Python package and is simple to install on Quest. To install it, run the following commands:

export PATH=$HOME/.local/bin:$PATH
module load python/anaconda3.6
pip install awscli --upgrade --user
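After the install, it can help to confirm that the aws command is actually found on your PATH. The sketch below assumes a bash shell. The helper name add_local_bin is hypothetical. It adds $HOME/.local/bin to PATH only once, then reports whether aws is available.

```shell
# Prepend ~/.local/bin to PATH only if it is not already present,
# so repeated logins do not keep growing PATH.
add_local_bin() {
  case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                 # already on PATH
    *) PATH="$HOME/.local/bin:$PATH" ;;
  esac
  export PATH
}

add_local_bin

# Report whether the AWS CLI can be found.
if command -v aws >/dev/null 2>&1; then
  aws --version
else
  echo "aws not found on PATH; run the pip install step first"
fi
```

Once the command is found, running aws configure prompts for your access key, secret key, and default region.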



How do I use the Google Cloud SDK on Quest?

The Google Cloud SDK is not installed system-wide on Quest, but it is possible for you to install it into your own home directory. You will need to follow the instructions on this page: Google Cloud SDK Quickstart Linux.

Namely, use the following command to download the SDK inside your home directory:

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-241.0.0-linux-x86_64.tar.gz
Then unpack the archive:
tar xvfz google-cloud-sdk-241.0.0-linux-x86_64.tar.gz
And then run the install script, which will update your $PATH variable to include the new directory you have just unpacked:
./google-cloud-sdk/install.sh
From there you can configure the Google Cloud SDK with your credentials, etc.
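The three steps above can be wrapped in a small script so that the version number appears in only one place. This is a sketch, assuming a bash shell. The function names sdk_archive, sdk_url, and install_sdk are hypothetical, and the version is the one used in the commands above.

```shell
# Version taken from the commands above; change it to install another release.
SDK_VERSION="241.0.0"

# Build the archive file name for a given SDK version.
sdk_archive() {
  printf 'google-cloud-sdk-%s-linux-x86_64.tar.gz\n' "$1"
}

# Build the download URL for a given SDK version.
sdk_url() {
  printf 'https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/%s\n' "$(sdk_archive "$1")"
}

# Download, unpack, and install into the home directory.
install_sdk() {
  cd "$HOME" || return 1
  curl -O "$(sdk_url "$1")" || return 1
  tar xzf "$(sdk_archive "$1")" || return 1
  ./google-cloud-sdk/install.sh
}

# Show what would be downloaded; call install_sdk "$SDK_VERSION" to run it.
echo "Would download: $(sdk_url "$SDK_VERSION")"
```

After the install script finishes and you open a new shell, gcloud init walks through the credential setup.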



How do I get access to the Globus endpoint for RDSS (RESFILES/FSMRESFILES)?

To use Globus to transfer data to and from your RDSS share (also known as RESFILES or FSMRESFILES), open a service request by emailing quest-help@northwestern.edu.



How do I transfer files to and from RDSS (RESFILES/FSMRESFILES) via Globus?

If you lose the ability to connect to a previously mounted resfiles or fsmresfiles share via Globus, you can take the following steps to re-establish a connection:

  1. SSH to qglobus02.it.northwestern.edu using your NetID as your username. This operation will automatically mount your RDSS share. There is no need to keep the SSH connection open, so you may exit at any time.
  2. Open https://app.globus.org/file-manager in the browser and log in as a Northwestern user.
  3. The Quest endpoint name is "Northwestern Quest". The RDSS endpoint (for RESFILES and FSMRESFILES shares) is "Northwestern Quest RDSS".
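For step 1, the SSH command can be assembled as in the sketch below. The NetID abc1234 is a placeholder and the helper name rdss_login_target is hypothetical. The host name is the one given in the steps above. The command is printed rather than run, so you can check it first.

```shell
NETID="abc1234"                               # placeholder; use your own NetID
GLOBUS_HOST="qglobus02.it.northwestern.edu"   # host from step 1

# Build the user@host string for the SSH login.
rdss_login_target() {
  printf '%s@%s\n' "$1" "$GLOBUS_HOST"
}

# Print the command so it can be reviewed before running it by hand.
echo "ssh $(rdss_login_target "$NETID")"
```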

More information regarding Globus transfers is available here: Globus Transfer FAQ.

Scheduler and Job Submissions

I get the error "walltime too high for selected queue" when trying to submit a job.

This error indicates that you have requested more walltime than the selected queue allows. To allow the job to run, either reduce the requested walltime so that it fits within the selected queue's limit, or select a queue with a higher walltime limit.

You can find a list of all queues and their walltime limits at https://kb.northwestern.edu/70717.
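Before submitting, you can check a requested walltime against a queue's limit yourself. The sketch below assumes bash, a walltime in HH:MM:SS form, and a limit in hours that you look up on the page linked above. The function names are hypothetical. The 48-hour value in the example is the gengpu maximum mentioned elsewhere in this FAQ.

```shell
# Convert HH:MM:SS to seconds.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
walltime_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Succeed if the requested walltime ($1) fits within a limit in hours ($2).
fits_queue() {
  local requested limit_hours
  requested=$(walltime_seconds "$1")
  limit_hours=$2
  [ "$requested" -le $(( limit_hours * 3600 )) ]
}

# Example: a 72-hour request checked against a 48-hour limit.
if fits_queue "72:00:00" 48; then
  echo "walltime fits"
else
  echo "walltime too high for a 48-hour limit"
fi
```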



My job was killed on a login node.

From time to time, we encounter errors on the login nodes that require killing all running jobs on that node to prevent it from crashing. Unfortunately, your job may have been one of those that were killed.

We recommend that you submit interactive or batch jobs to the compute nodes to avoid such cancellations. Login nodes are shared resources intended as entry points to Quest for all users. Running CPU-heavy or memory-heavy jobs on them affects everyone trying to access Quest.



My jobs are being placed on systemhold.

Quest has a built-in safety feature that does not allow jobs to run when an allocation falls below a certain number of hours. This prevents an allocation from running out of hours and jobs being terminated mid-run, and it is one of the most common reasons for jobs being placed on systemhold. You can run the command checkproject <allocation ID> to see how many hours remain in your allocation. You may request additional computational hours by following the link below.

Extend Allocation Request Form

Once we receive the request, you can expect to receive the additional hours within one business day. You can then continue with your job submission.

Software

The software I want to use is not available on Quest.

If the software you require for your research is not available on Quest, there are a few options you can try. The first is to perform a local software installation to your home directory or project directory following instructions specific to the software you are trying to install.

If a local install does not work, you can fill out the Software Installation Request Form. We will then assist you with the software installation.


How can I install and use TensorFlow with GPUs on Quest?

While most Quest nodes do not have a GPU, there is a partition of Quest called gengpu that does have GPU nodes available for general access. We recommend you create your own conda environment for use on these nodes, with the specific Python machine learning tools you want to use installed.

The GPU nodes in this partition currently have NVIDIA drivers installed that are compatible with CUDA versions up to 9.2.

Please follow these steps to create a conda environment with tensorflow-gpu installed:

  1. Submit an interactive job to the gengpu partition:
    srun -A <allocationID> -p gengpu -t 01:00:00 -N 1 --ntasks-per-node=1 --mem-per-cpu=3G --gres=gpu:1 --pty bash -l
  2. Once the session starts on the compute node, verify that the GPU is recognized using the command below:
    nvidia-smi
  3. Load the following modules:
    module load anaconda3/2018.12
    module load cuda/cuda-9.2
  4. Create the conda environment and install the tensorflow-gpu package at the same time:
    conda create -n tensorflow-gpu-env tensorflow-gpu=1.12.0
  5. Activate the environment before testing your TensorFlow installation:
    source activate tensorflow-gpu-env
  6. Once you have verified that everything is working and you have installed any other packages you need with conda, deactivate the environment:
    source deactivate
  7. Exit the interactive session
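For step 5, one possible way to test the installation from the activated environment is to ask TensorFlow whether it can see a GPU. This is a sketch. The function name check_tf_gpu is hypothetical, and it relies on tf.test.is_gpu_available(), which is available in TensorFlow 1.x releases such as the 1.12.0 version installed above.

```shell
# Ask TensorFlow whether a GPU is visible.
# If Python or TensorFlow cannot be loaded, print a hint instead.
check_tf_gpu() {
  python -c 'import tensorflow as tf; print("GPU available: %s" % tf.test.is_gpu_available())' 2>/dev/null \
    || echo "TensorFlow could not be loaded; is tensorflow-gpu-env active?"
}

check_tf_gpu
```

A result of True suggests that the CUDA module and the tensorflow-gpu package are working together. A result of False on a GPU node usually means the cuda module was not loaded before the test.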
After verifying that the installation works, you can submit jobs to the GPU nodes using a job submission script similar to the one below:
gpu_batch_script.sh
#!/bin/bash
#SBATCH -p gengpu
#SBATCH -A <allocationID>
#SBATCH --gres=gpu:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 01:00:00 # 1 hour in this example. gengpu has a max walltime of 48 hours.
#SBATCH --job-name="test"

module purge all
module load cuda/cuda-9.2
module load anaconda3/2018.12

source activate tensorflow-gpu-env

python <yourpythonscript>
Note: to reserve a specific GPU architecture (e.g. K40 or K80), use a --gres directive like this: --gres=gpu:k40:1 or --gres=gpu:k80:1.
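The --gres value can also be passed on the sbatch command line, where it takes precedence over the #SBATCH directive in the script. The sketch below builds the value for either form. The function name gres_spec is hypothetical. The submission command is only printed, so you can review it before running it on Quest.

```shell
# Build a --gres value: "gpu:<count>" or "gpu:<type>:<count>".
gres_spec() {
  local count=$1 type=${2:-}
  if [ -n "$type" ]; then
    printf 'gpu:%s:%s\n' "$type" "$count"
  else
    printf 'gpu:%s\n' "$count"
  fi
}

# Print the submission command for one K80 GPU.
echo "sbatch --gres=$(gres_spec 1 k80) gpu_batch_script.sh"
```

After submitting, squeue -u $USER lists your pending and running jobs.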

Keywords: quest, scheduler, hpc, globus, rdss, resfiles, transfer, fsmresfiles
Doc ID: 90865
Owner: Research Computing
Group: Northwestern
Created: 2019-04-05 10:05 CDT
Updated: 2019-06-06 13:33 CDT
Sites: Northwestern
CleanURL: https://kb.northwestern.edu/quest-faq