Quest Slurm Quick Start
This page contains information for using the Slurm test cluster on Quest.
To view Slurm training videos, visit Quest Slurm Scheduler Training Materials.
All job submission scripts that currently run on Quest must be modified to run on the new Slurm scheduler. Please log in to the Slurm test cluster to update your job submission scripts in advance of May 1, 2019. To access the Slurm cluster on Quest, log in to slurmtest.northwestern.edu:
ssh <NetID>@slurmtest.northwestern.edu
From the command line, you will be able to submit jobs to the Slurm scheduler. When submitting jobs to the Slurm scheduler, use the allocations and queue names you already use. During the testing period, Slurm jobs will not be debited against your production allocations.
The 200 nodes in the test Slurm cluster each have 20 cores and 128GB of RAM; submission scripts may need to be modified to reflect these resource limits. The maximum walltime accepted on the Slurm test cluster is 48 hours. For memory allocation purposes, each core in the pilot cluster is assigned 6GB of RAM.
Before you submit your Slurm job, modify your existing job submission script to change the Moab directives into Slurm directives. Most submission scripts will be straightforward to modify, but implementing advanced features in Slurm can be expected to take additional time and effort.
Simple Batch Job Submission Script Conversion Example
Slurm uses sbatch both as the keyword for scheduler directives in job submission scripts (#SBATCH) and as the job submission command at the command line. To modify your job scripts to work with Slurm, you'll need to edit all lines that currently begin with #MSUB. In addition to replacing each #MSUB with #SBATCH, at a minimum you'll need to edit the lines specifying the queue, nodes, and walltime. If you have additional scheduler directives, please see the full list of Slurm job directive commands and their analogues in Moab.
Example Moab job submission script
#!/bin/bash
#MSUB -A b1042                ## account
#MSUB -q genomics             ## queue name
#MSUB -l nodes=1:ppn=1        ## number of nodes and cores
#MSUB -l walltime=00:10:00    ## walltime
#MSUB -N sample_job           ## name of job

module load python            ## Load modules
python helloworld.py          ## Run program
Example Slurm job submission script
#!/bin/bash
#SBATCH -A b1042              ## account (unchanged)
#SBATCH -p genomics           ## "-p" instead of "-q"
#SBATCH -N 1                  ## number of nodes
#SBATCH -n 1                  ## number of cores
#SBATCH -t 00:10:00           ## walltime
#SBATCH --job-name="test"     ## name of job

module purge all              ## purge environment modules
module load python            ## Load modules (unchanged)
python helloworld.py          ## Run program (unchanged)
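For scripts with many directives, the one-to-one flag substitutions can be done mechanically. The sketch below is a hypothetical helper, not a Quest-provided tool; it rewrites only the simple cases, and `-l nodes=X:ppn=Y` still has to be split by hand into separate `-N` and `-n` lines.

```shell
#!/bin/bash
# Sketch: a mechanical first pass at Moab-to-Slurm directive conversion.
# Only one-to-one directives are rewritten; "-l nodes=X:ppn=Y" has no
# single Slurm flag and must be split by hand into "-N X" and "-n Y".
convert_directives() {
  sed -e 's/^#MSUB -A /#SBATCH -A /' \
      -e 's/^#MSUB -q /#SBATCH -p /' \
      -e 's/^#MSUB -l walltime=/#SBATCH -t /' \
      -e 's/^#MSUB -N \(.*\)$/#SBATCH --job-name="\1"/'
}

# Example: run the simple directives from the Moab script above through it.
printf '%s\n' '#MSUB -A b1042' '#MSUB -q genomics' '#MSUB -l walltime=00:10:00' \
  | convert_directives
```

Always diff the converted script against the original before submitting; any directive the sed rules do not cover passes through unchanged and will need manual attention.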
Note that when you submit your job, Slurm passes your current environment variables to the compute nodes, including any modules you've loaded on the command line before the job was submitted. By comparison, Moab does not pass environment variables but does source a user's ~/.bashrc file on the compute node before running the job submission script. Because Slurm and Moab use different methods to replicate a user's environment on the compute nodes, scripts that rely on environment variables may behave in unexpected ways or fail.
Queues == Partition
Note that in Slurm, a "queue" is called a "partition". Moving forward, what we have historically called "partitions" on Quest - Quest5, Quest6 and Quest8 - will now be referred to as "architectures".
Submitting a Batch Job
sbatch or qsub replaces msub for job submissions on the command line. To use sbatch to submit a job to the Slurm scheduler:
sbatch job_script.sh
Submitted batch job 546723
or in cases where you only want the job number to be returned, you can use qsub:
qsub job_script.sh
546723
Both sbatch and qsub can be used to submit a batch script to Slurm. Note that not all functionality of qsub is provided under Slurm, making sbatch the preferred command for submitting jobs.
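Scripts that need to capture just the job ID from sbatch can parse its confirmation line (or use sbatch's --parsable option, which prints only the ID). A sketch, with a captured sample line standing in for a live submission:

```shell
#!/bin/bash
# Sketch: extract the numeric job ID from sbatch's confirmation line.
# A captured sample stands in for a live call; in practice you would use:
#   submit_line=$(sbatch job_script.sh)
# or simply:
#   job_id=$(sbatch --parsable job_script.sh)
submit_line="Submitted batch job 546723"
job_id=$(echo "$submit_line" | awk '{print $4}')
echo "$job_id"    # prints 546723
```

Capturing the ID this way lets later commands in the same script reference the job, for example when building simple dependency chains.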
Slurm will reject the job at submission time if there are requests or constraints within the job submission script that Slurm cannot meet. This gives the user the opportunity to examine the rejected job request and resubmit it with the necessary corrections. With Slurm, if a job number is returned at the time of job submission, the job will run although it may experience a wait time in the queue depending on how busy the system is.
Submitting an Interactive Job
srun replaces msub for interactive jobs. To launch an interactive job from the command line, use the srun command:
srun --pty --account=<account> --time=<hh:mm:ss> --partition=<queue_name> --mem=<xG> bash
This will launch a terminal session on the compute node as a single core job. To request additional cores for multi-threaded applications, include the -N and -n flags:
srun --pty -N 1 -n 6 --account=<account> --time=<hh:mm:ss> --partition=<queue_name> --mem=<xG> bash
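As a concrete illustration, the hypothetical wrapper below assembles and echoes the srun command line rather than running it, so the flags can be checked first. The account and partition values reuse b1042 and genomics from the batch example above purely as placeholders.

```shell
#!/bin/bash
# Sketch: assemble an interactive-job command line from a few arguments.
# Account and partition are placeholders borrowed from the batch example.
# The command is echoed rather than executed so the flags can be inspected.
interactive_cmd() {
  local cores="${1:-1}" walltime="${2:-00:10:00}" mem="${3:-4G}"
  echo "srun --pty -N 1 -n $cores --account=b1042 --time=$walltime --partition=genomics --mem=$mem bash"
}

interactive_cmd 6 01:00:00 4G
```

Once the echoed command looks right, it can be pasted at the prompt (or the echo removed) to launch the session.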
For additional information on interactive jobs under Slurm, please see Submitting a Job on Quest.
squeue replaces both showq and qstat. To use squeue to see all jobs:
squeue
To see just your jobs:
squeue -u <NetID>
squeue returns information on jobs in the Slurm queue:
JOBID   PARTITION  NAME      USER     ST  TIME     NODES  NODELIST(REASON)
546723  short      slurm2.s  jon9348  R   INVALID  1      qnode4017
546711  short      high-thr  akh9585  R   2:34     3      qnode[4180-4181,4196]
546712  short      high-thr  akh9585  R   2:34     3      qnode[4078,4086,4196]
|JOBID|Number assigned to the job upon submission|
|PARTITION|The queue, also called partition, that the job is running in|
|NAME|Name of the job, which defaults to the name of the job submission script|
|USER|NetID of the user who submitted the job|
|ST|State of the job: "R" for Running or "PD" for Pending (Idle)|
|TIME|hours:minutes:seconds a job has been running; can be INVALID for the first few minutes|
|NODES|Number of nodes the job resides on|
|NODELIST(REASON)|Names of the nodes the job is running on|
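Because squeue output is plain text, standard tools can summarize it. The sketch below counts running jobs per user; a captured copy of the sample output above stands in for a live `squeue -h` pipe.

```shell
#!/bin/bash
# Sketch: count running ("R") jobs per user from squeue output.
# In practice you would pipe live output instead: squeue -h | awk ...
# Here the sample rows shown above stand in for the live queue.
sample='546723 short slurm2.s jon9348 R INVALID 1 qnode4017
546711 short high-thr akh9585 R 2:34 3 qnode[4180-4181,4196]
546712 short high-thr akh9585 R 2:34 3 qnode[4078,4086,4196]'

# Field 4 is USER and field 5 is ST in the default layout above.
echo "$sample" \
  | awk '$5 == "R" { count[$4]++ } END { for (u in count) print u, count[u] }' \
  | sort
```

The same pattern works for pending jobs by matching "PD" in the state column instead of "R".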
To cancel a single job use scancel:
scancel <job_ID>
To cancel all of your jobs:
scancel -u <NetID>
For additional job commands, please see Common Job Commands.