Quest Slurm Quick Start

This page contains information for getting started with the Slurm test cluster on Quest.

To view a short video demonstration, visit Quest: Modifying Moab scripts for Slurm.

To view Slurm training videos, visit Quest Slurm Scheduler Training Materials.

Quick Start

All job submission scripts that currently run on Quest must be modified to run on the new Slurm scheduler. When submitting jobs to the Slurm scheduler, use the allocations and queue names you already use. 

Before you submit your Slurm job, modify your existing job submission script to change the Moab directives into Slurm directives. Most submission scripts are straightforward to modify, but implementing advanced features in Slurm may take additional time and effort.

Simple Batch Job Submission Script Conversion Example

Slurm uses sbatch both for scheduler directives inside job submission scripts (as #SBATCH lines) and as the job submission command at the command line. To modify your job scripts to work with Slurm, you'll need to edit all lines that currently begin with #MSUB. In addition to replacing each #MSUB with #SBATCH, at a minimum you'll need to edit the lines specifying the queue, nodes, and walltime. If you have additional scheduler directives, please see the full list of Slurm job directive commands and their analogues in Moab.

Example Moab job submission script
#!/bin/bash
#MSUB -A b1042               ## account
#MSUB -q genomics            ## queue name 
#MSUB -l nodes=1:ppn=1       ## number of nodes and cores
#MSUB -l walltime=00:10:00   ## walltime
#MSUB -N sample_job          ## name of job

module load python           ## Load modules

python helloworld.py         ## Run program
                        
Example Slurm job submission script
#!/bin/bash
#SBATCH -A b1042             ## account (unchanged)
#SBATCH -p genomics          ## "-p" instead of "-q"
#SBATCH -N 1                 ## number of nodes
#SBATCH -n 1                 ## number of cores
#SBATCH -t 00:10:00          ## walltime
#SBATCH --job-name=sample_job ## name of job ("--job-name" instead of "-N")

module purge all             ## purge environment modules
module load python           ## Load modules (unchanged)

python helloworld.py         ## Run program (unchanged)
                        

Environment

Note that when you submit your job, Slurm passes your current environment variables to the compute nodes, including any modules you've loaded on the command line before the job is submitted. By comparison, Moab does not pass environment variables but does source your ~/.bashrc file on the compute node before running the job submission script. Because Slurm and Moab use different methods to replicate a user's environment on the compute nodes, scripts that rely on environment variables may behave in unexpected ways or fail.
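
Because of this difference, it is safest to make the job script set up everything it needs rather than relying on whatever happened to be loaded in the shell you submitted from. A minimal sketch, reusing the account, partition, and commands from the example script above:

#!/bin/bash
#SBATCH -A b1042             ## account
#SBATCH -p genomics          ## partition (queue)
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 00:10:00

module purge all             ## clear any modules inherited from the submission shell
module load python           ## load every module the job needs explicitly

python helloworld.py

sbatch also provides an --export option that controls which environment variables are propagated to the job; see the Slurm documentation for details.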

Queues == Partitions

Note that in Slurm, a "queue" is called a "partition". Moving forward, what we have historically called "partitions" on Quest - Quest5, Quest6 and Quest8 - will now be referred to as "architectures".
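
To see which partitions (queues) exist and the state of their nodes, you can use sinfo, a standard Slurm command. The partition name below is the one used in the example script above:

sinfo                        ## list all partitions and their node states
sinfo -p genomics            ## show only the genomics partition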

Submitting a Batch Job

sbatch or qsub replaces msub for job submission on the command line. To use sbatch to submit a job to the Slurm scheduler:

sbatch job_script.sh
Submitted batch job 546723

or in cases where you only want the job number to be returned, you can use qsub:

qsub job_script.sh
546723

Both sbatch and qsub can be used to submit a batch script to Slurm. Note that not all functionality of qsub is provided under Slurm, making sbatch the preferred command for submitting jobs.  
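
If you prefer sbatch but want only the job number back, sbatch's --parsable option prints just the job ID:

sbatch --parsable job_script.sh
546723

This also makes it straightforward to capture the job ID in a shell script, for example jobid=$(sbatch --parsable job_script.sh).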

Slurm will reject a job at submission time if the job submission script contains requests or constraints that Slurm cannot meet. This gives you the opportunity to examine the rejected job request and resubmit it with the necessary corrections. With Slurm, if a job number is returned at submission time, the job will run, although it may wait in the queue depending on how busy the system is.
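
Once a job number has been returned, scontrol (a standard Slurm command) will show the resources and constraints Slurm recorded for the job, which can be useful when checking why a job is still waiting:

scontrol show job <job_ID_number>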

If your job submission receives an error, see Debugging your Slurm submission script.

Submitting an Interactive Job

srun replaces msub for interactive jobs. To launch an interactive job from the command line, use the srun command:

srun --pty --account=<account> --time=<hh:mm:ss> --partition=<queue_name> --mem=<xG> bash

This launches a terminal session on a compute node as a single-core job. To request additional cores for multi-threaded applications, include the -N and -n flags:

srun --pty -N 1 -n 6 --account=<account> --time=<hh:mm:ss> --partition=<queue_name> --mem=<xG> bash
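
As a concrete illustration only (the account and partition are taken from the batch example above; the time and memory values are placeholders, not recommendations):

srun --pty -N 1 -n 6 --account=b1042 --time=01:00:00 --partition=genomics --mem=4G bash

Once the prompt appears you are on the compute node and can load modules and run your programs as usual; type exit to end the session and release the resources.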

For additional information on interactive jobs under Slurm, please see Submitting a Job on Quest.

Monitoring Jobs

squeue replaces both showq and qstat. To use squeue to see all jobs:

squeue

To see just your jobs:

squeue -u <NetID>

squeue returns information on jobs in the Slurm queue:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
546723     short slurm2.s  jon9348  R   INVALID      1 qnode4017
546711     short high-thr  akh9585  R       2:34      3 qnode[4180-4181,4196]
546712     short high-thr  akh9585  R       2:34      3 qnode[4078,4086,4196]
Field      Description
JOBID      Number assigned to the job upon submission
PARTITION  The queue, also called partition, that the job is running in
NAME       Name of the job submission script
USER       NetID of the user who submitted the job
ST         State of the job: "R" for Running or "PD" for Pending (Idle)
TIME       hours:minutes:seconds a job has been running; can be INVALID for the first few minutes
NODES      Number of nodes the job resides on
NODELIST   Names of the nodes the job is running on
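
If the default squeue columns truncate information such as the job name, you can request a custom format with squeue's -o option. The format codes are standard (%i job ID, %P partition, %j job name, %T state, %M elapsed time, %R reason or node list); the field widths below are arbitrary examples:

squeue -u <NetID> -o "%.10i %.12P %.30j %.8T %.10M %R"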

Cancelling Jobs

To cancel a single job use scancel:

scancel <job_ID_number>

To cancel all of your jobs:

scancel -u <NetID>
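
scancel also accepts filters that can be combined with -u, for example to cancel only your pending jobs or only jobs with a particular name (both --state and --name are standard scancel options; the job name below is from the example script above):

scancel -u <NetID> --state=PENDING       ## cancel only jobs still waiting in the queue
scancel -u <NetID> --name=sample_job     ## cancel jobs with a particular name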

For additional job commands, please see Common Job Commands.
