Submitting a Job on Quest
Examples of submitting interactive and batch jobs to the Quest compute nodes.
Jobs can be submitted to the Quest compute nodes in two ways: as interactive jobs or as batch jobs, which are the most common jobs on Quest. Interactive jobs are appropriate for GUI applications like Stata, or for interactively testing and prototyping scripts; they should generally use a small number of cores (< 6) and be of short duration (a few hours). Batch jobs are appropriate for jobs with no GUI, and they can accommodate large core counts and long durations (up to a week).
The program that schedules jobs on Quest is Moab. To submit, monitor, modify, and delete jobs on Quest you must use Moab commands, such as msub.
Torque is the name of the program that manages resources, like cores and memory, on Quest. Moab interacts with Torque.
For a video overview of submitting jobs on Quest, see Running Jobs on Quest.
Batch Jobs
To submit a batch job, you first write a submission script specifying the resources you need and what commands to run, then you submit this script to the scheduler by running an msub command on the command line.
Example Submission Script
A submission script for a batch job could look like the following. These commands would be saved in a file such as jobscript.sh.
#!/bin/bash
#MSUB -A p20XXX
#MSUB -q short
#MSUB -l walltime=04:00:00
#MSUB -M my_email_address
#MSUB -j oe
#MSUB -N projectname_mysoftware
#MSUB -l nodes=1:ppn=6

# add a project directory to your PATH (if needed)
export PATH=$PATH:/projects/p20XXX/tools/

# load modules you need to use
module load python/anaconda
module load java

# Set your working directory
cd $PBS_O_WORKDIR

# A command you actually want to execute:
java -jar <someinput> <someoutput>

# Another command you actually want to execute, if needed:
python myscript.py
The first line of the script loads the bash shell. Lines that begin with #MSUB are interpreted by Moab. Until Moab acquires the resources, no other line in this script is executed. In these lines, # is needed; it is not a comment character when used in #MSUB.
After the Moab commands, the rest of the script works like a regular Bash script. You can modify environment variables, load modules, change directories, and execute program commands. Lines in the second half of the script that start with # are comments.
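To make the split between scheduler directives and ordinary shell commands concrete, here is a minimal sketch that writes a small job script to a temporary file, lists the lines Moab would read, and then runs the file with plain bash. The file contents and the variable names (`script`, `directives`, `output`) are illustrative assumptions, not part of Quest.

```shell
#!/bin/bash
# Write a tiny, hypothetical job script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#MSUB -A p20XXX
#MSUB -q short
#MSUB -l walltime=04:00:00
# an ordinary comment
echo "running"
EOF

# Lines that start with #MSUB are the scheduler directives.
directives=$(grep '^#MSUB' "$script")
echo "$directives"

# bash itself treats every line starting with # as a comment,
# so running the file directly executes only the echo command.
output=$(bash "$script")
echo "$output"

rm -f "$script"
```

Running a job script with plain bash like this can be a quick way to catch shell syntax errors before submitting, although no resources are reserved when you do so.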
In the example above,
export PATH=$PATH:/projects/p20XXX/tools/ puts additional tools stored in a project directory on the user's path so that they can be easily called.
$PBS_O_WORKDIR is a special environment variable that is set to the location from which you submit the script.
cd $PBS_O_WORKDIR sets the working directory to be the directory from which the script was submitted. You can set a different working directory instead if your code is located in a different directory than your submission script.
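Because $PBS_O_WORKDIR is only set inside a running job, a script that uses it directly will fail to change directory when you test it by hand. One possible pattern, shown here as a sketch rather than a Quest requirement, is to fall back to the current directory when the variable is unset. The variable name `workdir` is an assumption made for illustration.

```shell
#!/bin/bash
# Inside a job, PBS_O_WORKDIR holds the submission directory.
# Outside a job it is normally unset, so fall back to the current directory.
workdir="${PBS_O_WORKDIR:-$PWD}"

# Stop early if the directory cannot be entered, rather than
# running the remaining commands in the wrong place.
cd "$workdir" || exit 1
echo "Working directory: $(pwd)"
```

To use a different working directory, you could replace the fallback with an explicit path to your code.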
Find a downloadable copy of this example script on GitHub.
Commands and Options
|#!/bin/bash||REQUIRED: The first line of your script, specifying the type of shell (in this case, bash) to use|
|#MSUB -A <allocation ID>||REQUIRED: Tells the scheduler the allocation name, so that it can access your compute hours|
|#MSUB -l walltime=<hh:mm:ss>||REQUIRED: Provides the scheduler with the time needed for your job to run so resources can be allocated. Quest allows jobs of up to 7 days (168 hours).|
|#MSUB -q <queuename>||REQUIRED: Common values are short, normal, long, or buyin. See Quest Queues for details on the queue to choose for different length jobs.|
|#MSUB -N <name_of_job>||Gives the job a descriptive name, useful for reporting, such as when using the command qstat.|
|#MSUB -m abe||Sends an email if your job (a)borts, (b)egins, or (e)nds. You must include your email address in your .forward file in your /home/NetID directory or use the -M option below. You can use any combination of the 3 letters, or n for (n)one of them.|
|#MSUB -M <your email>||Specifies email address, can be a comma separated list of users.|
|#MSUB -l nodes=<N>:ppn=<p> or #MSUB -l procs=<n>||The first option specifies how many nodes and how many processors (cores) per node. The second option specifies how many processors total without restricting them to being on a specific number of nodes. Use only one of these lines, NOT both. If neither of these options is used, one core on one node will be allocated. If your code is not parallelized, one core on one node may be appropriate for your job.|
|#MSUB -l mem=<n>gb||Specifies the amount of memory needed by a single node job, where <n> is the number of GB of RAM you're requesting. The limit is 120GB on normal Quest compute nodes. Note that this option is not guaranteed to reserve the memory resources, but it is good practice to include.|
|#MSUB -j oe||Joins the (o)utput and (e)rror files into a single file, such that errors are also sent to the output file. By default the name of the output file will be of the form JOBNAME.oJOBID, and the file will be in the directory from which you submitted the job.|
|#MSUB -o <outlog>||Writes the output log for the job (whatever would go to stdout) into a file named outlog (you can change this file name). If not specified, stdout is written to a file in the directory you submitted the job from that is named according to JOBNAME.oJOBID.|
|#MSUB -e <errlog>||Writes the error file for the job (whatever would go to stderr) into a file named errlog (you can change this file name). The error file is very important for diagnosing jobs that fail to run properly. If not specified, stderr will be written to a file in the directory you submitted the job from that is named according to JOBNAME.eJOBID.|
|cd $PBS_O_WORKDIR||This command changes the working directory to the location from which the job is submitted (see below). $PBS_O_WORKDIR is a convenience value; you should cd to whatever directory is appropriate for your job.|
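The walltime value must be written as hh:mm:ss and may not exceed 168 hours. As a rough sketch, the hypothetical helper below (`to_walltime` is not a Moab or Quest command) converts a whole number of hours into that format and rejects requests outside the limit stated in the table.

```shell
#!/bin/bash
# Hypothetical helper: turn whole hours into an hh:mm:ss walltime string.
to_walltime() {
    # Accept only non-negative whole numbers.
    if ! [[ $1 =~ ^[0-9]+$ ]]; then
        echo "walltime must be a whole number of hours" >&2
        return 1
    fi
    # Force base 10 so inputs such as "08" are not read as octal.
    local hours=$((10#$1))
    if (( hours < 1 || hours > 168 )); then
        echo "walltime must be between 1 and 168 hours" >&2
        return 1
    fi
    printf '%02d:00:00\n' "$hours"
}

to_walltime 4
to_walltime 168
to_walltime 200 || echo "rejected"
```

The result could be pasted into a `#MSUB -l walltime=` line or passed to msub with `-l walltime=...` on the command line.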
Submitting Your Batch Job
After you've written and saved your submission script, you can submit your job. At the command line, type
msub <name_of_script>
where, in the example above, <name_of_script> would be jobscript.sh. Upon submission, the scheduler will return your job number.
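If you want to keep the job number for later use, one option is to capture what msub prints. The following is a sketch under assumptions: `submit_job` is a hypothetical wrapper, and the exact layout of msub's output (for example, surrounding blank lines) may differ on your system, so the whitespace stripping is a precaution rather than documented behavior.

```shell
#!/bin/bash
# Hypothetical wrapper: submit a script and print the job number msub returns.
submit_job() {
    local script=$1 out
    # msub only exists on systems running Moab, such as the Quest login nodes.
    if ! command -v msub >/dev/null 2>&1; then
        echo "msub not found; run this on a Quest login node" >&2
        return 127
    fi
    out=$(msub "$script") || return 1
    # Remove any whitespace around the job number.
    printf '%s' "$out" | tr -d '[:space:]'
    echo
}

if jobid=$(submit_job jobscript.sh); then
    echo "Submitted job $jobid"
else
    echo "Submission failed"
fi
```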
If you receive a “permission denied” error, check to make sure your script has the correct permission to execute by typing
ls -l <name_of_script>
The fourth character in the permissions string that is output by the above command indicates if you can execute your file. If it is not an “x”, type
chmod u+x <name_of_script>
to enable execution and resubmit.
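The check and the fix can be combined in a few lines. This sketch uses a temporary file as a stand-in for <name_of_script>; the variable names (`script`, `before`, `after`) are illustrative.

```shell
#!/bin/bash
# A temporary file stands in for <name_of_script>.
script=$(mktemp)
chmod u-x "$script"   # make sure the demo starts without execute permission

# The first ten characters of "ls -l" output are the permissions string.
before=$(ls -l "$script" | cut -c1-10)
echo "before: $before"

# The fourth character is the owner's execute bit.
if [ "$(printf '%s' "$before" | cut -c4)" != "x" ]; then
    chmod u+x "$script"
fi

after=$(ls -l "$script" | cut -c1-10)
echo "after:  $after"
rm -f "$script"
```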
Interactive Jobs
Once logged into Quest, to launch an interactive job on the compute nodes, you can enter any job parameters on the command line when you call msub instead of writing a job submission script. Common parameters for msub are below; the full set of available parameters is the same as for batch jobs (see above).
Common Interactive Job Options
|-I||REQUIRED: The option that makes the job interactive.|
|-X||Enable X window forwarding for GUI applications.|
|-A <allocation ID>||REQUIRED: Tells the scheduler the allocation name, so that it can access your compute hours|
|-l walltime=<hh:mm:ss>||REQUIRED: Provides the scheduler with the time needed for your job to run so resources can be allocated. Quest allows jobs of up to 7 days (168 hours).|
|-q <queuename>||REQUIRED: The queue name. You'll usually use the short queue for jobs less than 4 hours for interactive jobs. See Quest Queues for details on the queue to choose for longer jobs.|
|-l mem=<n>gb||Specifies the amount of memory needed by a single node job, where <n> is the number of GB of RAM you're requesting. The limit is 120GB on normal Quest compute nodes. Note that this option is not guaranteed to reserve the memory resources, but it is good practice to include.|
|-l nodes=<N>:ppn=<p>||Specifies how many nodes and how many processors (cores) per node.|
If you need to reserve a large amount of memory for an interactive job and run into issues, contact firstname.lastname@example.org for options.
Example 1: Interactive Job to Run a Command Line Program
msub -I -l nodes=1:ppn=4 -l walltime=01:00:00 -q short -A <allocationid>
This would run an interactive job on a single node with four cores for up to an hour, charging to the <allocationid> (e.g., p300XX) account.
Example 2: Interactive job to run a GUI program
If you're connecting to Quest using SSH via a terminal program, then you need to make sure to enable X-forwarding when you connect by using the -X option:
ssh -X <netid>@quest.it.northwestern.edu
If you use FastX to connect instead, then X-forwarding will be enabled by default in the GNOME terminal.
For an interactive job with a GUI component, you will need to use the “-X” flag for msub that allows for X tunneling from Quest to your desktop display. For example:
msub -X -I -l nodes=1:ppn=4 -l walltime=01:00:00 -q short -A <allocationid>
This requires an X client to be installed on your desktop, which is the case if you're using FastX.
When you enter the msub command for an interactive job, there will be a pause while the scheduler looks for available resources. Then you will be provided information about the compute node you're assigned, and you will be automatically connected to it. The command prompt in your terminal will change to reflect this new connection. You can then proceed with your work as if you were on a login node. See an example of this with Example: Interactive Job with Two Nodes.
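The two example commands differ only in the -X flag, so it can be convenient to assemble the command from a few variables and review it before running it. The sketch below only prints the command (a dry run); the variable names and the `use_gui` switch are assumptions for illustration, and p300XX is a placeholder allocation ID.

```shell
#!/bin/bash
allocation="p300XX"   # placeholder allocation ID
cores=4
walltime="01:00:00"
queue="short"
use_gui=false         # set to true to add -X for GUI programs

# Build the command as an array so each option stays a separate argument.
cmd=(msub -I)
if [ "$use_gui" = true ]; then
    cmd+=(-X)
fi
cmd+=(-l "nodes=1:ppn=${cores}" -l "walltime=${walltime}" -q "$queue" -A "$allocation")

# Dry run: print the command for review.
echo "${cmd[*]}"
```

On a Quest login node, replacing the final echo with `"${cmd[@]}"` would launch the interactive job with those settings.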