Getting Started on the Genomics Compute Cluster (b1042) on Quest

After applying for access to the Genomics Compute Cluster and receiving an orientation on its use, users can submit jobs to the compute nodes.

About the Genomics Compute Cluster on Quest 

The Feinberg School of Medicine, in an initiative spearheaded by the Feinberg School of Medicine Office of the Dean, the Center for Genetic Medicine, and the Department of Biochemistry and Molecular Genetics, provides 100 nodes on Northwestern’s High Performance Computing Cluster, Quest, to be used for genomics research. This University resource has been made available to the greater genomics community in an effort to foster genomics research and empower the computational genomics community at Northwestern University.

How do I start using the Genomics Compute Cluster (b1042)?

You will first need to log in to Quest.

Moving files onto Quest

If your sequencing is done by the NUSeq Core, your FASTQ files will be delivered via Northwestern Box. To get files from Box onto Quest:

  1. Log in to Quest. Northwestern IT recommends you use FastX to connect, but experienced users may opt for an SSH connection with X11 forwarding enabled.
  2. Launch Firefox from the terminal window: type module load firefox, then type firefox. A Firefox browser window will open.
  3. In the Firefox browser that opened, go to https://northwestern.box.com and log in with your NetID credentials.
  4. Select the file you wish to move onto Quest and click Download in the upper right corner. By default, the file will be placed into the Downloads directory in your home directory.

Note: If you want the file to go directly into /projects/b1042, click the menu icon in the upper right of the Box browser before downloading, then click Preferences. From there you can select which directory on Quest to download into, including the /projects directories.
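If a file has already landed in the Downloads directory, it can be moved into your PI group's directory afterwards. The following is a minimal sketch of that step; move_to_project is a hypothetical helper name, and the example paths in the comments are placeholders rather than real locations.

```shell
# Move a downloaded file into a target directory, creating the directory first.
# move_to_project is a hypothetical helper; adjust the paths to your own group.
move_to_project() {
    src="$1"        # e.g. "$HOME/Downloads/sample.fastq.gz" (placeholder)
    dest_dir="$2"   # e.g. /projects/b1042/my_PI/fastq (placeholder)
    if [ ! -f "$src" ]; then
        echo "No such file: $src" >&2
        return 1
    fi
    # -p creates any missing parent directories and is harmless if they exist.
    mkdir -p "$dest_dir" && mv "$src" "$dest_dir"/
}
```

The function is called with the downloaded file first and the destination directory second, and it returns a non-zero status if the source file does not exist.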

For information about transferring files to Quest from your local machine, RDSS (via smbclient), or Globus, please visit the Transferring Files on Quest web page.

Files may not be transferred directly from FSMRESFILES (FSM secure data storage) to Quest.

Using bioinformatics software available on Quest

To see a complete list of software installed on Quest by the Quest administrators, type module avail. Modules are scripts that make it possible to run software in your shell; if you would like to run a particular software package, use the module load command, and then run the software.
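As a rough illustration of the module workflow described above, the sketch below wraps module load in a small function that fails with a readable message when the module command is not present (for example, when a script is tested away from Quest). The name load_module is hypothetical; the module subcommands listed in the comments are the standard ones.

```shell
# Load a module, with a clear message when the module command is unavailable.
# load_module is a hypothetical wrapper; the underlying commands are
#   module avail [name]   list installed packages (optionally filtered by name)
#   module load <name>    make a package available in the current shell
#   module list           show what is currently loaded
load_module() {
    name="$1"
    if ! command -v module >/dev/null 2>&1; then
        echo "The module command is not available in this shell" >&2
        return 1
    fi
    module load "$name"
}
```

For example, load_module samtools followed by module list should show samtools among the loaded modules.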

New software is continually being added to Quest. Please use the command module avail for an up-to-date listing of all software packages including genomics software on Quest.

If you would like additional software packages installed on Quest, please contact quest-help@northwestern.edu. Users can also install software in their own directories.

Running jobs on the Genomics Compute Cluster (b1042) nodes: Submission Scripts

To submit, monitor, modify, and delete jobs on Quest you must use Moab commands such as msub. Inside a submission script, lines that begin with #MSUB are directives that pass options to the scheduler.

Useful commands to put into your job submission script:

Example Commands
    Description

#!/bin/bash
    REQUIRED: The first line of your script, specifying the type of shell (in this case, bash) to use.

#MSUB -A b1042
    REQUIRED: Tells the scheduler you are using the b1042 genomics account.

#MSUB -q genomics
    REQUIRED: Puts your job into the genomics queue to run on the b1042 nodes. Additional queues are outlined below.

#MSUB -l walltime=<hh:mm:ss>
    REQUIRED: Provides the scheduler with the time needed for your job to run so resources can be allocated. Quest allows jobs of up to 7 days (168 hours).

#MSUB -l nodes=<N>:ppn=<p>
    Specifies how many nodes and how many ppn (processors (cores) per node) to use. If this option isn't specified, one core on one node will be allocated.
    NOTE: The vast majority of open source bioinformatics software runs on a single node. Please do not request multiple nodes unless the software you are running explicitly states that you can do so. If your software supports multi-threading, request one core for each thread requested when the software is called.

#MSUB -N <name_of_job>
    Gives the job a descriptive name, useful for reporting, such as when using the command qstat.

#MSUB -m abe
    Sends an email if your job (a)borts, (b)egins, or (e)nds. You must include your email address in the .forward file in your /home/NetID directory or use the -M option below. You can use any combination of the 3 letters, or n for (n)one of them.

#MSUB -M <your email>
    Specifies the email address; can be a comma-separated list of addresses.

#MSUB -j oe
    Joins the (o)utput and (e)rror files into a single file, such that errors are also sent to the output file. By default the name of the output file will be of the form JOBNAME.oJOBID, and the file will be in the directory from which you submitted the job.

#MSUB -o <outlog>
    Writes the output log for the job (whatever would go to stdout) into a file named outlog (you can change this file name). If not specified, stdout is written to a file named JOBNAME.oJOBID in the directory from which you submitted the job.

#MSUB -e <errlog>
    Writes the error file for the job (whatever would go to stderr) into a file named errlog (you can change this file name). The error file is very important for diagnosing jobs that fail to run properly. If not specified, stderr is written to a file named JOBNAME.eJOBID in the directory from which you submitted the job.

cd $PBS_O_WORKDIR
    Changes the working directory to the location from which the job was submitted (see below). $PBS_O_WORKDIR is a convenience value; you should cd to whatever directory is appropriate for your job, most likely /projects/b1042/. If you haven’t yet created a directory for your PI group in /projects/b1042, please do so before running your job to keep your files separate from those of other users.
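The NOTE on the nodes/ppn option asks for one core per thread. One way to keep the two numbers in step is to read the core count from the job environment instead of typing it twice. The fragment below is a sketch that assumes the resource manager exports PBS_NUM_PPN inside the job (a Torque convention that is not confirmed here for Quest); if the variable is unset, it falls back to a single thread.

```shell
# Inside a job script: derive the thread count from the scheduler when available.
# ASSUMPTION: PBS_NUM_PPN is exported by the resource manager inside the job.
# If it is unset, fall back to one thread, matching the default allocation.
THREADS="${PBS_NUM_PPN:-1}"
echo "Running with ${THREADS} thread(s)"

# Pass the same value to the tool's own thread option, for example:
#   bowtie2 -p "$THREADS" <other bowtie2 arguments>
#   samtools sort -@ "$THREADS" <other samtools arguments>
```

If PBS_NUM_PPN turns out not to be set on your nodes, assigning THREADS by hand to the same number used for ppn achieves the same result.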

Genomics Compute Cluster Queues

Queues are defined for different types of jobs and have different resource limits.
  • genomics: For jobs that require less than 48 hours to run. Individual jobs in the genomics queue must run on fewer than 10 nodes. Individual users may submit multiple jobs to the genomics queue and utilize up to 1,680 cores simultaneously across multiple jobs.
    Script example: #MSUB -q genomics
  • genomicsguest: A lower-priority version of the genomics queue for non-Feinberg users.
    Script example: #MSUB -q genomicsguest
  • genomicslong: For jobs that require between 48 and 240 hours to run. Individual jobs in the genomicslong queue must run on fewer than 4 nodes. Individual users may use up to 96 cores in this queue at one time.
    Script example: #MSUB -q genomicslong
  • genomicsguestex: A lower-priority version of the genomicslong queue for non-Feinberg users.
    Script example: #MSUB -q genomicsguestex
  • genomics-burst: For special projects. Large jobs that require more than 10 nodes or that run for more than 48 hours may use the genomics-burst queue; to request it, contact quest-help@northwestern.edu. Before using the genomics-burst queue, users must meet with the Senior Bioinformatics Specialist to confirm that their code has been reviewed for efficiency. These appointments will be set up after reservations are requested and are intended to ensure best practices for this shared resource. Because of the resources involved, jobs may need to wait for availability when requesting the genomics-burst queue. It is advised to schedule at least three (3) weeks in advance.
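The walltime limits above can be turned into a small lookup. The sketch below maps an expected run time in whole hours to a queue name for Feinberg users; suggest_queue is a hypothetical helper, it does not check the node or core limits, and non-Feinberg users would substitute genomicsguest and genomicsguestex.

```shell
# Suggest a queue from the expected run time in whole hours.
# Limits are taken from the queue list above; suggest_queue is hypothetical.
suggest_queue() {
    hours="$1"
    if [ "$hours" -lt 48 ]; then
        echo genomics
    elif [ "$hours" -le 240 ]; then
        echo genomicslong
    else
        echo "Over 240 hours: contact quest-help@northwestern.edu about genomics-burst" >&2
        return 1
    fi
}
```

Calling suggest_queue 24 prints genomics, and suggest_queue 72 prints genomicslong.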

An example submission script

Note: Line numbers are included for reference purposes only. Do not put them in your script.

jobscript.sh
 1 #!/bin/bash
 2 #MSUB -A b1042
 3 #MSUB -q genomics
 4 #MSUB -l walltime=24:00:00
 5 #MSUB -M myemailaddress
 6 #MSUB -j oe
 7 #MSUB -N somename_tophat
 8 #MSUB -l nodes=1:ppn=6
 9 export PATH=$PATH:/projects/pxxxxx/tools/
10 module load bowtie2/2.2.6
11 module load tophat/2.1.0
12 module load samtools
13 module load boost
14 module load gcc/4.8.3
15 module load java
16 cd $PBS_O_WORKDIR
17 # Make Directory for FastQC reports in your PI folder in b1042
18 mkdir -p /projects/b1042/my_PI/fastqc/reports
19
20 # Trim poor quality sequence
21 java -jar <someinput> <someoutput>
22 # Running FastQC
23 fastqc -o /projects/b1042/my_PI/fastqc/reports <other_input>
  • Line 1 specifies bash as the shell that runs the script.
  • Lines 2-8 are interpreted by Moab. Until Moab acquires the resources, no other line in this script is executed.
  • Line 9 adds the user's own tools directory to the path.
  • Lines 10-15 load modules from centrally managed software.
  • Line 16 changes to the directory from which the job was submitted.
  • Lines 17, 20, and 22 are comments.
  • Line 18 is a shell command.
  • Lines 21 and 23 launch application executables.
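Before submitting, it can be useful to confirm that a script contains the items marked REQUIRED in the table of commands. The following is a sketch of such a check; check_directives is a hypothetical helper, not a Moab command, and it only recognizes walltime when it is the first resource on its own #MSUB -l line.

```shell
# Report any REQUIRED scheduler directive missing from a submission script.
# check_directives is a hypothetical helper, not part of Moab.
check_directives() {
    script="$1"
    rc=0
    for pattern in '^#MSUB -A ' '^#MSUB -q ' '^#MSUB -l walltime='; do
        if ! grep -q -e "$pattern" "$script"; then
            echo "Missing directive matching: $pattern" >&2
            rc=1
        fi
    done
    # The shell line must be the very first line of the file; this also
    # catches reference line numbers that were pasted in by mistake.
    if [ "$(head -n 1 "$script")" != '#!/bin/bash' ]; then
        echo "First line is not #!/bin/bash" >&2
        rc=1
    fi
    return "$rc"
}
```

The function returns 0 when every required item is found and 1 otherwise, so it can be chained in front of the submission command.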

If the name of the script is jobscript.sh, the job is submitted with the command:

msub jobscript.sh

For more examples, and a downloadable version of the example script above, see the GitHub repository of example jobs.

Submitting Your Batch Job

After you've written and saved your submission script, you can submit your job. At the command line type

msub <name_of_script>

where, in the example above, <name_of_script> would be jobscript.sh. Upon submission, the scheduler will return your job number.
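Because the scheduler returns the job number on submission, it can be captured in a variable for later use. The sketch below shows one way to do that; submit_job is a hypothetical wrapper around msub, and the whitespace stripping is a precaution in case the returned text includes blank lines.

```shell
# Submit a script and print the job number returned by the scheduler.
# submit_job is a hypothetical wrapper; msub itself is the Moab command.
submit_job() {
    script="$1"
    if ! command -v msub >/dev/null 2>&1; then
        echo "msub not found; run this on Quest" >&2
        return 1
    fi
    # Remove blank lines and spaces around the returned job number.
    job_id=$(msub "$script" | tr -d '[:space:]')
    if [ -z "$job_id" ]; then
        echo "msub did not return a job number" >&2
        return 1
    fi
    echo "$job_id"
}
```

A typical use would be JOBID=$(submit_job jobscript.sh), after which the stored number can be passed to Moab's job-status commands such as checkjob.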

If you receive a “permission denied” error, check to make sure your script has the correct permission to execute by typing

ls -l <name_of_script>

The fourth character in the permissions string that is output by the above command indicates if you can execute your file. If it is not an “x”, type

chmod u+x <name_of_script>

to enable execution and resubmit. 
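The check and the fix above can be combined so that chmod runs only when it is needed. This is a minimal sketch; ensure_executable is a hypothetical helper name.

```shell
# Add execute permission for the owner only if the script is not executable,
# then show the resulting permissions string.
ensure_executable() {
    script="$1"
    if [ ! -x "$script" ]; then
        chmod u+x "$script"
    fi
    ls -l "$script"
}
```

After ensure_executable jobscript.sh, the fourth character of the printed permissions string should be an x.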

Keywords: genomics, quest, research computing, b1042, bioinformatics, genetics, bwa, rna-seq, gatk
Doc ID: 78602
Owner: Research Computing
Group: Northwestern
Created: 2017-12-07 13:38 CST
Updated: 2018-10-11 08:22 CST
Sites: Northwestern
CleanURL: https://kb.northwestern.edu/genomics-compute-cluster