Getting Started on the Genomics Compute Cluster (b1042) on Quest
After applying for access to the Genomics Compute Cluster and receiving an orientation on its use, users can submit jobs to the compute nodes.
About the Genomics Compute Cluster on Quest
The Feinberg School of Medicine, in an initiative spearheaded by the Feinberg School of Medicine Office of the Dean, the Center for Genetic Medicine, and the Department of Biochemistry and Molecular Genetics, provides 100 nodes on Northwestern’s High Performance Computing Cluster, Quest, to be used for genomics research. This University resource has been made available to the greater genomics community in an effort to foster genomics research and empower the computational genomics community at Northwestern University.
How do I start using the Genomics Compute Cluster (b1042)?
You will first need to log in to Quest.
Moving files onto Quest
If you have sequencing done by NUSeq Core, they will deliver your FASTQ files via Northwestern Box. To get files from Box onto Quest:
- Log in to Quest. Northwestern IT recommends you use FastX to connect, but experienced users may opt for an SSH connection with X11 forwarding enabled.
- In the terminal window, type module load firefox and then firefox. A Firefox browser window will open.
- In the Firefox browser that opened, go to https://northwestern.box.com and log in with your NetID credentials.
- Select the file you wish to move onto Quest and click Download in the upper right corner. By default, the file will be placed into the Downloads directory in your home directory.
Note: If you want the file to go directly into /projects/b1042, before downloading click on the menu icon in the upper right of the Box browser, and click Preferences. From there you can select which directory to download into on Quest, including the /projects directories.
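If a file has already landed in the default Downloads directory, it can be moved into a project directory afterwards. The following is a minimal sketch, not an official Quest utility: the function name move_download and the paths in the usage comment are hypothetical placeholders.

```shell
#!/bin/bash
# Sketch: move a file downloaded through Firefox into a project directory.
# move_download is a hypothetical helper name, not a Quest command.
move_download() {
    src_file="$1"   # file to move, e.g. something under $HOME/Downloads
    dest_dir="$2"   # destination directory; created if it does not exist

    if [ ! -f "$src_file" ]; then
        echo "No such file: $src_file" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1

    target="$dest_dir/$(basename "$src_file")"
    if [ -e "$target" ]; then
        # Never silently overwrite data that is already in the project directory
        echo "Refusing to overwrite existing file: $target" >&2
        return 1
    fi

    mv "$src_file" "$target"
}

# Usage (placeholder paths):
#   move_download "$HOME/Downloads/sample_R1.fastq.gz" /projects/b1042/my_PI/fastq
```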
Using bioinformatics software available on Quest
To see a complete list of software installed on Quest by the Quest administrators, type module avail. Modules are scripts that make it possible to run software in your shell; if you would like to run a particular software package, use the module load command, and then run the software.
New software is continually being added to Quest. Please use the command module avail for an up-to-date listing of all software packages including genomics software on Quest.
If you would like additional software packages installed on Quest, please contact firstname.lastname@example.org. Users can also install software in their own directories.
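Because the module avail listing can be long, it may help to filter it by keyword. The sketch below assumes that, as on many module installations, module avail writes its listing to standard error, which is why the output is redirected before filtering. The helper name find_modules is hypothetical.

```shell
#!/bin/bash
# Sketch: search the list of installed modules for a keyword.
# find_modules is a hypothetical helper name, not part of the module system.
find_modules() {
    keyword="$1"

    # The module command only exists in shells where the module system is set up
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi

    # Assumption: the listing goes to stderr, so merge it into stdout first
    module avail 2>&1 | grep -i -e "$keyword"
}

# Usage:
#   find_modules samtools     # show matching modules
#   module load samtools      # then load the one you need
```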
To submit, monitor, modify, and delete jobs on Quest you must use Moab commands. Within a job submission script, the lines read by the scheduler begin with #MSUB.
Useful commands to put into your job submission script:
|#!/bin/bash||REQUIRED: The first line of your script, specifying the type of shell (in this case, bash) to use|
|#MSUB -A b1042||REQUIRED: Tells the scheduler you are using the b1042 genomics account|
|#MSUB -q genomics||REQUIRED: Puts your job into the genomics queue to run on the b1042 nodes. Additional queues are outlined below.|
|#MSUB -l walltime=<hh:mm:ss>||REQUIRED: Provides the scheduler with the time needed for your job to run so resources can be allocated. Quest allows jobs of up to 7 days (168 hours).|
|#MSUB -l nodes=<N>:ppn=<p>||Specifies how many nodes and how many ppn (processors, i.e. cores, per node) to use. If this option isn't specified, one core on one node will be allocated. NOTE: The vast majority of open source bioinformatics software runs on a single node; please do not request multiple nodes unless the software you are running explicitly states that it can use them. If your software supports multi-threading, request one core for each thread requested when the software is called.|
|#MSUB -N <name_of_job>||Gives the job a descriptive name, useful for reporting, such as when using the command qstat.|
|#MSUB -m abe||Sends an email if your job (a)borts, (b)egins, or (e)nds. You must include your email address in your .forward file in your /home/NetID directory or use the -M option below. You can use any combination of the 3 letters, or n for (n)one of them.|
|#MSUB -M <your email>||Specifies email address, can be a comma separated list of users.|
|#MSUB -j oe||Joins the (o)utput and (e)rror files into a single file, such that errors are also sent to the output file. By default the name of the output file will be of the form JOBNAME.oJOBID, and the file will be in the directory from which you submitted the job.|
|#MSUB -o <outlog>||Writes the output log for the job (whatever would go to stdout) into a file named outlog (you can change this file name). If not specified, stdout is written to a file in the directory you submitted the job from that is named according to JOBNAME.oJOBID.|
|#MSUB -e <errlog>||Writes the error file for the job (whatever would go to stderr) into a file named errlog (you can change this file name). The error file is very important for diagnosing jobs that fail to run properly. If not specified, stderr will be written to a file in the directory you submitted the job from that is named according to JOBNAME.eJOBID.|
|cd $PBS_O_WORKDIR||This command changes the working directory to the location from which the job is submitted (see below). $PBS_O_WORKDIR is a convenience value; you should cd to whatever directory is appropriate for your job, most likely /projects/b1042/|
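The advice to request one core per thread can be applied in the other direction too: inside the job, derive the thread count from what was actually allocated, so the two numbers cannot drift apart. This is a sketch under an assumption the source does not confirm, namely that the scheduler sets the Torque-style $PBS_NODEFILE variable to a file with one line per allocated core. The function name threads_from_nodefile is hypothetical.

```shell
#!/bin/bash
# Sketch: derive a thread count from the cores allocated to the job.
# Assumption: PBS_NODEFILE (if set) names a file with one line per core.
threads_from_nodefile() {
    nodefile="${1:-${PBS_NODEFILE:-}}"

    if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
        # Count the lines; tr strips the padding some wc versions add
        wc -l < "$nodefile" | tr -d ' '
    else
        # Outside a job, or if the variable is not set, fall back to 1 thread
        echo 1
    fi
}

# Usage inside a job script that requested nodes=1:ppn=6
# (tool name and flag are placeholders):
#   THREADS=$(threads_from_nodefile)
#   some_tool --threads "$THREADS" ...
```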
- genomics: For jobs that require less than 48 hours to run. Individual jobs in the genomics queue must run on fewer than 10 nodes. Individual users may submit multiple jobs to the genomics queue and utilize up to 1,680 cores simultaneously across multiple jobs.
Script example: #MSUB -q genomics
- genomicsguest: a lower-priority version of the genomics queue for non-Feinberg users.
Script example: #MSUB -q genomicsguest
- genomicslong: For jobs that require between 48 and 240 hours to run. Individual jobs in the genomicslong queue must run on fewer than 4 nodes. Individual users may use up to 96 cores in this queue at one time.
Script example: #MSUB -q genomicslong
- genomicsguestex: a lower-priority version of the genomicslong queue for non-Feinberg users.
Script example: #MSUB -q genomicsguestex
- genomics-burst: For special projects, large jobs that require more than 10 nodes or run for more than 48 hours may use the genomics-burst queue by contacting email@example.com. Before using the genomics-burst queue, users must meet with the Senior Bioinformatics Specialist to confirm that code has been reviewed for efficiency. These appointments will be set up after reservations are requested and are intended to ensure best practices for this shared resource. Because of the resources involved, jobs may need to wait for availability when requesting the genomics-burst queue. It is advised to schedule at least three (3) weeks in advance.
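The walltime limits in the queue list above can be summarized as a small lookup. This is only a sketch of those stated limits: the function name pick_queue and its "guest" argument are hypothetical, node and core limits are not checked, and genomics-burst is left out because it requires arrangement in advance.

```shell
#!/bin/bash
# Sketch: map a walltime in whole hours to a queue name from the list above.
# pick_queue HOURS [feinberg|guest]   (hypothetical helper, not a Quest command)
pick_queue() {
    hours="$1"
    affiliation="${2:-feinberg}"

    if [ "$hours" -lt 48 ]; then
        standard="genomics"
        guest="genomicsguest"
    elif [ "$hours" -le 240 ]; then
        standard="genomicslong"
        guest="genomicsguestex"
    else
        echo "No standard queue covers $hours hours" >&2
        return 1
    fi

    # Non-Feinberg users get the lower-priority variant of the same queue
    if [ "$affiliation" = "guest" ]; then
        echo "$guest"
    else
        echo "$standard"
    fi
}

# Usage:
#   pick_queue 24          # prints the queue for a 24-hour Feinberg job
#   pick_queue 72 guest    # prints the queue for a 72-hour non-Feinberg job
```

The printed name is what goes after #MSUB -q in the submission script.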
An example submission script
Note: Line-numbers are included for reference purposes - do not put them in your script.
1 #!/bin/bash
2 #MSUB -A b1042
3 #MSUB -q genomics
4 #MSUB -l walltime=24:00:00
5 #MSUB -M myemailaddress
6 #MSUB -j oe
7 #MSUB -N somename_tophat
8 #MSUB -l nodes=1:ppn=6
9 export PATH=$PATH:/projects/pxxxxx/tools/
10 module load bowtie2/2.2.6
11 module load tophat/2.1.0
12 module load samtools
13 module load boost
14 module load gcc/4.8.3
15 module load java
16 cd $PBS_O_WORKDIR
17 # Make Directory for FastQC reports in your PI folder in b1042
18 mkdir -p /projects/b1042/my_PI/fastqc/reports
19
20 # Trim poor quality sequence
21 java -jar <someinput> <someoutput>
22 # Running FastQC
23 fastqc -o <other_input> /projects/b1042/my_PI/fastqc/reports/<other_output>
- Line 1 specifies the bash shell.
- Lines 2-8 are interpreted by Moab. Until Moab acquires the resources, no other line in this script is executed.
- Line 9 sets the user's path to include their own tools.
- Lines 10-15 load modules from centrally managed software.
- Line 16 changes to the directory from which the job was submitted.
- Lines 17, 20, and 22 are comments.
- Line 18 is a shell command.
- Lines 21 and 23 launch application executables.
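Before submitting, it can be useful to confirm that a script contains the directives marked REQUIRED in the table above. The following is a rough sketch with a hypothetical function name, check_directives; it only looks for each directive at the start of a line, so a walltime combined with other -l resources on one line would not be detected.

```shell
#!/bin/bash
# Sketch: report which REQUIRED lines are missing from a submission script.
# check_directives is a hypothetical helper name.
check_directives() {
    script="$1"
    status=0

    # The first line must name the shell
    if [ "$(head -n 1 "$script")" != "#!/bin/bash" ]; then
        printf '%s\n' "missing: #!/bin/bash on line 1"
        status=1
    fi

    # Account, queue, and walltime are marked REQUIRED in the table
    for directive in '-A' '-q' '-l walltime='; do
        if ! grep -q -e "^#MSUB $directive" "$script"; then
            printf '%s\n' "missing: #MSUB $directive"
            status=1
        fi
    done

    return "$status"
}

# Usage:
#   check_directives jobscript.sh && echo "required directives present"
```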
If the name of the script is jobscript.sh, the job is submitted with the command:
msub jobscript.sh
Submitting Your Batch Job
After you've written and saved your submission script, you can submit your job. At the command line type
msub <name_of_script>
where, in the example above, <name_of_script> would be jobscript.sh. Upon submission the scheduler will return your job number.
If you receive a “permission denied” error, check to make sure your script has the correct permission to execute by typing
ls -l <name_of_script>
The fourth character in the permissions string that is output by the above command indicates if you can execute your file. If it is not an “x”, type
chmod u+x <name_of_script>
to enable execution and resubmit.
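The permission check and fix can also be combined into one step. This is a minimal sketch; the function name ensure_executable is hypothetical, and it relies only on the standard shell test -x and on chmod.

```shell
#!/bin/bash
# Sketch: give the owner execute permission on a script only if it is missing.
# ensure_executable is a hypothetical helper name.
ensure_executable() {
    script="$1"

    if [ ! -f "$script" ]; then
        echo "No such file: $script" >&2
        return 1
    fi

    # -x is true when the current user may execute the file
    if [ ! -x "$script" ]; then
        chmod u+x "$script"
    fi
}

# Usage:
#   ensure_executable jobscript.sh
```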