AlphaFold on Quest

How to run AlphaFold on Quest

Currently, we have AlphaFold versions 2.0.0 and 2.1.1 (the latter with multimer support) installed on Quest. For more details on both releases of AlphaFold, please visit the AlphaFold website.

AlphaFold 2.0.0

How is AlphaFold 2.0.0 Installed On Quest?

AlphaFold 2.0.0 is installed inside of a Singularity container following the instructions from the DeepMind team.

The container contains CUDA 11.0, Python 3.7.10, and TensorFlow 2.5.0.

Instead of calling singularity directly, we provide a module that wraps the singularity run call.

module load alphafold/2.0.0

This creates a shell function called alphafold which can be used as follows:

alphafold --fasta_paths=/full/path/to/fasta \
--output_dir=/full/path/to/outdir \
--model_names= \
--preset=[full_dbs|casp14] \
--max_template_date=

If you would like to see the contents of the shell function alphafold, you can run type alphafold on the command line.
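
As a small illustration of inspecting a shell function, the sketch below defines a toy function named demo_wrapper (a hypothetical stand-in, not the real alphafold function, whose contents are set by the module) and prints its definition in two ways.

```shell
#!/bin/bash
# Toy stand-in for a module-provided wrapper. This is NOT the real
# alphafold function; it only echoes the arguments it receives.
demo_wrapper() {
    echo "wrapped call with arguments: $*"
}

# Report what kind of command the name refers to, followed by its body.
type demo_wrapper

# Print only the function definition (bash-specific).
declare -f demo_wrapper
```

After module load alphafold/2.0.0, the same two commands can be pointed at alphafold instead of demo_wrapper.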

How do you run AlphaFold 2.0.0 on Quest?

Below, we provide an example submission script for running AlphaFold on Quest.

#!/bin/bash
#SBATCH --account=pXXXX  ## YOUR ACCOUNT pXXXX or bXXXX
#SBATCH --partition=gengpu  ### PARTITION (buyin, short, normal, etc)
#SBATCH --nodes=1 ## how many computers do you need - for AlphaFold this should always be one
#SBATCH --ntasks-per-node=12 ## how many cpus or processors do you need on each computer
#SBATCH --gres=gpu:a100:1  ## type of GPU requested, and number of GPU cards to run on
#SBATCH --time=48:00:00 ## how long does this need to run 
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score, so be careful not to ask for more than you need)
#SBATCH --job-name=run_AlphaFold  ## When you run squeue -u <NETID> this is how you can identify the job
#SBATCH --output=AlphaFold.log ## standard out and standard error go to this file
#SBATCH --mail-type=ALL ## you can receive e-mail alerts from SLURM when your job begins and when your job finishes (completed, failed, etc)
#SBATCH --mail-user=email@northwestern.edu ## your email, non-Northwestern email addresses may not be supported

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU, Memory, and GPU resources have been selected based ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your allocation, and email (if you want to be emailed).   ###
#########################################################################

module purge 
module load alphafold/2.0.0

# template
# alphafold --fasta_paths=/full/path/to/fasta \
#    --output_dir=/full/path/to/outdir \
#    --model_names= \
#    --preset=[full_dbs|casp14] \
#    --max_template_date=

# real example
alphafold --fasta_paths=/projects/intro/alphafold/T1050.fasta \
    --output_dir=$HOME/alphafold \
    --model_names=model_1,model_2,model_3,model_4,model_5 \
    --preset=casp14 \
    --max_template_date=2021-07-28
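
Because a GPU job may wait in the queue before it starts, it can be worth checking the inputs before submitting. The sketch below uses a hypothetical helper, check_alphafold_inputs, which is not part of the alphafold module. It assumes that the FASTA path should be a full path, as in the template, and that the date follows the YYYY-MM-DD form used in the example.

```shell
#!/bin/bash
# Hypothetical pre-flight helper (not provided by the alphafold module).
# Usage: check_alphafold_inputs /full/path/to/fasta YYYY-MM-DD
check_alphafold_inputs() {
    local fasta="$1" date="$2"

    # The templates use full paths, so require one that starts with "/".
    if [[ "$fasta" != /* ]]; then
        echo "error: FASTA path must be a full path: $fasta" >&2
        return 1
    fi
    if [[ ! -r "$fasta" ]]; then
        echo "error: cannot read FASTA file: $fasta" >&2
        return 1
    fi
    # Check the shape of the date only, not whether it is a real calendar day.
    if [[ ! "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "error: date must look like YYYY-MM-DD: $date" >&2
        return 1
    fi
    return 0
}
```

For example, check_alphafold_inputs /projects/intro/alphafold/T1050.fasta 2021-07-28 could be run on a login node before calling sbatch.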

AlphaFold 2.1.1

How is AlphaFold 2.1.1 Installed On Quest?

AlphaFold 2.1.1 is installed inside of a Singularity container following the instructions from the DeepMind team.

The container contains CUDA 11.0, Python 3.7.10, and TensorFlow 2.5.0.

Instead of calling singularity directly, we provide a module which wraps the call to the singularity run.

module load alphafold/2.1.1

This creates two shell functions: alphafold-multimer for running AlphaFold-Multimer, and alphafold-monomer for running the monomer models.

alphafold-monomer --fasta_paths=/full/path/to/fasta \
   --output_dir=/full/path/to/outdir \
   --max_template_date= \
   --model_preset=[monomer|monomer_casp14|monomer_ptm] \
   --db_preset=full_dbs
  • model_preset
    • monomer: This is the original model used at CASP14 with no ensembling.
    • monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching DeepMind's CASP14 configuration. This is largely provided for reproducibility as it is 8x more computationally expensive for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).
    • monomer_ptm: This is the original CASP14 model fine tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.
alphafold-multimer --fasta_paths=/full/path/to/fasta \
   --output_dir=/full/path/to/outdir \
   --max_template_date= \
   --model_preset=multimer \
   --db_preset=full_dbs \
   --is_prokaryote_list=[true|false]
  • model_preset
    • multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.
  • is_prokaryote_list
    • Optionally set the --is_prokaryote_list flag with booleans that indicate whether all input sequences in the given FASTA file are prokaryotic. If that is not the case or the origin is unknown, set it to false for that FASTA file.

If you would like to see the contents of either shell function, you can run type alphafold-monomer or type alphafold-multimer on the command line.
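
Since the multimer model expects a multi-sequence FASTA file, one quick check is to count the header lines (lines starting with ">") in the input. The sketch below is a heuristic only; count_fasta_sequences and suggest_alphafold_command are hypothetical helper names, not part of the alphafold module.

```shell
#!/bin/bash
# Hypothetical helpers (not provided by the alphafold module).

# Count FASTA records by counting header lines that start with ">".
count_fasta_sequences() {
    # grep -c prints the number of matching lines; "|| true" keeps a
    # count of 0 from being treated as a failure.
    grep -c '^>' "$1" || true
}

# Print the wrapper name that matches the number of sequences in the file.
suggest_alphafold_command() {
    local n
    n=$(count_fasta_sequences "$1")
    if [ "${n:-0}" -gt 1 ]; then
        echo "alphafold-multimer"
    else
        echo "alphafold-monomer"
    fi
}

# Example call (path taken from the example script):
# suggest_alphafold_command /projects/intro/alphafold/6E3K.fasta
```

The suggestion is only a starting point; the choice between the monomer and multimer models still depends on what you want to predict.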

How do you run AlphaFold 2.1.1 on Quest?

Below, we provide an example submission script for running AlphaFold on Quest.

#!/bin/bash
#SBATCH --account=pXXXX  ## YOUR ACCOUNT pXXXX or bXXXX
#SBATCH --partition=gengpu  ### PARTITION (buyin, short, normal, etc)
#SBATCH --nodes=1 ## how many computers do you need - for AlphaFold this should always be one
#SBATCH --ntasks-per-node=12 ## how many cpus or processors do you need on each computer
#SBATCH --gres=gpu:a100:1  ## type of GPU requested, and number of GPU cards to run on
#SBATCH --time=48:00:00 ## how long does this need to run 
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score, so be careful not to ask for more than you need)
#SBATCH --job-name=run_AlphaFold  ## When you run squeue -u <NETID> this is how you can identify the job
#SBATCH --output=AlphaFold.log ## standard out and standard error go to this file
#SBATCH --mail-type=ALL ## you can receive e-mail alerts from SLURM when your job begins and when your job finishes (completed, failed, etc)
#SBATCH --mail-user=email@northwestern.edu ## your email, non-Northwestern email addresses may not be supported

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU, Memory, and GPU resources have been selected based ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your allocation, and email (if you want to be emailed).   ###
#########################################################################

module purge
module load alphafold/2.1.1

# template monomer
# alphafold-monomer --fasta_paths=/full/path/to/fasta \
#    --output_dir=/full/path/to/outdir \
#    --max_template_date= \
#    --model_preset=[monomer|monomer_casp14|monomer_ptm] \
#    --db_preset=full_dbs
###
###         monomer: This is the original model used at CASP14 with no ensembling.
###
###         monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching DeepMind's CASP14 configuration. This is largely provided for reproducibility as it is 8x more computationally expensive for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).
###
###         monomer_ptm: This is the original CASP14 model fine tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.

# template multimer
# alphafold-multimer --fasta_paths=/full/path/to/fasta \
#    --output_dir=/full/path/to/outdir \
#    --max_template_date= \
#    --model_preset=multimer \
#    --is_prokaryote_list=[true|false] \
#    --db_preset=full_dbs
###
###         multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.
###
###         optionally set the --is_prokaryote_list flag with booleans that determine whether all input sequences in the given fasta file are prokaryotic. If that is not the case or the origin is unknown, set to false for that fasta.

# real example monomer (takes about 3 hours and 15 minutes)
alphafold-monomer --fasta_paths=/projects/intro/alphafold/T1050.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --output_dir=$(pwd)/out

# real example multimer (takes about 2 hours and 40 minutes)
alphafold-multimer --fasta_paths=/projects/intro/alphafold/6E3K.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=multimer \
    --db_preset=full_dbs  \
    --output_dir=$(pwd)/out
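
Both examples above write to the same --output_dir. The sketch below shows where each target's results would be expected, assuming the upstream AlphaFold behavior of writing results to a subdirectory of --output_dir named after the FASTA file without its extension. The helper alphafold_result_dir is a hypothetical name, not part of the alphafold module.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the alphafold module): predict the
# per-target results directory from --output_dir and the FASTA path.
alphafold_result_dir() {
    local outdir="$1" fasta="$2"
    local name
    name=$(basename "$fasta")
    # Drop a trailing "/" from the output directory and strip the final
    # extension from the file name, e.g. T1050.fasta -> T1050.
    echo "${outdir%/}/${name%.*}"
}

# Expected locations for the two example runs.
alphafold_result_dir "$(pwd)/out" /projects/intro/alphafold/T1050.fasta
alphafold_result_dir "$(pwd)/out" /projects/intro/alphafold/6E3K.fasta
```

Under that assumption, the monomer and multimer examples do not overwrite each other even though they share one output directory.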

Keywords: research computing, quest, alphafold, GPU
Doc ID: 112835
Owner: Research Computing
Group: Northwestern
Created: 2021-08-04 06:39 CST
Updated: 2022-01-08 12:23 CST
Sites: Northwestern
CleanURL: https://kb.northwestern.edu/alphafold-on-quest