Using MATLAB on Quest

On May 1st, 2019 the Quest scheduler changed from Moab to Slurm. Researchers using Quest will have to update their job submission scripts and use different commands to run under the new scheduler.

Setup and reference instructions for running MATLAB on Quest.

When running MATLAB on Quest, you have the option to make explicit use of parallelization. Even if you aren't explicitly parallelizing your code, however, you should be aware of MATLAB's use of multithreading, which is discussed first below. Instructions then follow for setting up MATLAB to run code in parallel. How you use MATLAB's parallelization depends on whether you are running on a single node or on multiple nodes.

Background

Compute nodes on Quest have at least 20 cores; some architectures (collections of compute nodes) have more. See Quest Technical Specifications.

If you want to use more than 20 cores per node, you can request nodes of a specific architecture with the Slurm constraint option:

#SBATCH --constraint="quest5"

Multithreading

MATLAB has built-in multithreading for some linear algebra and numerical functions. By default, MATLAB will try to use all of the cores on a machine to perform these computations. However, if a job you've submitted to Quest uses more cores than were requested, the job will be cancelled. To avoid this situation, you can start MATLAB with the singleCompThread option to restrict a MATLAB process to a single core:

matlab -singleCompThread

This is the recommended way to run MATLAB on Quest.

However, if you want MATLAB to be able to use multiple cores for these calculations, then you can omit the -singleCompThread option when starting MATLAB and request an entire node for your job. Use the options below in your submission script or sbatch command:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=<numberofcores>
#SBATCH --constraint="<architectureName>"

Example

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --constraint="quest5"

A quest5 node has 24 cores, so when you request 24 tasks per node, you effectively reserve the entire node for yourself.

It is strongly recommended not to mix multithreading and explicit parallelization within MATLAB, to avoid your process either running at reduced efficiency or being cancelled for exceeding its allotted resources.

Note: it may be possible to use multithreading with fewer cores than a full node by calling maxNumCompThreads in your MATLAB code, but some libraries may not adhere to the limit set this way, and MATLAB may not respect this option in future releases.
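
For example, a minimal sketch of capping the thread count from within a script (the core count of 4 here is an assumption and must match your #SBATCH request):

% Cap MATLAB's computational threads at the number of cores requested
% from Slurm (assumed to be 4 here). As noted above, some libraries
% may ignore this limit.
nCores = 4;                        % must match --ntasks-per-node
lastN = maxNumCompThreads(nCores); % returns the previous thread limit
A = rand(2000);
B = A * A';                        % multithreaded BLAS call, now capped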

Single Node Parallel MATLAB Jobs on Quest

If you want to use MATLAB's parallelization capabilities with a small number of cores, such that the job will fit on a single node (see the limits above), then you do not need to create a custom parallel profile as required for cross-node jobs below. You can use the default "local" profile.

In your MATLAB script (<matlabscript.m> in the example below), use

parpool('local', N)

where N is the number of cores you want to use.
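
For illustration, a minimal sketch of what <matlabscript.m> might contain (the loop body here is a hypothetical placeholder for your actual per-iteration work):

% matlabscript.m -- single-node parallel sketch
N = 4;                   % must match --ntasks-per-node in your job script
parpool('local', N);     % start N workers on the local node
results = zeros(1, 100);
parfor i = 1:100
    results(i) = i^2;    % replace with your actual computation
end
delete(gcp('nocreate')); % shut the pool down cleanly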

Then create a Slurm submission script for your job. There are more details at Submitting a Job on Quest, but a sample script mymatlabjob.sh looks like:

mymatlabjob.sh
#!/bin/bash
#SBATCH -A <allocationID>
#SBATCH -p <partitionName>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=<N>
#SBATCH --mem-per-cpu=<memorypercore>
#SBATCH -t <hh:mm:ss>

## job commands; <matlabscript> is your MATLAB .m file, specified
## without the .m extension

module load matlab/r2018a
matlab -nosplash -nodesktop -singleCompThread -r <matlabscript>

The placeholders to fill in are explained below:

Flag              Description
<allocationID>    ID of your allocation
<partitionName>   The name of the partition (i.e. queue)
<hh:mm:ss>        Time required for the job to finish
<N>               Number of cores per node
<memorypercore>   Required memory per core

Additionally, you can add the following lines to the #SBATCH block in the job submission script to give a name for your job and receive emails when your job's status changes (i.e. begin, end or fail):

#SBATCH -J <jobName>
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=<emailaddress>

Once the submission script is complete, you can submit it as a batch job using the sbatch command:

sbatch mymatlabjob.sh

Cross-node Parallel MATLAB Jobs on Quest

Setting up multi-node parallelization for MATLAB 2018 (and above) differs from the setup for MATLAB 2015b, 2016a and 2017a on Quest. Please see the appropriate section below for instructions.

Note that versions before 2015b (i.e. 2015a, 2014b, 2014a and 2013a) are not supported by the Slurm scheduler for cross-node parallelization.

Cross-node Parallel MATLAB 2018 Jobs

To run parallel jobs on Quest that use cores across multiple nodes, you need to create a parallel profile.

Log in to Quest with X-forwarding enabled (use the -X option when connecting via ssh or connect with FastX to have X-forwarding enabled by default).

From your home directory, launch MATLAB on the login node you land on:

module load matlab/r2018a
matlab

When MATLAB opens, select the Parallel menu button:

[Screenshot: MATLAB 2018a Parallel menu]

In the Parallel menu, select Manage Cluster Profiles.

[Screenshot: MATLAB 2018a Manage Cluster Profiles menu]

This will open the Cluster Profile Manager window.

Create a Validation Profile

In the Cluster Profile Manager, we will make a test profile to validate our settings.

Click on the Add button in the upper left, then choose Custom and then Slurm.

[Screenshot: MATLAB 2018a create custom Slurm parallel profile]

(If you get a message about needing the Distributed Computing Server, it is safe to ignore it.)

This will create a new profile called SlurmProfile1 which will show up in the Cluster Profile list on the left of the Cluster Profile Manager. Double click on the name to change it. Call this profile something like "multinode-quest-validate."

Then, with this new profile selected from the Cluster Profile list, click the Edit button in the bottom right of the Cluster Profile Manager.

[Screenshot: MATLAB 2018a edit custom Slurm parallel profile]

In the top option block, set "Number of workers available to cluster NumWorkers" to 4. This is the number of cores/workers that we want to use for this test case. This field sets --ntasks=4 for Slurm.

In the top option block, set "Additional command line arguments for job submission SubmitArguments" to

-A <allocationID> -p <partitionName> -t <hh:mm:ss> --ntasks-per-node=<maxtaskspernode> --mem-per-cpu=<memorypercore> -L mdcsw:<licensecount>

where

Flag                Description
<maxtaskspernode>   Maximum number of tasks allowed to run on a node
<memorypercore>     Required memory per core
<licensecount>      Total number of licenses to be used; should equal NumWorkers

[Screenshot: MATLAB 2018a Slurm example]

The example in the image above uses

-A p12345 -p short -t 00:10:00 --ntasks-per-node=2 --mem-per-cpu=3G -L mdcsw:4

If you want to set additional Slurm options, you can do that here as well.

These are our job parameters:

  • -A p12345 is the example allocation; use your own allocation ID instead.
  • -p short selects the short partition for this general access allocation. If you use a buy-in account, use -p buyin instead. Some buy-in allocations have special queues; if you are using one of these allocations, use the appropriate partition name.
  • -t 00:10:00 specifies a 10 minute job.
  • --ntasks-per-node=2, when used with the --ntasks option (already set via the NumWorkers field), is treated as a maximum count of tasks per node.
  • --mem-per-cpu=3G reserves 3 GB of memory for each core that we request.
  • -L mdcsw:4 requests 4 MATLAB Distributed Computing Engine licenses (matching the 4 cores/workers we're using).

Click on Done in the bottom right. Then click the Validate button at the top of the window (with a green check mark) to start the validation tests. All of them should pass. If something fails, recheck that you entered the above parameters correctly; in particular, make sure you entered your own allocation name and not the example allocation above. If the parameters are correct but validation still fails, contact quest-help@northwestern.edu for assistance.

[Screenshot: MATLAB 2018a validate custom parallel profile]

Now that everything is validated, make a real profile to use for your job.

Create a Job Profile

Right click on the name of the validation profile you just made (multinode-quest-validate) in the Cluster Profile list on the left, and choose Duplicate from the menu. This will make a copy of the profile we just created. Rename this profile by double clicking on the name. Choose a name that's descriptive of the parameters for your job, perhaps something like multinode-quest-30cores.

With the new profile selected, click on the Edit button in the bottom right corner of the window. We're going to change some of the parameters we set in the validation profile to the values you need for your actual job.

In the top option block, set "Number of workers available to cluster NumWorkers" to the total number of cores/workers you want to use. We are setting it to 30, since this is the number of cores on which we will run our job.

In the "Additional command line arguments for job submission SubmitArguments" change the allocation (-A option) and partition (-p option) to be the correct names of the allocation and queue to use. The partition will depend on the walltime (-t option) or specifications of your allocation. In the validation profile, we set the queue to short because we were running a job less than 4 hours (see Quest Queues for more information) in with a general access allocation.

In the same line, change the number of requested licenses in mdcsw: to 30; mdcsw should always equal "Number of workers available to cluster NumWorkers". --ntasks-per-node defines the maximum number of workers that will run on the same node. If we set it to 10, at most 10 cores will be used on a single node.

Set --mem-per-cpu according to your needs; however, be careful not to exceed the physical memory on the node. Assuming --ntasks-per-node=10, 10 * mem-per-cpu must be smaller than the total memory of the node (with --mem-per-cpu=2G, for example, each node must have more than 20 GB available).

Our final setting for the production run looks like this:

-A b1002 -p buyin -t 28:00:00 --ntasks-per-node=10 --mem-per-cpu=2G -L mdcsw:30

When you're finished editing the parameters, click Done. The profile is now ready to use in your job script. You can exit MATLAB if you're done using it.

Submitting your Batch Job

Your MATLAB script file (myscript.m in this example) should include the following command:

parpool('profile-name-here')

where profile-name-here is the name of the parallel profile you created with your actual job parameters in it (the second one we made; in the example above, it was multinode-quest-30cores); the name of the profile is surrounded by single quotes. By default, MATLAB will then use the maximum number of cores you specified in the profile.
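
For reference, a hedged sketch of what myscript.m might contain (the profile name is taken from the example above; the loop body is a hypothetical placeholder):

% myscript.m -- cross-node parallel sketch
pool = parpool('multinode-quest-30cores'); % workers may span multiple nodes
fprintf('Running on %d workers\n', pool.NumWorkers);
out = zeros(1, 300);
parfor i = 1:300
    out(i) = sin(i);  % replace with your actual computation
end
delete(pool);         % release the workers and MDCS licenses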

You will use MATLAB to submit your job to the cluster instead of using sbatch. Make a shell script (mymatlabjob.sh) like the following:

mymatlabjob.sh
#!/bin/bash

module load matlab/r2018a
matlab -nosplash -nodesktop -singleCompThread -r <myscript> > <log> &
exit

The options given above to MATLAB are needed to run the batch job correctly:

  • -singleCompThread tells MATLAB not to try to use more processors on the node than have been allocated to the job. Without this option, your job may get killed for using more resources than requested.
  • <myscript> is the name of a MATLAB script .m file. Omit the .m file extension.
  • <log> is the name of the log file you want to direct output to.
  • & and exit send the MATLAB job (which is running on a login node) to the background and then exits the script so you get the command prompt back.

Make the mymatlabjob.sh script executable (you only need to do this once per file):

chmod u+x mymatlabjob.sh

You execute this script as:

./mymatlabjob.sh

from a Quest login node (where you land when you log into Quest). This is a shell script you're executing to run MATLAB directly. Then MATLAB will request resources from Quest. You are not submitting this script to Slurm via sbatch.

Output

When you submit a job this way, MATLAB may write a directory with a name like Job1, and several related files, in the working directory. You can delete these after the job is done, but you should leave them in place while the job is running. Because of this behavior, you may want to submit your job from a directory where you're OK having these extra files written, and update paths in your script to point to the locations where you keep your actual script files, to keep everything separate.

Cross-node Parallel MATLAB 2015b-2017a Jobs

To run parallel jobs on Quest that use cores across multiple nodes, you need to create a parallel profile.

Log in to Quest with X-forwarding enabled (use the -X option when connecting via ssh or connect with FastX to have X-forwarding enabled by default).

Set the QuestResource environment variable to include your Slurm options, using the command below:

export QuestResource='-A <allocationID> -p <partitionName> -t <hh:mm:ss> --ntasks-per-node=<maxtaskspernode> --mem-per-cpu=<memorypercore> -L mdcsw:<licensecount>'
where

Flag                Description
<allocationID>      ID of your allocation
<partitionName>     The name of the partition (i.e. queue)
<hh:mm:ss>          Time required for the job to finish
<maxtaskspernode>   Maximum number of tasks allowed to run on a node
<memorypercore>     Required memory per core
<licensecount>      Total number of licenses to be used; should equal NumWorkers

To illustrate, we will use the following command. If you want to set additional Slurm options, you can include them here as well.

export QuestResource='-A p12345 -p short -t 00:10:00 --ntasks-per-node=2 --mem-per-cpu=3G -L mdcsw:4'

These are our job parameters:

  • -A p12345 is the example allocation; use your own allocation ID instead.
  • -p short selects the short partition for this general access allocation. If you use a buy-in account, use -p buyin instead. Some buy-in allocations have special queues; if you are using one of these allocations, use the appropriate partition name.
  • -t 00:10:00 specifies a 10 minute job.
  • --ntasks-per-node=2, when used with the --ntasks option (already set via the NumWorkers field), is treated as a maximum count of tasks per node.
  • --mem-per-cpu=3G reserves 3 GB of memory for each core that we request.
  • -L mdcsw:4 requests 4 MATLAB Distributed Computing Engine licenses (matching the 4 cores/workers we will be using).

From your home directory, launch MATLAB on the login node you land on:

module load matlab/r2016a
matlab

When MATLAB opens, select the Parallel menu button:

[Screenshot: MATLAB Parallel menu]

In the Parallel menu, select Manage Cluster Profiles.

[Screenshot: MATLAB Manage Cluster Profiles menu]

This will open a new window, the Cluster Profile Manager.

Create a Validation Profile

In the Cluster Profile Manager, we will make a test profile to validate our settings.

Click on the Add button in the upper left, then choose Custom and then Generic.

[Screenshot: MATLAB create custom generic parallel profile]

(If you get a message about needing the Distributed Computing Server, it is safe to ignore it.)

This will create a new profile called GenericProfile1 which will show up in the Cluster Profile list on the left of the Cluster Profile Manager. Double click on the name to change it. Call this profile something like "multinode-quest-validate."

Then, with this new profile selected from the Cluster Profile list, click the Edit button in the bottom right of the Cluster Profile Manager.

[Screenshot: MATLAB edit custom parallel profile]

In the top option block, set "Number of workers available to cluster NumWorkers" to 4. This is the number of cores/workers that we want to use for this test case. This field sets --ntasks=4 for Slurm.

[Screenshot: MATLAB NumWorkers setting]

Scroll down to the "SUBMIT FUNCTIONS" block and fill in the "Function called when submitting independent jobs IndependentSubmitFcn" and "Function called when submitting communicating jobs CommunicatingSubmitFcn" fields as shown below:

[Screenshot: MATLAB submit function settings]

Scroll further down to the "JOBS AND TASK FUNCTIONS" block and fill in the "Function to query cluster about job state GetJobStateFcn" and "Function to manage cluster when you call delete on a job DeleteJobFcn" fields as shown below:

[Screenshot: MATLAB job and task function settings]

Click on Done in the bottom right. Then click the Validate button at the top of the window (with a green check mark) to start the validation tests. All of them should pass. If something fails, recheck that you entered the above parameters correctly; in particular, make sure you entered your own allocation name and not the example allocation above. If the parameters are correct but validation still fails, contact quest-help@northwestern.edu for assistance.

[Screenshot: MATLAB validation results]

Now that everything is validated, make a real profile to use for your job.

Create a Job Profile

Right click on the name of the validation profile you just made (multinode-quest-validate) in the Cluster Profile list on the left, and choose Duplicate from the menu. This will make a copy of the profile we just created. Rename this profile by double clicking on the name. Choose a name that's descriptive of the parameters for your job, perhaps something like multinode-quest-30cores.

With the new profile selected, click on the Edit button in the bottom right corner of the window. We're going to change some of the parameters we set in the validation profile to the values you need for your actual job.

In the top option block, set "Number of workers available to cluster NumWorkers" to the total number of cores/workers you want to use. We are setting it to 30, since this is the number of cores on which we will run our job.

Set the "IndependentSubmitFcn", "CommunicatingSubmitFcn", "GetJobStateFcn" and "DeleteJobFcn" fields the same way as described above for the validation profile.

When you're finished editing the parameters, click Done. The profile is now ready to use in your job script. You can exit MATLAB if you're done using it.

Submitting your Batch Job

Your MATLAB script file (myscript.m in this example) should include the following command:

parpool('profile-name-here')

where profile-name-here is the name of the parallel profile you created with your actual job parameters in it (the second one we made; in the example above, it was multinode-quest-30cores); the name of the profile is surrounded by single quotes. By default, MATLAB will then use the maximum number of cores you specified in the profile.
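
As in the MATLAB 2018 instructions, a hedged sketch of what myscript.m might contain (the profile name is taken from the example above; the loop body is a hypothetical placeholder):

% myscript.m -- cross-node parallel sketch
pool = parpool('multinode-quest-30cores'); % workers may span multiple nodes
out = zeros(1, 300);
parfor i = 1:300
    out(i) = sin(i);  % replace with your actual computation
end
delete(pool);         % release the workers and MDCS licenses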

You will use MATLAB to submit your job to the cluster instead of using sbatch. Make a shell script (mymatlabjob.sh) like the following:

mymatlabjob.sh
#!/bin/bash

export QuestResource='-A p12345 -p normal -t 28:00:00 --ntasks-per-node=10 --mem-per-cpu=2G -L mdcsw:30'

module load matlab/r2016a
matlab -nosplash -nodesktop -singleCompThread -r <myscript> > <log> &
exit

When setting the QuestResource variable, change the allocation (-A option) and partition (-p option) to the correct names of the allocation and queue to use. The partition will depend on the walltime (-t option) or the specifications of your allocation. In the validation profile, we set the queue to short because we were running a job of less than 4 hours with a general access allocation (see Quest Queues for more information).

In the same line, change the number of requested licenses in mdcsw: to 30; mdcsw should always equal "Number of workers available to cluster NumWorkers". --ntasks-per-node defines the maximum number of workers that will run on the same node. If we set it to 10, at most 10 cores will be used on a single node.

Set --mem-per-cpu according to your needs; however, be careful not to exceed the physical memory on the node. Assuming --ntasks-per-node=10, 10 * mem-per-cpu must be smaller than the total memory of the node (with --mem-per-cpu=2G, for example, each node must have more than 20 GB available).

The options given above to MATLAB are needed to run the batch job correctly:

  • -singleCompThread tells MATLAB not to try to use more processors on the node than have been allocated to the job. Without this option, your job may get killed for using more resources than requested.
  • <myscript> is the name of a MATLAB script .m file. Omit the .m file extension.
  • <log> is the name of the log file you want to direct output to.
  • & and exit send the MATLAB job (which is running on a login node) to the background and then exits the script so you get the command prompt back.

Make the mymatlabjob.sh script executable (you only need to do this once per file):

chmod u+x mymatlabjob.sh

You execute this script as:

./mymatlabjob.sh

from a Quest login node (where you land when you log into Quest). This is a shell script you're executing to run MATLAB directly. Then MATLAB will request resources from Quest. You are not submitting this script to Slurm via sbatch.

Output

When you submit a job this way, MATLAB may write a directory with a name like Job1, and several related files, in the working directory. You can delete these after the job is done, but you should leave them in place while the job is running. Because of this behavior, you may want to submit your job from a directory where you're OK having these extra files written, and update paths in your script to point to the locations where you keep your actual script files, to keep everything separate.
