
Using the Big ICM Cluster

M2vaine edited this page Sep 30, 2014 · 8 revisions

Tired of the limited capacity of your computer? Run your MATLAB programs on the big ICM cluster instead!

To do this you need the Parallel Computing Toolbox (the "parallel toolbox" below) with MATLAB 2014.

Once the parallel toolbox is installed on your PC, you only have to declare the ICM cluster as a cluster profile.

Creating the Cluster Profile

  1. Open MATLAB. On the "Home" tab, click on "Parallel" (bottom, second from the right).
  2. Click on "Manage Cluster Profiles". In the left column, "Cluster Profile", you should see at least "local" (these are the 8 processors of your own computer).
  3. Click on "Add" (top left) to create a profile. You only have to edit the first two fields: give a name to the cluster (for instance "Calcul-icm") in Description, and write "calcul1-icm" in the Hostname field.

Click on "Validate": everything should turn green (if it doesn't, contact Philippe Domineaux). Note that the profile itself has a name (by default probably MJSProfile1); you will need this name to launch jobs or to open an interactive session.

In this menu you can also configure your default options, in particular which profile is used by default when you call a parallel function such as batch or parpool (either your local computer or the ICM cluster). It may also be a good idea to change the default range of workers that can be used with your profile: for the ICM cluster it may be [1 inf], in which case replace inf with a finite, reasonable number, 50 for instance.
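The same check can be done from the command line. The snippet below is a minimal sketch, assuming your profile kept the default name MJSProfile1; it lists the known profiles and, if the cluster profile exists, makes it the default. Note that changing the default profile is a persistent setting.

```matlab
% Name of the cluster profile (assumption: the default name was kept).
profileName = 'MJSProfile1';

% List the profiles known to this MATLAB installation.
allProfiles = parallel.clusterProfiles;
disp(allProfiles);

if ismember(profileName, allProfiles)
    % Use the cluster profile by default for batch and parpool.
    parallel.defaultClusterProfile(profileName);
    % Create a cluster object and display its properties
    % (host, number of workers, ...).
    c = parcluster(profileName);
    disp(c);
else
    fprintf('Profile "%s" not found: create it in the profile manager first.\n', ...
            profileName);
end
```

To go back to your own computer, call parallel.defaultClusterProfile('local').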


Be gentle when using common resources!

The cluster is used by different teams and, importantly, not only through the parallel toolbox. For the other uses of the cluster, reservations and priorities are handled by Slurm. MATLAB with the parallel toolbox is not yet integrated into this system, so a job you launch may slow down something that is already running.

To see whether something is running on the workers we use (openmp1 and openmp2), you can look at the current activity and reservations using PuTTY.

[Section under construction]


Basics on how to use the parallel toolbox

There are several ways to send work to the cluster. We focus on two of them: the interactive session and the batch job. Both modes can be used with the parfor loop, so we first recall how parfor works and then describe the interactive mode and the batch mode.

Parfor:

Using parfor is a way to parallelize your computation. Basically, it runs the iterations of a "for" loop independently on different processors. Since the iterations are computed simultaneously and independently, your code has to be adapted accordingly. Here are two scenarios in which you might want to use parfor.

1 Data analysis: You want to analyse the data of a behavioral/fMRI study (for standard fMRI analysis, parallelized batches that do not require the parallel toolbox will be available soon). Your code already uses a for loop over the 20 subjects. Instead of running the analyses sequentially, you want to run them simultaneously on the ICM cluster. This is not a problem, since they are independent.

Initial code

%data is a big matrix with the subjects in column
for sub=1:20
    Res.An(1).Sub(sub)=myfunAnalysis1(data(:,sub));
    Res.An(2).Sub(sub)=myfunAnalysis2(data(:,sub));
end

You cannot simply change the "for" into a "parfor" in this example, because of the way parfor classifies variables:

  • "data" is a sliced input variable (different slices of the matrix are sent to the different workers). parfor accepts the loop index in any single dimension of a sliced variable, so data(:,sub) is valid as it is; the code below nevertheless puts the subjects on rows so that the loop index comes first everywhere.

  • The output variable Res is also sliced, but it is not indexed correctly: in Res.An(1).Sub(sub) the loop index only appears at a deeper level. For a sliced variable the loop index must be at the first level of indexing, as in Sub(sub).An(1).res.

Parfor code

%data is a big matrix with the subjects in column
data2=data'; %data2 is correctly sliced
parfor sub=1:20
   Sub(sub).An(1).res=myfunAnalysis1(data2(sub,:));
   Sub(sub).An(2).res=myfunAnalysis2(data2(sub,:));
end
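To try the pattern without your own analysis functions, here is a minimal self-contained sketch. It makes several assumptions: random numbers stand in for the data, the built-in mean and std stand in for myfunAnalysis1 and myfunAnalysis2, and the results go into preallocated numeric vectors rather than a struct array. If no pool is open, parfor opens one with your default profile (or runs in the current session if that preference is disabled).

```matlab
% Stand-in data: 100 samples (rows) for 20 subjects (columns).
nSub  = 20;
data  = rand(100, nSub);
data2 = data';                 % one subject per row

% Preallocate one output slot per subject.
resMean = zeros(nSub, 1);
resStd  = zeros(nSub, 1);

parfor sub = 1:nSub
    x = data2(sub, :);         % sliced input: only this row is sent to the worker
    resMean(sub) = mean(x);    % stand-in for myfunAnalysis1
    resStd(sub)  = std(x);     % stand-in for myfunAnalysis2
end

% Serial reference, to check that the parallel loop gives the same numbers.
refMean = mean(data, 1)';
refStd  = std(data, 0, 1)';
fprintf('max difference (mean): %g\n', max(abs(resMean - refMean)));
fprintf('max difference (std):  %g\n', max(abs(resStd  - refStd)));
```

Both differences should be zero or of the order of rounding error.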

2 MCMC simulations: You want to run many simulations with different parameter values. All your simulations are independent.

Initial code

par1range=1:10;
par2range=1:100;
for par1=par1range
  for par2=par2range
    Par1(par1).Par2(par2).res= myfunsimu(par1,par2);
  end
end

Here the difficulty is that a "parfor" cannot be nested inside another "parfor", and a "parfor" placed inside a plain "for" only parallelizes the inner loop. If you want to parallelize as much as possible, you need to flatten the two loops into a single one:

Parfor code

par1range=1:10;
par2range=1:100;
k=1; % index for total number of parameter values combinations
for par1=par1range
  for par2=par2range
      Valuepar(k).Par1=par1;
      Valuepar(k).Par2=par2;
      k=k+1;
  end
end
NCombi=k-1; %should be equal to length(par1range) * length(par2range)
parfor k=1:NCombi
      Res(k).simu= myfunsimu( Valuepar(k).Par1,Valuepar(k).Par2);
end
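The list of combinations can also be built without the two nested loops. The following is a sketch of one alternative, not a replacement for the code above: it uses ndgrid to enumerate the combinations, and a simple arithmetic expression stands in for myfunsimu (an assumption made so that the example runs on its own).

```matlab
par1range = 1:10;
par2range = 1:100;

% Enumerate all (par1, par2) combinations and store them as column vectors.
[P1, P2] = ndgrid(par1range, par2range);
p1 = P1(:);
p2 = P2(:);
nCombi = numel(p1);            % length(par1range) * length(par2range)

res = zeros(nCombi, 1);        % one result per combination
parfor k = 1:nCombi
    % Stand-in for myfunsimu(p1(k), p2(k)).
    res(k) = p1(k) + 0.01 * p2(k);
end

% Back to a grid: resGrid(i, j) belongs to par1range(i) and par2range(j).
resGrid = reshape(res, size(P1));
disp(size(resGrid));
```

Because ndgrid and reshape use the same column-major ordering, the reshaped results line up with the parameter grid.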

Interactive session: parpool

(parpool replaces the matlabpool command of R2013a and earlier releases)

An easy way to launch a script containing a parfor loop is to open an interactive session with the parpool command. You need to specify the profile and the number of workers you want to use, e.g.: parpool('MJSProfile1',20). parpool returns a pool object that you can store in a variable; at any time, gcp ("get current pool") also returns the pool that is currently open.

When you are done with the interactive session, close it with delete(gcp). If your session stays inactive for more than 30 minutes, it is closed automatically (which is a good thing).

Once you have opened your parpool session, you can run your (parfor) scripts. It may not work the first time, because the workers do not have access to the relevant function files. You can add files individually with addAttachedFiles(gcp, {'myfun1.m','myfun2.m'...}) or add an entire folder with addAttachedFiles(gcp,'C\...\VBA-toolbox'). However, be careful not to attach heavy files (with lots of data inside...): this will be slow or may even result in an error.
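Put together, an interactive session could look like the sketch below. It assumes that the profile is named MJSProfile1, that myfun1.m and myfun2.m are on your local path, and that myscript is a placeholder for your own script containing a parfor loop.

```matlab
profileName = 'MJSProfile1';   % name of your cluster profile
nWorkers    = 20;

% Reuse the current pool if there is one, otherwise open a new one.
p = gcp('nocreate');           % returns [] when no pool is open
if isempty(p)
    p = parpool(profileName, nWorkers);
end
fprintf('Pool with %d workers is open.\n', p.NumWorkers);

% Send the function files that the script needs to the workers.
addAttachedFiles(p, {'myfun1.m', 'myfun2.m'});

% Run the script that contains the parfor loop (placeholder name).
myscript;

% Release the workers for the other users as soon as you are done.
delete(p);
```

Calling gcp without 'nocreate' may open a pool with the default profile, which is why the sketch checks with gcp('nocreate') first.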

Batch mode

This is probably the best practice when using a cluster with many users. You create a job by specifying a script to run (parallelized or not), the profile and the number of workers you want to use. The job runs as soon as enough workers are free to execute it. The great thing about jobs is that you can continue using your MATLAB session in the meantime.

job = batch('myscript','Profile','MJSProfile1','Pool',20,'AttachedFiles',{'myfun1.m','myfun2.m'});
wait(job);%wait until job is finished
load(job); % load the workspace once the job is finished into your workspace

You can also use the parcluster function:

c=parcluster('MJSProfile1'); % if no input, uses the default profile
job = batch(c,'myscript','Pool',20,'AttachedFiles',{'myfun1.m','myfun2.m'});
wait(job);%wait until job is finished
load(job); % load the workspace once the job is finished into your workspace

NB: Once you have run wait(job), you won't be able to use your MATLAB session until the job is done.

You can also use the batch mode with a function: job = batch(@myfun1,N,{x1, ..., xn}), where @myfun1 is a function handle, N is the number of output arguments and x1,...,xn are the n inputs. Then use r = fetchOutputs(job) to get a cell array (of size N) with the outputs.

Once you have imported the results into your session, you can delete the job with delete(job).
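If you do not want to block your session, you can check the job state instead, or wait with a timeout. The sketch below assumes the same profile, script and attached files as above; the 60-second timeout is an arbitrary choice.

```matlab
c   = parcluster('MJSProfile1');
job = batch(c, 'myscript', 'Pool', 20, ...
            'AttachedFiles', {'myfun1.m', 'myfun2.m'});

% Non-blocking check: returns immediately.
disp(job.State);                      % e.g. 'queued', 'running' or 'finished'

% Blocking wait, but for at most 60 seconds.
isDone = wait(job, 'finished', 60);   % true if the state was reached in time

if isDone
    load(job);                        % copy the job workspace into yours
    delete(job);                      % remove the job data from the cluster
else
    fprintf('Job %d is still %s; check again later.\n', job.ID, job.State);
end
```

Keep in mind that 'Pool',20 occupies 21 workers, one running the script itself and 20 for its parfor loops. If you lose the job variable, findJob(c) lists the jobs stored on the cluster.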
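As an illustration of the function form, the sketch below sends the built-in max to the cluster and asks for its two outputs. The profile name MJSProfile1 is an assumption, and max only stands in for your own function; since it is built in, no file needs to be attached.

```matlab
c = parcluster('MJSProfile1');

% Request 2 outputs from max: the maximum and its position.
job = batch(c, @max, 2, {[3 7 5]});

wait(job);                  % blocks until the job is finished
r = fetchOutputs(job);      % 1-by-2 cell array: r{1} is 7, r{2} is 2
disp(r);

delete(job);                % clean up once the results are in your session
```

For your own function, pass its handle (for example @myfun1) and list the files it depends on with the 'AttachedFiles' option, as in the script form.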
