[torqueusers] MATLAB and cpusets
gus at ldeo.columbia.edu
Wed Dec 15 15:04:22 MST 2010
Martin Thompson wrote:
> I suspect this is purely a MATLAB problem, but I thought I'd check if
> anyone here has encountered it.
> I have a cluster with CentOS 5.5 installed on each node. I am using
> Torque 2.4.11 configured with --enable-cpuset. Each compute node
> has /dev/cpuset mounted.
> Multithreaded jobs, such as code compiled with OpenMP, MKL or
> ATLAS, all work as I would expect. These jobs only get access to the
> number of cpu cores requested and they are able to fully utilise them.
> However, MATLAB does not behave as expected. If a MATLAB job requests a
> subset of the available cores on a compute node, say 6 out of 12, then
> it will use those 6 cores if it is the only job running on that node.
> However, if another job was already running on that node, say using the
> other 6 cores, then MATLAB will not use its full allocation of cores.
> In most of my tests, using MATLAB 2008b, 2009b and 2010b, it will only
> use a single core instead of the 6 that are available. Sometimes I have
> seen MATLAB 2009b use 2 cores. The MATLAB test I use to investigate
> this problem is just the multiplication of two random 8000x8000 matrices.
> Any ideas?
> Many thanks
Not sure if this will help, but here it goes anyway.
In March I asked MathWorks/Matlab how to control the number of threads
used by Matlab.
Their software engineer was very forthcoming, but I ended up with the
impression that there is no way to consistently control the
number of active threads for all Matlab operations.
It seems to depend on which Matlab functions are being called,
and on different mechanisms that Matlab uses to control
the number of threads.
For instance, (most) linear algebra functions can be
controlled by MaxNumCompThreads, which you call
inside Matlab (or the Matlab script).
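A minimal sketch of that approach, assuming the job runs on Linux inside the cpuset Torque created for it: the helper sizes the thread count from the cores the job may actually use and builds the MATLAB statement to run before the user's code. The helper names are hypothetical; the MATLAB function itself is spelled maxNumCompThreads.

```python
import os


def allowed_core_count() -> int:
    """Number of cores this process may run on (the job's cpuset on Linux)."""
    # sched_getaffinity reflects the cpuset restriction;
    # os.cpu_count() would report every core on the node instead.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def max_threads_statement(n_threads: int) -> str:
    """MATLAB statement that caps computational threads for most linear algebra."""
    if n_threads < 1:
        raise ValueError("need at least one thread")
    return f"maxNumCompThreads({n_threads});"


if __name__ == "__main__":
    print(max_threads_statement(allowed_core_count()))
```

As described above, this cap is not honoured by every MATLAB function, so it is at best a partial control.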
However, for the FFT functions, for instance,
you either get one thread only or all the cores/cpus
in the node/computer.
In this case the behavior seems to be selected through
-singleCompThread, a flag that you set when you launch Matlab.
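If -singleCompThread is the only start-up switch that also restrains functions like the FFTs, a job wrapper can only choose between two consistent cases. The function below is a hypothetical policy sketch (launch_flags is not part of MATLAB or Torque): one core with -singleCompThread, or the whole node with no flag, rejecting partial-node allocations.

```python
from __future__ import annotations


def launch_flags(allocated_cores: int, node_cores: int) -> list[str]:
    """Choose MATLAB start-up flags for a job, given its core allocation."""
    if allocated_cores < 1 or allocated_cores > node_cores:
        raise ValueError("allocation must be between 1 and node_cores")
    if allocated_cores == 1:
        # Exactly one computational thread, including FFTs.
        return ["-singleCompThread"]
    if allocated_cores < node_cores:
        # No flag means "N threads everywhere"; functions not governed by
        # maxNumCompThreads may still try to use every core on the node.
        raise ValueError("partial-node MATLAB job: request 1 core or the full node")
    # Full node: let MATLAB use all cores.
    return []
```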
Given this lack of master control on the number of threads,
I asked the users to request a full node
to run Matlab, and to launch Matlab in batch mode through a wrapper
with -nodisplay, -nojvm, no-nothing.
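A wrapper of that kind might look like the sketch below, assuming `matlab` is on the PATH and the user's code lives in a hypothetical `myjob.m` on the MATLAB path. It only builds the command line; the try/catch makes MATLAB print the error and exit instead of waiting at a prompt if the script fails.

```python
from __future__ import annotations

import shlex


def matlab_batch_argv(script: str, extra_flags: list[str] | None = None) -> list[str]:
    """Build a non-interactive MATLAB command line for a Torque job."""
    statement = f"try, {script}; catch err, disp(err.message); end; exit"
    return [
        "matlab",
        "-nodisplay",  # no X11 display on compute nodes
        "-nojvm",      # skip the Java VM (no desktop, no figure windows)
        "-nosplash",   # no splash screen
        *(extra_flags or []),  # e.g. ["-singleCompThread"]
        "-r",
        statement,
    ]


if __name__ == "__main__":
    print(shlex.join(matlab_batch_argv("myjob")))
```

A Torque job script could run the resulting list with `subprocess.run(argv, check=True)`, redirecting output to a log file.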
To my joy and relief nobody is currently using Matlab in the cluster anyway.
I wonder if what you described is a side effect of these
multiple mechanisms that Matlab seems to use to control the
number of threads, and perhaps how they interact with the
resources that Torque makes available to each job.
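One way to see what Torque actually gives a job is to compare the node's core count with the cores its cpuset allows. A minimal sketch for Linux, assuming /proc is mounted; older kernels may lack the Cpus_allowed_list field, in which case only the node count (and cpuset path, if available) is reported. The function names are illustrative.

```python
from __future__ import annotations

import os


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as '0-5,8' into a set of core numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpuset_report() -> dict:
    """Compare the cores on the node with the cores this process may use."""
    # os.cpu_count() counts the node's CPUs and ignores the cpuset.
    report = {"node_cores": os.cpu_count()}
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Cpus_allowed_list:"):
                    allowed = parse_cpu_list(line.split(":", 1)[1])
                    report["allowed"] = sorted(allowed)
    except OSError:
        pass  # not Linux, or /proc unavailable
    try:
        with open("/proc/self/cpuset") as f:
            report["cpuset"] = f.read().strip()  # cpuset path of this process
    except OSError:
        pass
    return report


if __name__ == "__main__":
    print(cpuset_report())
```

Run inside a job that requested 6 of 12 cores, a mismatch between `node_cores` and the length of `allowed` is expected; a program that sizes its thread pool from the node count rather than the allowed set would be one way for the two to interact badly.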
My two cents,