[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined

Michael Barnes Michael.Barnes at jlab.org
Tue Dec 7 12:36:18 MST 2010


On Dec 7, 2010, at 1:39 PM, Michel Béland wrote:

>> I do not like "mpiprocs" and "ompthread": there can be "procs" and
>> "threads" other than "mpi" and "omp". We can use "threads" instead
>> of "ompthread", but we cannot use "procs" instead of "mpiprocs" -
>> that is taken already. Maybe we could use "nprocs" instead?

Personally, I don't care for these terms/nuances either.

In fact, I don't care for the :ppn=X syntax either, and have disabled
that specification at our site.  Users simply ask for nodes=X, which is
a misnomer for "slots=X".  The "ppn" number in the nodefile is arbitrary
and determined by the system administrator.  On one of our clusters,
we oversubscribe the nodes, i.e., the "ppn" number is greater than the
number of physical cores.  On other clusters, we "undersubscribe" nodes
in that they are GPU machines and the "ppn" number is the number of
GPUs in the machine (not GPU cores).  We use nodesets to create boundaries
between machines and/or networks, and the user can specify nodes=X:label
if they care which machine they land on.  The users *must* be aware,
to some degree, of the machine that they are using.  The nodes=X:ppn=Y
syntax is not meaningful when there are GPUs with varying numbers of
cores and CPUs with differing numbers of cores, and when the nodes,
slots, or ncpus count does not dictate the network interface that they
are on, the amount of local disk space (if any), or the amount of memory
on each node.
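
To make the "slots" reading concrete, here is a rough sketch of how a
request for nodes=X or nodes=X:label could be checked against the
per-node count set by the administrator.  The host names, labels, and
counts are invented for illustration, and the parser is a simplified
assumption about the nodes file layout (one host per line, an np=
count, then free-form labels), not Torque's own code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    slots: int = 1  # assumed default when no np= is given
    labels: set = field(default_factory=set)


def parse_nodes(text):
    """Parse a simplified nodes file: 'host np=N label1 label2 ...'."""
    nodes = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *tokens = line.split()
        node = Node(name)
        for tok in tokens:
            if tok.startswith("np="):
                node.slots = int(tok[3:])
            elif "=" not in tok:  # bare words are treated as labels
                node.labels.add(tok)
        nodes.append(node)
    return nodes


def total_slots(nodes, label=None):
    """Slots available to nodes=X (label=None) or nodes=X:label."""
    return sum(n.slots for n in nodes if label is None or label in n.labels)


# Hypothetical nodes file; every name and number here is made up.
EXAMPLE = """\
farm01 np=16 farm   # 8 physical cores, oversubscribed
farm02 np=16 farm
gpu01  np=4  gpu    # count set to the number of GPUs, not cores
"""

nodes = parse_nodes(EXAMPLE)
print(total_slots(nodes), total_slots(nodes, "gpu"))
```

In this sketch the count says nothing about cores, memory, or network;
it is only a number of slots, which is why a label is the only thing
the user needs to add when the machine type matters.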

No user specifies ppn here, and we have many different types of nodes:
some are used exclusively by one user at a time, and others are shared
among multiple users.

Maybe I'm oversimplifying things, but I've never found the :ppn=
specification useful.  I would prefer to have slots=X over nodes=X,
but this may be too radical a change this late in the game.

-mb

--
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| Scientific Computing Group
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------