[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Tue Dec 7 12:59:46 MST 2010


>>> I do not like "mpiprocs" and "ompthread": there can be "procs" and
>>> "threads" other than "mpi" and "omp". We can use "threads" instead
>>> of "ompthread", but we cannot use "procs" instead of "mpiprocs" -
>>> that is taken already. Maybe we could use "nprocs" instead?
> 
> Personally, I don't care for these terms/nuances either.
> 
> In fact, I don't care for the :ppn=X syntax either, and have disabled
> that specification at our site.  Users simply ask for nodes=X which is
> a misnomer for "slots=X".  The "ppn" number in the nodefile is arbitrary
> and determined by the system administrator.  On one of our clusters,
> we oversubscribe the nodes, i.e., the "ppn" number is greater than the
> number of physical cores.  On other clusters, we "undersubscribe" nodes
> in that they are GPU machines and the "ppn" number is the number of
> GPUs in the machine (not GPU cores).  We use nodesets to create boundaries
> between machines and/or networks, and the user can specify nodes=X:label
> if they care which machine they land on.  The users *must* be aware
> of the machine that they are using to some degree, and the nodes=X:ppn=Y
> syntax is not meaningful when there are GPUs with varying numbers of cores
> and CPUs with different numbers of cores, and when the nodes, slots, or
> ncpus count does not dictate the network interface they are on, the amount
> of local disk space (if any), or the amount of memory on each node.

Interesting approach. We have a pretty much identical situation: a
heavily heterogeneous grid. But we use the nodespec exclusively for this.

So instead of requesting
nodes=2:ncpus=4:mem=4G+3:ncpus=2:mem=2G#infiniband your users request
nodes=2:ncpus4:mem4G:infinisite1+3:ncpus2:mem2G:infinisite1?
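
To make the difference between the two requests concrete, here is a rough
Python sketch that splits a nodespec into per-chunk counts, key=value
resources, and bare labels. It is not Torque code. The function name
parse_nodespec is hypothetical, and the grammar is an assumption read off
the two strings above: "+" separates chunks, ":" separates parts, the first
part of each chunk is a node count, and anything after "#" applies to the
whole request.

```python
def parse_nodespec(spec):
    """Split a nodespec string into chunks (illustrative sketch only).

    Assumed grammar, inferred from the examples in this thread:
    [nodes=]COUNT[:part...][+COUNT[:part...]...][#global[:global...]]
    """
    # Drop an optional leading "nodes=".
    if spec.startswith("nodes="):
        spec = spec[len("nodes="):]

    # Assumption: everything after "#" applies to the whole request.
    spec, _, global_part = spec.partition("#")
    global_props = [p for p in global_part.split(":") if p]

    chunks = []
    for chunk in spec.split("+"):
        parts = chunk.split(":")
        # Assumption: the first part is always a numeric node count.
        entry = {"count": int(parts[0]), "resources": {}, "properties": []}
        for part in parts[1:]:
            key, sep, value = part.partition("=")
            if sep:
                # key=value parts are countable resources (ncpus=4, mem=4G).
                entry["resources"][key] = value
            else:
                # Bare parts are opaque labels (ncpus4, infinisite1).
                entry["properties"].append(part)
        chunks.append(entry)
    return chunks, global_props
```

Under these assumptions the first request yields resources such as
ncpus=4 and mem=4G that a scheduler could count against a node, while the
second yields only labels (ncpus4, mem4G, infinisite1) that can be matched
but not counted.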

How do you track the resources already in use on each node, or do you
just assign nodes exclusively?

-- 
Mgr. Šimon Tóth
