[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined
"Mgr. Šimon Tóth"
SimonT at mail.muni.cz
Tue Dec 7 12:59:46 MST 2010
>>> I do not like "mpiprocs" and "ompthread": there can be "procs" and
>>> "threads" other than "mpi" and "omp". We can use "threads" instead
>>> of "ompthread", but we cannot use "procs" instead of "mpiprocs" -
>>> that is taken already. Maybe we could use "nprocs" instead?
> Personally, I don't care for these terms/nuances either.
> In fact, I don't care for the :ppn=X syntax either, and have disabled
> that specification at our site. Users simply ask for nodes=X which is
> a misnomer for "slots=X". The "ppn" number in the nodefile is arbitrary
> and determined by the system administrator. On one of our clusters,
> we oversubscribe the nodes, i.e., the "ppn" number is greater than the
> number of physical cores. On other clusters, we "undersubscribe" nodes
> in that they are GPU machines and the "ppn" number is the number of
> GPUs in the machine (not GPU cores). We use nodesets to create boundaries
> between machines and/or networks, and the user can specify nodes=X:label
> if they care which machine they land on. The users *must* be aware
> of the machine that they are using to some degree, and the nodes=X:ppn=Y
> syntax is not meaningful when there are GPUs with varying numbers of cores,
> CPUs optionally with different numbers of cores, and the nodes, slots, or
> ncpus does not dictate the network interface that they are on, the amount
> of local disk space (if any) nor the amount of memory on each node.
Interesting approach. We have a pretty much identical situation: a
heavily heterogeneous grid. But we use the nodespec exclusively for this.
So instead of requesting
nodes=2:ncpus=4:mem=4G+3:ncpus=2:mem=2G#infiniband your users request
How do you determine the amount of resources in use on each node, or do
you just assign nodes exclusively?
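
To make the nodespec form above concrete, here is a minimal Python sketch of
how such a request could be split into per-node chunks. The grammar is an
assumption inferred only from the single example in this message ("+" joins
chunks, ":" separates the node count from per-node resources, "#" introduces
a property applying to the whole request). The names parse_nodespec and
total_ncpus are hypothetical and not part of Torque.

```python
def parse_nodespec(spec):
    """Split a nodespec string into per-chunk requests and global properties.

    Assumed grammar (inferred from one example, not from the Torque source):
      nodes=<chunk>+<chunk>...#<global property>
      <chunk> = <count>:<key>=<value>:...
    """
    # Everything after a '#' is treated as a request-wide property.
    body, *global_props = spec.split("#")
    if body.startswith("nodes="):
        body = body[len("nodes="):]

    chunks = []
    for part in body.split("+"):
        fields = part.split(":")
        chunk = {"count": 1, "resources": {}, "properties": []}
        # A leading number is the node count for this chunk.
        if fields and fields[0].isdigit():
            chunk["count"] = int(fields[0])
            fields = fields[1:]
        for field in fields:
            if "=" in field:
                key, value = field.split("=", 1)
                chunk["resources"][key] = value
            else:
                # Bare words are kept as per-chunk properties (labels).
                chunk["properties"].append(field)
        chunks.append(chunk)
    return chunks, global_props


def total_ncpus(chunks):
    """Sum count * ncpus over all chunks; a missing ncpus counts as 1."""
    return sum(c["count"] * int(c["resources"].get("ncpus", 1)) for c in chunks)


chunks, global_props = parse_nodespec(
    "nodes=2:ncpus=4:mem=4G+3:ncpus=2:mem=2G#infiniband"
)
```

With the example request, this gives two chunks (2 nodes with 4 CPUs and 4G
each, 3 nodes with 2 CPUs and 2G each) plus the request-wide "infiniband"
property, which is the per-node information a scheduler would need in order
to account for used resources instead of assigning nodes exclusively.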
Mgr. Šimon Tóth