[torquedev] [Bug 93] Resource management semantics of Torque need to be well defined
glen.beane at gmail.com
Tue Dec 7 13:26:54 MST 2010
On Dec 7, 2010, at 2:36 PM, Michael Barnes <Michael.Barnes at jlab.org> wrote:
> On Dec 7, 2010, at 1:39 PM, Michel Béland wrote:
>>> I do not like "mpiprocs" and "ompthread": there can be "procs" and
>>> "threads" other than "mpi" and "omp". We can use "threads" instead
>>> of "ompthread", but we cannot use "procs" instead of "mpiprocs" -
>>> that is taken already. Maybe we could use "nprocs" instead?
> Personally, I don't care for these terms/nuances either.
> In fact, I don't care for the :ppn=X syntax either, and have disabled
> that specification at our site. Users simply ask for nodes=X which is
> a misnomer for "slots=X". The "ppn" number in the nodefile is arbitrary
> and determined by the system administrator. On one of our clusters,
> we oversubscribe the nodes, i.e., the "ppn" number is greater than the
> number of physical cores. On other clusters, we "undersubscribe" nodes
> in that they are GPU machines and the "ppn" number is the number of
> GPUs in the machine (not GPU cores). We use nodesets to create boundaries
> between machines and/or networks, and the user can specify nodes=X:label
> if they care which machine they land on. The users *must* be aware
> of the machine that they are using to some degree, and the nodes=X:ppn=Y
> syntax is not meaningful when there are GPUs with varying numbers of cores,
> CPUs with differing numbers of cores, and when the nodes, slots, or
> ncpus count does not dictate the network interface they are on, the amount
> of local disk space (if any), or the amount of memory on each node.
> No user specifies ppn here, and we have many different types of nodes
> where some of the nodes are used exclusively by one user at a time or
> they are shared with multiple users.
> Maybe I'm oversimplifying things, but I've never found the :ppn=
> specification useful. I would prefer to have slots=X over nodes=X,
> but this may be too radical a change this late in the game.
Rather than overloading nodes, you can now request procs=N for N slots distributed over an arbitrary number of nodes (at least if you use Moab, or pbs_sched as of TORQUE 2.5; I'm not sure whether Maui understands it yet).
We have some users that care about ppn and some that only care about some total number of cores. I usually only care about the total number of slots I have and I'm not so concerned with how many nodes.
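To make the difference between the two request styles concrete, here is a minimal Python sketch under simplifying assumptions: each node advertises a fixed number of free slots (the administrator-set "ppn" count minus slots in use), and the function names place_nodes_ppn and place_procs are hypothetical. This is not how Moab, Maui, or pbs_sched actually schedule jobs. It only shows that nodes=X:ppn=Y needs Y free slots on each of X nodes, while procs=N just needs N slots in total.

```python
from typing import Dict, Optional

# Hypothetical model: node name -> free slots on that node.
FreeSlots = Dict[str, int]


def place_nodes_ppn(free: FreeSlots, nodes: int, ppn: int) -> Optional[Dict[str, int]]:
    """nodes=X:ppn=Y: need X distinct nodes, each with at least Y free slots."""
    chosen = [name for name, n in sorted(free.items()) if n >= ppn][:nodes]
    if len(chosen) < nodes:
        return None  # not enough nodes with Y free slots each
    return {name: ppn for name in chosen}


def place_procs(free: FreeSlots, procs: int) -> Optional[Dict[str, int]]:
    """procs=N: N slots anywhere; this sketch packs onto the emptiest nodes first."""
    placement: Dict[str, int] = {}
    remaining = procs
    # Largest free counts first so the job spans as few nodes as possible.
    for name, n in sorted(free.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(n, remaining)
        if take > 0:
            placement[name] = take
            remaining -= take
    return placement if remaining == 0 else None


if __name__ == "__main__":
    free = {"n01": 8, "n02": 3, "n03": 5}
    print(place_nodes_ppn(free, nodes=2, ppn=4))  # {'n01': 4, 'n03': 4}
    print(place_nodes_ppn(free, nodes=3, ppn=4))  # None: n02 has only 3 free
    print(place_procs(free, procs=12))            # {'n01': 8, 'n03': 4}
```

With 16 free slots in total, a 12-slot request phrased as nodes=3:ppn=4 cannot run, while procs=12 can. The greedy packing in place_procs is just one possible policy; a real scheduler may spread slots differently.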
Sorry for any typos; I'm replying on my phone.
> | Michael Barnes
> | Thomas Jefferson National Accelerator Facility
> | Scientific Computing Group
> | 12000 Jefferson Ave.
> | Newport News, VA 23606
> | (757) 269-7634
> torquedev mailing list
> torquedev at supercluster.org