[torqueusers] Online docs missing queue resource "nodes"
siegert at sfu.ca
Thu Jan 5 17:22:24 MST 2006
On Fri, Jan 06, 2006 at 10:44:20AM +1100, Chris Samuel wrote:
> On Friday 06 January 2006 05:01, Martin Siegert wrote:
> > with JOBNODEMATCHPOLICY unset:
> > 1) "-l nodes=n" requests n processors (not nodes!) anywhere on the cluster
> > regardless of whether the processors are on the same node or not.
> > (however, if you request "-l nodes=n" with n larger than the no. of
> > nodes in your cluster, but smaller than the no. of processors in the
> > cluster, the job is rejected by torque with an error message
> > "Job exceeds queue resource limits". E.g., you cannot request
> > -l nodes=6 on a cluster with 4 dual processor nodes. I suspect that
> > this is a bug).
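For concreteness, the behaviour described above looks something like this (a sketch only; job.sh is a placeholder script, and the 4 x 2 cluster matches the example above):

```shell
# With JOBNODEMATCHPOLICY unset, this asks for 4 processors
# anywhere on the cluster; they may end up 2+2 on two nodes.
qsub -l nodes=4 job.sh

# On a cluster of 4 dual-processor nodes this is rejected with
# "Job exceeds queue resource limits", even though 6 <= 8 CPUs:
qsub -l nodes=6 job.sh
```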
> You can override this, as I bugged David et al. about this for ages and they
> gave in to make me go away. :-)
> So for our 144 CPU Power5 cluster (36 x 4 CPU boxes) I have:
> set server resources_available.nodect = 144
> and Torque no longer complains when I ask for nodes=n where 0 < n < 145.
> In other words with that set I can use nodes purely as a request for a number
> of CPUs in any arrangement.
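For reference, the override described above can be set via qmgr (a sketch; 144 is the processor count of that cluster, not its node count):

```shell
# Raise the node-count ceiling to the total CPU count (36 nodes
# x 4 CPUs = 144), so nodes=n is accepted for any n up to 144.
qmgr -c "set server resources_available.nodect = 144"
```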
Aah! Thanks! I did not know this.
> > 2) "-l nodes=n:ppn=1" works in exactly the same way as 1, i.e., you may
> > actually get two processors on the same node. Similarly,
> > "-l nodes=n:ppn=m" means "give me n*m processors with at least m
> > processors on each node".
> I think the behaviour is slightly different (though this may be because we're
> using Moab rather than Maui).
We are using Moab as well.
> I believe that you will get n*m CPUs with at *least* m per node, though you
> may have some combination that gives you more than m on some or all nodes.
Sorry, probably my mistake; yes, that is what I meant.
> However, we never get less than 'm' per node, so again on our Power5 cluster a
> user can ask for nodes=1:ppn=4 and be guaranteed to get a single node to
> themselves for an SMP code or some other reason.
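A quick sketch of the ppn semantics described above (job.sh is a placeholder script):

```shell
# n*m = 8 CPUs with at *least* 2 per node; the scheduler may
# put more than 2 on some nodes.
qsub -l nodes=4:ppn=2 job.sh

# On a 4-CPU-per-node machine this guarantees the job a whole
# node to itself, e.g. for an SMP code.
qsub -l nodes=1:ppn=4 job.sh
```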
> > 3) "-l ncpus=n" requests n processors on a single node, i.e., the same
> > as "-l nodes=1:ppn=n"
> IIRC this has caused interesting behaviours on the rare occasions when one or
> two of our users have tried that, and we never use it ourselves.
We can probably agree that specifying ncpus on a cluster is not useful
under any (?) circumstances. That's why I believe the meaning of ncpus
could safely be changed when it comes to clusters.
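Just to spell out the equivalence from point 3 above (a sketch; job.sh is a placeholder script):

```shell
# On a cluster these two requests are meant to be the same thing:
qsub -l ncpus=4 job.sh
qsub -l nodes=1:ppn=4 job.sh
```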
I really need to be able to support "-l nodes=n:ppn=1" (in the
JOBNODEMATCHPOLICY EXACTNODE sense). We have dual processor nodes with
4GB of memory each. When I have a MPI job that requires 3GB per
process I want to make sure that I do not get more than one process
assigned to a node. However, I do not mind that moab schedules
a process from a different program with memory requirements < 1GB
to the same node.
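To illustrate, with JOBNODEMATCHPOLICY set to EXACTNODE the request I need would look like this (a sketch; job.sh is a placeholder, and the pmem value comes from the 3GB example above):

```shell
# moab.cfg (or maui.cfg): make nodes=n:ppn=1 mean n distinct nodes
#   JOBNODEMATCHPOLICY EXACTNODE

# 4 MPI processes, one per node, 3GB per process; the second
# CPU on each node stays free for small-memory jobs.
qsub -l nodes=4:ppn=1,pmem=3gb job.sh
```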
OTOH I have many users who could not care less whether they get one
or both processors on a node; they just want their MPI job to start
as soon as possible. I.e., for these users "-l nodes=n" (JOBNODEMATCHPOLICY
unset) would be the appropriate setting.
Currently this is not supported.