[torqueusers] Online docs missing queue resource "nodes"

Martin Siegert siegert at sfu.ca
Thu Jan 5 17:22:24 MST 2006


On Fri, Jan 06, 2006 at 10:44:20AM +1100, Chris Samuel wrote:
> On Friday 06 January 2006 05:01, Martin Siegert wrote:
> 
> > with JOBNODEMATCHPOLICY unset:
> >
> > 1) "-l nodes=n" requests n processors (not nodes!) anywhere on the cluster
> >    regardless of whether the processors are on the same node or not.
> >    (however, if you request "-l nodes=n" with n larger than the no. of
> >    nodes in your cluster, but smaller than the no. of processors in the
> >    cluster, the job is rejected by Torque with an error message
> >    "Job exceeds queue resource limits". E.g., you cannot request
> >    -l nodes=6 on a cluster with 4 dual-processor nodes. I suspect that
> >    this is a bug).
> 
> You can override this; I bugged David et al. about this for ages and they
> gave in to make me go away. :-)
> 
> So for our 144 CPU Power5 cluster (36 x 4 CPU boxes) I have:
> 
>   set server resources_available.nodect = 144
> 
> and Torque no longer complains when I ask for nodes=n where 0 < n < 145.
> 
> In other words, with that set, I can use nodes purely as a request for a
> number of CPUs in any arrangement.

Aah! Thanks! I did not know this.
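
If I understand this correctly, on our cluster of 4 dual-processor
nodes the equivalent one-liner would be (untested here as yet):

  qmgr -c "set server resources_available.nodect = 8"

after which the -l nodes=6 request from my example above should no
longer be rejected.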

> > 2) "-l nodes=n:ppn=1" works in exactly the same way as 1, i.e., you may
> >    actually get two processors on the same node. Similarly,
> >    "-l nodes=n:ppn=m" means "give me n*m processors with at least m
> >    processors on each node".
> 
> I think the behaviour is slightly different (though this may be because we're 
> using Moab rather than Maui).

We are using Moab as well.

> I believe that you will get n*m CPUs with at *least* m per node, though you 
> may have some combination that gives you more than m on some or all nodes.

Sorry, probably my mistake, but yes, that is what I meant.
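
To give a concrete example (my reading of the matching, not something
I have verified): on our dual-processor nodes a request like

  #PBS -l nodes=4:ppn=1

guarantees 4 CPUs with at least 1 per node, but may well be satisfied
with both processors on each of just two nodes.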

> However, we never get less than 'm' per node, so again on our Power5 cluster a 
> user can ask for nodes=1:ppn=4 and be guaranteed to get a single node to 
> themselves for an SMP code or some other reason.
> 
> > 3) "-l ncpus=n" requests n processors on a single node, i.e., the same
> >    as "-l nodes=1:ppn=n"
> 
> IIRC this has caused interesting behaviours on the rare occasions when one or 
> two of our users have tried that, and we never use it ourselves.

We can probably agree that specifying ncpus on a cluster is not useful
under any (?) circumstances. That's why I believe that the meaning of
ncpus could safely be changed when it comes to clusters.
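
For the record, my understanding (from point 3 above, not verified in
the source) is that on a cluster these two requests are treated
identically:

  #PBS -l ncpus=4
  #PBS -l nodes=1:ppn=4

so nothing would be lost by giving ncpus a new meaning there.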

I really need to be able to support "-l nodes=n:ppn=1" (in the
JOBNODEMATCHPOLICY EXACTNODE sense). We have dual-processor nodes with
4GB of memory each. When I have an MPI job that requires 3GB per
process, I want to make sure that no more than one of its processes is
assigned to a node. However, I do not mind if Moab schedules a process
from a different program with memory requirements < 1GB onto the same
node.

OTOH, I have many users who could not care less whether they get one
or both processors on a node; they just want their MPI job to start as
soon as possible. I.e., for these users "-l nodes=n" (with
JOBNODEMATCHPOLICY unset) would be the appropriate setting. Currently
supporting both behaviours at the same time is not possible.
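
To make the two use cases concrete (pmem and the script name are just
my sketch of how this might be expressed): for the 3GB-per-process MPI
jobs I would like to submit something like

  qsub -l nodes=8:ppn=1 -l pmem=3gb job.pbs

with nodes=8:ppn=1 interpreted in the EXACTNODE sense (8 distinct
nodes, one process each), while other users submit

  qsub -l nodes=8 job.pbs

with nodes=8 meaning "any 8 processors, packed or not".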

Cheers,
Martin
