[torqueusers] Online docs missing queue resource "nodes"

Martin Siegert siegert at sfu.ca
Thu Jan 5 11:01:32 MST 2006


On Wed, Jan 04, 2006 at 09:53:27PM -0700, Dave Jackson wrote:
> Bernard,
> 
>   Fixed!  Thanks.  ncpus are processors which are constrained to being
> on the same host (usually).  Over the life of some PBS variants, this
> interpretation has varied.  Maui/Moab do their best to distill the
> meaning based on context.
> 
> Thanks,
> Dave
> 
> On Wed, 2006-01-04 at 20:14 -0800, Bernard Li wrote:
> > In the online docs:
> >  
> > http://www.clusterresources.com/products/torque/docs20/4.1queueconfig.shtml
> >  
> > It mentions that:
> >  
> >  Resources may include one or more of the following: arch, mem, ncpus,
> > nodect, pvmem, and walltime 
> >  
> > Isn't it missing "nodes"?
> >  
> > The example immediately underneath it mentions "default.nodes".
> >  
> > BTW, what's the relationship between resources_default.ncpus and
> > resources_default.nodes?  Is nodes supposed to replace ncpus?

I actually vote for cleaning up the ncpus mess. We have just seen a few
emails about the nodes specification, which is by no means intuitive
either; furthermore, its meaning depends on the JOBNODEMATCHPOLICY
setting in moab/maui:
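For reference, that policy lives in the scheduler's configuration file
(maui.cfg for maui, moab.cfg for moab); a minimal sketch of the two cases
discussed below:

```
# maui.cfg (or moab.cfg) -- scheduler configuration sketch
# Case 1: leave the parameter unset (i.e., omit the line entirely)
# Case 2: request exact node matching:
JOBNODEMATCHPOLICY  EXACTNODE
```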

with JOBNODEMATCHPOLICY unset:

1) "-l nodes=n" requests n processors (not nodes!) anywhere on the cluster
   regardless of whether the processors are on the same node or not.
   (However, if you request "-l nodes=n" with n larger than the number of
   nodes in your cluster, but smaller than the number of processors in the
   cluster, the job is rejected by torque with the error message
   "Job exceeds queue resource limits". E.g., you cannot request
   -l nodes=6 on a cluster with 4 dual-processor nodes. I suspect that
   this is a bug.)
2) "-l nodes=n:ppn=1" works in exactly the same way as 1, i.e., you may
   actually get two processors on the same node. Similarly,
   "-l nodes=n:ppn=m" means "give me n*m processors with at least m
   processors on each node".
3) "-l ncpus=n" requests n processors on a single node, i.e., the same
   as "-l nodes=1:ppn=n"
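To illustrate the three cases above, assume the 4-node dual-processor
cluster from the example (8 processors total); these are illustrative
requests, and the behaviour described is as observed here, not verified
against every torque version:

```
# JOBNODEMATCHPOLICY unset (cluster: 4 nodes x 2 processors)
qsub -l nodes=3 job.sh          # 3 processors anywhere, possibly 2 on one node
qsub -l nodes=3:ppn=1 job.sh    # same as the line above, despite ppn=1
qsub -l nodes=2:ppn=2 job.sh    # 4 processors, at least 2 per node
qsub -l nodes=6 job.sh          # rejected: "Job exceeds queue resource limits"
qsub -l ncpus=3 job.sh          # 3 processors on one node (= nodes=1:ppn=3)
```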

with JOBNODEMATCHPOLICY set to EXACTNODE:

1) "-l nodes=n" requests n processors on n nodes, i.e., exactly one
   processor on each node.
2) "-l nodes=n:ppn=1" is exactly the same as 1.
3) "-l ncpus=n" requests n processors on a single node, i.e., the same
   as "-l nodes=1:ppn=n" (i.e., this is independent of the JOBNODEMATCHPOLICY
   setting).
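The same requests on the same example cluster, but with EXACTNODE set
(again illustrative, based on the behaviour described above):

```
# JOBNODEMATCHPOLICY EXACTNODE (cluster: 4 nodes x 2 processors)
qsub -l nodes=3 job.sh          # 3 processors on 3 distinct nodes
qsub -l nodes=3:ppn=1 job.sh    # identical to the line above
qsub -l ncpus=3 job.sh          # still 3 processors on a single node
```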

As a consequence, it is impossible to set up torque and moab/maui so that
you get the following functionality:

a) a user can request "ncpus" anywhere on the cluster, i.e., basically
   "-l nodes=n" with JOBNODEMATCHPOLICY unset (if it weren't for
   the bug mentioned above).
b) a user can request exactly n nodes with m processors on each node,
   i.e., the functionality of "-l nodes=n:ppn=m" with JOBNODEMATCHPOLICY
   set to EXACTNODE.

Since a) works only with JOBNODEMATCHPOLICY unset and b) only with
JOBNODEMATCHPOLICY set to EXACTNODE, it is impossible to allow both
requests. Furthermore, the meaning of "-l nodes=n" and "-l nodes=n:ppn=1"
with JOBNODEMATCHPOLICY unset, and the meaning of "-l ncpus=n" in
general, is quite unintuitive (or even counterintuitive).

I would like to see the following:

i) "-l ncpus=n" requests n CPUs anywhere on the cluster (for an SMP
   machine the meaning remains the same), i.e., this would replace
   "-l nodes=n" with JOBNODEMATCHPOLICY unset.
ii) "-l nodes=n:ppn=m" works as it works now with JOBNODEMATCHPOLICY
   set to EXACTNODE. "-l nodes=n" is no longer allowed (or alternatively
   is equivalent to "-l nodes=n:ppn=1").
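Under this proposal the requests would read as follows (proposed
semantics only, not current behaviour of any torque release):

```
# proposed semantics (not implemented anywhere today)
qsub -l ncpus=4 job.sh          # 4 CPUs anywhere on the cluster
qsub -l nodes=2:ppn=2 job.sh    # exactly 2 nodes, 2 processors each
qsub -l nodes=2 job.sh          # disallowed, or treated as nodes=2:ppn=1
```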

To allow sites to keep the old meaning, one could implement a configure
option such as "--keep-old-ncpus-meaning" for backward compatibility,
if this is really required.
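Sketched, such an opt-in would look like this at build time (the flag
name is purely a placeholder of my own; no such option exists today):

```
# hypothetical build-time switch, for illustration only
./configure --keep-old-ncpus-meaning
```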

Cheers,
Martin

-- 
Martin Siegert
Head, HPC at SFU
WestGrid Site Manager
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6

