[torquedev] [torqueusers] Problem with ppn and routing

Martin Siegert siegert at sfu.ca
Fri Dec 3 19:47:42 MST 2010


Hi,

(moving this thread to the dev list)

On Tue, Nov 30, 2010 at 09:44:08AM -0700, David Beer wrote:
> 
> 
> ----- Original Message -----
> > -snip-
> > > set queue fast resources_max.nodes = 2:ppn=2
> > -snip-
> > > set queue batch resources_max.nodes = 1:ppn=1
> > 
> > My understanding is that torque can/will only do useful comparisons on
> > numeric fields so the above settings are not meaningful. You might be
> > OK with resources_max.nodect (though that might not be numeric either)
> > but could only filter on the number of nodes not the number of
> > processes requested (and you would need a default nodes=1 which I
> > would prefer not to set so we can use procs as an option...). I don't
> > think this solves your problem but might point you (or others) in the
> > right direction.
> > 
> > -- Gareth
> 
> At some point (I believe 2.5) we added the ability to use resources_max.nodes
> in queue limitations, but it only sorts based on the number of nodes, not
> ppn. We couldn't sort based on ppn because of the inherent ambiguities -
> which is larger, nodes=1:ppn=2 or nodes=2:ppn=1 - so we only sort based on
> the first number there. This means that a job requesting nodes=1:ppn=2 will
> be accepted by the batch queue.

I think this is only partially correct:
1) the code in svr_jobfunc.c, lines 1041ff, indeed only checks the first
   number in the nodes specification: isdigit(...)
2) However, further down, line 1094, the full strings are compared:
        rc = jbrc->rs_defin->rs_comp(
               &cmpwith->rs_value,
               &jbrc->rs_value);
   e.g., if you set
   set queue serial resources_max.nodes = 1
   in qmgr and then submit a job that requests nodes=1:ppn=1, that job
   will get rejected since rs_comp does a strcmp of "1" and "1:ppn=1".
3) Similarly, the comp_resc2 routine uses rs_comp as well, i.e., it
   does a strcmp of the nodes resource string.
   This means that 1:ppn=1 < 1:ppn=12 < 1:ppn=2, which does not make
   any sense (see the sketch after this list).
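
To make the string-comparison problem concrete, here is a minimal,
self-contained C sketch (a standalone illustration, not TORQUE
source) of what the strcmp behind rs_comp effectively does with
nodes specifications:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* strcmp compares byte by byte, so "1:ppn=12" sorts before
         * "1:ppn=2" because '1' < '2', even though 12 > 2. */
        printf("%d\n", strcmp("1:ppn=1",  "1:ppn=12") < 0); /* 1 */
        printf("%d\n", strcmp("1:ppn=12", "1:ppn=2")  < 0); /* 1 */
        /* And resources_max.nodes = 1 rejects nodes=1:ppn=1 because
         * "1:ppn=1" compares greater than "1". */
        printf("%d\n", strcmp("1:ppn=1",  "1")        > 0); /* 1 */
        return 0;
    }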

Frankly, I think that the routing code based on the nodes string is
broken. It is probably correct to fix it such that it compares only
the first number. However, we (and probably many others) actually
rely on the broken code to do the routing:

We need to route serial jobs and multiprocessor jobs differently:

set queue default queue_type = Route
set queue default route_destinations = parallel
set queue default route_destinations += serial

set queue parallel resources_min.nodes = 1:ppn=2
set queue serial resources_max.nodes = 1:ppn=1

This kind of works because so far we only have nodes with up to 8
cores; thus, only 1:ppn=1 is smaller than 1:ppn=2. However, we will
soon get our first cluster with 12-core nodes, and then the whole
scheme will break because 1:ppn=12 < 1:ppn=2.

I need to get this fixed quite urgently and propose the following
scheme:

a) change the nodes comparison such that it compares only the first
   number. I am not sure whether this breaks moab, because moab does
   assign jobs to classes according to processor count, i.e., moab
   uses the product N = x*y of a nodes=x:ppn=y specification.
   Alternatively, the nodes comparison code could be left as is in
   its broken state and all users advised not to use
   resources_min.nodes and resources_max.nodes in queue definitions.
   (A sketch of such a first-number comparison follows after this
   list.)
b) introduce a new resource "np" (or procct or whatever) that refers
   to the total number of processors (cores) requested. We could then
   set
   set queue parallel resources_min.np = 2
   set queue serial resources_max.np = 1
   np is determined from the job request: np = x*y + z, where x and y
   come from the nodes request nodes=x:ppn=y and z comes from the
   procs request procs=z (see the second sketch below).
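
For (a), a minimal sketch (my assumption of what the fix could look
like, not existing TORQUE code) of a comparison that considers only
the leading node count:

    #include <stdio.h>
    #include <stdlib.h>

    /* Compare two nodes specifications such as "1:ppn=2" by their
     * leading node count only; ppn is deliberately ignored. */
    static int comp_nodes(const char *a, const char *b)
    {
        long na = strtol(a, NULL, 10);
        long nb = strtol(b, NULL, 10);
        return (na > nb) - (na < nb);
    }

    int main(void)
    {
        printf("%d\n", comp_nodes("1:ppn=12", "1:ppn=2")); /* 0 */
        printf("%d\n", comp_nodes("2:ppn=1",  "1:ppn=8")); /* 1 */
        return 0;
    }

With this, "1:ppn=1", "1:ppn=12" and "1:ppn=2" all compare equal, and
"2:ppn=1" compares greater than any of them.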
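
For (b), a minimal sketch of how np could be derived from a job's
request. The parsing helper is hypothetical and, for brevity, handles
only a single nodes=x:ppn=y fragment (real specifications can contain
'+'-separated parts, node properties, etc.):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return x*y for an "x" or "x:ppn=y" specification; ppn
     * defaults to 1 when absent. */
    static long np_from_nodes(const char *nodes)
    {
        long x = strtol(nodes, NULL, 10);
        long y = 1;
        const char *p = strstr(nodes, "ppn=");
        if (p != NULL)
            y = strtol(p + 4, NULL, 10);
        return x * y;
    }

    int main(void)
    {
        /* nodes=2:ppn=4 plus procs=3 gives np = 2*4 + 3 = 11 */
        long np = np_from_nodes("2:ppn=4") + 3 /* procs=z */;
        printf("np = %ld\n", np);
        return 0;
    }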

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid/ComputeCanada Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

