[torqueusers] request to node mapping

Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Wed Feb 27 20:55:58 MST 2008


Hi!

I am having trouble getting something to work that I would call 
'request to node mapping', by analogy with the 'queue to node mapping' 
term used in the Torque docs, and I'm looking for some clues. The 
basic idea is that, once a queue to node mapping is defined using 
"resources_default.neednodes", a job that requests a specific node 
property should be routed to a queue whose assigned nodes have that 
property.

The software used is Torque 2.1.10; scheduling is done by Maui 
3.2.6p19, but I think that Maui should not be involved.

The cluster is composed of 2 types of nodes; the nodes file contains:

opt001 np=4 myri10g
...
optnode01 np=2 gige
...

The server config contains several queues, but the details shown below 
are simplified to only 2 queues and (what I think are) the relevant 
settings:

set server default_queue = feed
set queue feed queue_type = Route
set queue feed route_destinations = h2_short_032
set queue feed route_destinations += opt_024
set queue h2_short_032 queue_type = Execution
set queue h2_short_032 from_route_only = True
set queue h2_short_032 resources_max.nodect = 32
set queue h2_short_032 resources_max.walltime = 00:30:00
set queue h2_short_032 resources_min.nodect = 4
set queue h2_short_032 resources_default.neednodes = myri10g
set queue opt_024 queue_type = Execution
set queue opt_024 from_route_only = True
set queue opt_024 resources_max.nodect = 24
set queue opt_024 resources_max.walltime = 120:00:00
set queue opt_024 resources_min.nodect = 1
set queue opt_024 resources_default.neednodes = gige

If I submit a job with:

qsub -I -l nodes=2:ppn=2:gige,walltime=0:30:00

this is queued to opt_024 and executed correctly on 2 of the nodes 
with property 'gige'.

If I submit a job with:

qsub -I -l nodes=4:ppn=2:gige,walltime=0:30:00

this is queued to h2_short_032 but never executed, as this queue is 
only associated with nodes that lack property 'gige'. The first 
example works only because "resources_min.nodect = 4" prevents the 
2-node job from entering the h2_short_032 queue; in the second example 
both the number of nodes and the walltime fit the queue definition, so 
nothing stops the job from being queued in h2_short_032.
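
The routing outcome described above can be reproduced with a small 
model. This is only a sketch, not Torque's implementation: it assumes 
the route queue tries its destinations in the listed order and accepts 
the first one whose nodect and walltime limits the job satisfies, 
without looking at the requested node property. The names Queue, 
route and DESTINATIONS are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Queue:
    name: str
    min_nodect: int
    max_nodect: int
    max_walltime_s: int
    neednodes: str  # queue to node mapping; NOT consulted by route() below


def walltime_to_seconds(hms: str) -> int:
    """Convert an H:MM:SS walltime string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def route(destinations: List[Queue], nodect: int, walltime_s: int) -> Optional[str]:
    """Return the first destination whose limits the job fits (assumed rule)."""
    for queue in destinations:
        fits_nodes = queue.min_nodect <= nodect <= queue.max_nodect
        fits_walltime = walltime_s <= queue.max_walltime_s
        if fits_nodes and fits_walltime:
            return queue.name
    return None


# Same order as the route_destinations of the 'feed' queue.
DESTINATIONS = [
    Queue("h2_short_032", 4, 32, walltime_to_seconds("00:30:00"), "myri10g"),
    Queue("opt_024", 1, 24, walltime_to_seconds("120:00:00"), "gige"),
]

# nodes=2:ppn=2:gige,walltime=0:30:00
print(route(DESTINATIONS, 2, walltime_to_seconds("0:30:00")))  # opt_024
# nodes=4:ppn=2:gige,walltime=0:30:00
print(route(DESTINATIONS, 4, walltime_to_seconds("0:30:00")))  # h2_short_032
```

In this model the 'gige' request plays no part in the routing decision, 
which is why the 4-node job ends up in a queue that can never run it.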

Is there some way of making the second example work, possibly in some 
newer version of Torque?
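
To make the request concrete, the behaviour being asked for could be 
sketched as below. This is purely hypothetical: it is not an existing 
Torque option, and the function route_with_property and the QUEUES 
table are invented for illustration. The only change from a plain 
limit check is that a destination is skipped when the job requests a 
property that differs from the queue's "resources_default.neednodes".

```python
from typing import Optional

# (name, min nodect, max nodect, max walltime in seconds, neednodes property)
# Values copied from the two execution queues in the server config above.
QUEUES = [
    ("h2_short_032", 4, 32, 30 * 60, "myri10g"),
    ("opt_024", 1, 24, 120 * 3600, "gige"),
]


def route_with_property(nodect: int, walltime_s: int,
                        prop: Optional[str] = None) -> Optional[str]:
    """Pick the first queue that fits the limits AND the requested property."""
    for name, lo, hi, max_walltime_s, neednodes in QUEUES:
        if not (lo <= nodect <= hi and walltime_s <= max_walltime_s):
            continue
        # Hypothetical extra check: honour the property named in the request.
        if prop is not None and prop != neednodes:
            continue
        return name
    return None


# nodes=4:ppn=2:gige,walltime=0:30:00 would now skip h2_short_032.
print(route_with_property(4, 30 * 60, "gige"))  # opt_024
# Without a property the first fitting queue is still chosen.
print(route_with_property(4, 30 * 60))          # h2_short_032
```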

I have a workaround:

set queue opt_024 from_route_only = False

and then submitting with an explicit queue and no node property (the 
property is applied implicitly by the queue to node mapping):

qsub -I -l nodes=4:ppn=2,walltime=0:30:00 -q opt_024

but I would prefer to avoid this if at all possible.

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de