[torqueusers] request to node mapping
Bogdan Costescu
Bogdan.Costescu at iwr.uni-heidelberg.de
Wed Feb 27 20:55:58 MST 2008
Hi!
I am having trouble getting something to work that I would call 'request
to node mapping', by analogy with the 'queue to node mapping' term used in
the Torque docs, and I'm looking for some clues. The basic idea is
that, once a queue to node mapping is defined using
"resources_default.neednodes", a job which requests a specific
property should be routed to a queue whose assigned nodes have
this property.
The software used is Torque 2.1.10; scheduling is done by Maui
3.2.6p19, but I think that Maui should not be involved.
The cluster is composed of 2 types of nodes; the nodes file contains:
opt001 np=4 myri10g
...
optnode01 np=2 gige
...
The server config contains several queues, but the details shown below
are simplified to only 2 queues and (what I think are) the relevant
settings:
set server default_queue = feed
set queue feed queue_type = Route
set queue feed route_destinations = h2_short_032
set queue feed route_destinations += opt_024
set queue h2_short_032 queue_type = Execution
set queue h2_short_032 from_route_only = True
set queue h2_short_032 resources_max.nodect = 32
set queue h2_short_032 resources_max.walltime = 00:30:00
set queue h2_short_032 resources_min.nodect = 4
set queue h2_short_032 resources_default.neednodes = myri10g
set queue opt_024 queue_type = Execution
set queue opt_024 from_route_only = True
set queue opt_024 resources_max.nodect = 24
set queue opt_024 resources_max.walltime = 120:00:00
set queue opt_024 resources_min.nodect = 1
set queue opt_024 resources_default.neednodes = gige
If I submit a job with:
qsub -I -l nodes=2:ppn=2:gige,walltime=0:30:00
this is queued to opt_024 and executed correctly on 2 of the nodes
with property 'gige'.
If I submit a job with:
qsub -I -l nodes=4:ppn=2:gige,walltime=0:30:00
this is queued to h2_short_032 but never executed, as this queue is
only associated with nodes that lack the property 'gige'. The first
example works only because "resources_min.nodect = 4" keeps the job
out of the h2_short_032 queue; in the second example both the number
of nodes and the walltime fit the h2_short_032 limits, so the job is
accepted there even though the requested property does not match.
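To make the behaviour above concrete, here is a rough Python sketch of
the routing as it is observed in the two examples. It is only a toy
model under stated assumptions, not Torque's actual code: it assumes
that the routing queue tries the destinations in route_destinations
order and checks only the nodect and walltime limits, never the
requested node property. The names Queue, route and describe are made
up for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Queue:
    name: str
    min_nodect: int
    max_nodect: int
    max_walltime: int  # seconds
    neednodes: str     # property from resources_default.neednodes


# Destinations in the order given by route_destinations in the config.
QUEUES = [
    Queue("h2_short_032", 4, 32, 30 * 60, "myri10g"),
    Queue("opt_024", 1, 24, 120 * 3600, "gige"),
]


def route(nodect: int, walltime: int, queues: List[Queue]) -> Optional[Queue]:
    """Return the first queue whose nodect/walltime limits accept the job.

    The requested node property is deliberately not an argument:
    in this model (an assumption) it plays no part in routing.
    """
    for q in queues:
        if q.min_nodect <= nodect <= q.max_nodect and walltime <= q.max_walltime:
            return q
    return None


def describe(nodect: int, walltime: int, prop: str) -> str:
    """Say where the job lands and whether it can ever run there."""
    q = route(nodect, walltime, QUEUES)
    if q is None:
        return "rejected"
    # The job can only run if the queue's nodes carry the requested property.
    status = "runs" if q.neednodes == prop else "stuck"
    return f"{q.name}: {status}"


# The two submissions from the examples above (walltime 0:30:00 = 1800 s).
print(describe(2, 1800, "gige"))
print(describe(4, 1800, "gige"))
```

In this model the 2-node job is turned away by h2_short_032 because of
the minimum nodect and falls through to opt_024, while the 4-node job
is accepted by h2_short_032 and then waits for 'gige' nodes that the
queue does not have.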
Is there some way of making the second example work, possibly in a
newer version of Torque?
I have a workaround:
set queue opt_024 from_route_only = False
and then submitting with a queue specification and no node property
specification (as this would be implicitly done by the queue to node
mapping):
qsub -I -l nodes=4:ppn=2,walltime=0:30:00 -q opt_024
but I would prefer to avoid this if at all possible.
--
Bogdan Costescu
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de