Bugzilla – Bug 116
enable routing depending on number of requested processors
Last modified: 2011-03-30 16:15:32 MDT
Created an attachment (id=74)
The attached patch creates a new resource "procct" that counts the number of
requested processors in nodes and/or procs requests. This allows configuration
of routing queues depending on the number of requested processors, e.g.,
create queue default
set queue default queue_type = Route
set queue default route_destinations = q1
set queue default route_destinations += qsmall
set queue default route_destinations += qlarge
create queue q1
set queue q1 queue_type = Execution
set queue q1 resources_max.procct = 1
create queue qsmall
set queue qsmall queue_type = Execution
set queue qsmall resources_max.procct = 128
set queue qsmall resources_min.procct = 2
create queue qlarge
set queue qlarge queue_type = Execution
set queue qlarge resources_min.procct = 129
set server default_queue = default
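Here is a small Python model that makes the routing behaviour of this configuration concrete. It is a simplification and assumes the route queue tries its destinations in route_destinations order. The job goes to the first queue whose resources_min.procct / resources_max.procct bounds admit it. DestQueue and route_by_procct are hypothetical names, not TORQUE code, and the model ignores every other queue limit.

```python
from typing import NamedTuple, Optional, Sequence


class DestQueue(NamedTuple):
    """Hypothetical model of an execution queue's procct limits."""
    name: str
    min_procct: Optional[int] = None  # resources_min.procct (None = unset)
    max_procct: Optional[int] = None  # resources_max.procct (None = unset)


def route_by_procct(dests: Sequence[DestQueue], procct: int) -> Optional[str]:
    """Return the first destination, in route_destinations order, whose
    min/max procct limits admit the job; None if no queue accepts it."""
    for q in dests:
        if q.min_procct is not None and procct < q.min_procct:
            continue
        if q.max_procct is not None and procct > q.max_procct:
            continue
        return q.name
    return None


# Mirrors the qmgr configuration above (q1, qsmall, qlarge).
DESTS = [
    DestQueue("q1", max_procct=1),
    DestQueue("qsmall", min_procct=2, max_procct=128),
    DestQueue("qlarge", min_procct=129),
]

for n in (1, 12, 128, 129):
    print(n, "->", route_by_procct(DESTS, n))
```

Because the bounds do not overlap, the order of route_destinations only matters if a job falls outside every range. In that case the model returns None.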
For requests of the form -l nodes=x:ppn=y -l procs=z, procct is set to x*y+z.
The value is unset after the job has been assigned to a queue; otherwise the
job is not run by Moab (I have not tested Maui), because Moab does not know how
to handle the procct resource.
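The procct arithmetic can be sketched in Python as follows. This is an illustration, not the patch's C code. The parsing rules are assumptions made for the sketch:

* elements are joined by '+';
* a leading number is a node count;
* a leading host name such as b1 counts as one node;
* ppn defaults to 1.

nodes_procs and procct are hypothetical helper names.

```python
from typing import Optional


def nodes_procs(nodes_spec: str) -> int:
    """Count processors in a nodes spec such as '4:ppn=4+b1:ppn=12'.

    Assumed rules (sketch only): elements are separated by '+', a leading
    number is a node count, a leading non-numeric field (host name or
    property) counts as one node, and ppn defaults to 1."""
    total = 0
    for element in nodes_spec.split("+"):
        fields = element.split(":")
        count = int(fields[0]) if fields[0].isdigit() else 1
        ppn = 1
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        total += count * ppn
    return total


def procct(nodes: Optional[str] = None, procs: Optional[int] = None) -> int:
    """procct = processors requested via -l nodes=... plus -l procs=..."""
    total = nodes_procs(nodes) if nodes else 0
    return total + (procs or 0)


print(procct(nodes="3:ppn=4", procs=2))     # 3*4 + 2 = 14
print(procct(nodes="b1:ppn=12+4:ppn=4"))  # 12 + 4*4 = 28
```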
Furthermore, the environment variable PBS_NP is set to the number of requested
processors for use in submission scripts.
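A job script might pass PBS_NP on to the program it starts. The sketch below assumes that program is written in Python. requested_procs is a hypothetical helper. Falling back to a default when PBS_NP is unset or malformed is a choice made here, not TORQUE behaviour.

```python
import os


def requested_procs(default: int = 1) -> int:
    """Return the processor count from PBS_NP, or `default` when the
    variable is unset or not a positive integer (e.g. outside a job)."""
    value = os.environ.get("PBS_NP", "")
    try:
        n = int(value)
    except ValueError:
        return default
    return n if n > 0 else default


if __name__ == "__main__":
    print("running with", requested_procs(), "worker processes")
```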
Why create a new resource if we already have procs? And, by the way, this was
already implemented (but not accepted into TORQUE).
procct is not the same as procs. In fact, its main purpose is to handle nodes
requests correctly, which is currently not possible. For example, a request of
the form b1:ppn=12+4:ppn=4 results in procct being set to 12+16=28. Also,
TORQUE uses strcmp to decide whether the queue's min setting is larger than the
job request. Because strcmp compares the strings character by character rather
than numerically, that only works as long as your nodes have at most 9 cores.
With more cores than that, strcmp orders the requests incorrectly, e.g.,
1:ppn=1 < 1:ppn=12 < 1:ppn=2
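The ordering problem is easy to reproduce. For these ASCII strings, Python's default string comparison orders the same way strcmp does. The hypothetical ppn() helper below shows the numeric comparison a fix would need. It only handles single-element specs and is not TORQUE's rs_comp.

```python
import re

specs = ["1:ppn=12", "1:ppn=2", "1:ppn=1"]

# Lexicographic order: what strcmp yields for these ASCII strings.
print(sorted(specs))  # ['1:ppn=1', '1:ppn=12', '1:ppn=2']


def ppn(spec: str) -> int:
    """Extract the numeric ppn value from a single-element spec like '1:ppn=12'."""
    m = re.search(r"ppn=(\d+)", spec)
    return int(m.group(1)) if m else 1


# Numeric order on the extracted ppn value.
print(sorted(specs, key=ppn))  # ['1:ppn=1', '1:ppn=2', '1:ppn=12']
```

A queue whose minimum is 1:ppn=2 would therefore treat a 1:ppn=12 request as smaller than its minimum.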
Instead of changing the rs_comp function for the nodes resource, this patch
introduces the procct resource. This has the additional advantage that it
handles the procs resource and combinations of nodes and procs resources as
well. (If bug 67 ever gets implemented, I am sure that procct can be adapted to
simply use total_resources. However, I cannot wait for that: we will receive
12-core nodes in a few weeks, and I need to be able to route serial jobs
reliably to their own queue, q1 in the example.)
This patch has been merged into the 2.5-fixes branch and will be available in
the next TORQUE release.