Bug 116 - enable routing depending on number of requested processors
Status: RESOLVED FIXED
Product: TORQUE
Component: pbs_server
Version: 2.5.x
Platform: PC Linux
Importance: P5 enhancement
Assigned To: David Beer
Reported: 2011-03-09 17:26 MST by Martin Siegert
Modified: 2011-03-30 16:15 MDT

Attachments
torque-2.5.5-procct.patch (6.91 KB, patch)
2011-03-09 17:26 MST, Martin Siegert

Description Martin Siegert 2011-03-09 17:26:13 MST
Created an attachment (id=74)
torque-2.5.5-procct.patch

The attached patch creates a new resource "procct" that counts the number of
requested processors in nodes and/or procs requests. This allows configuration
of routing queues depending on the number of requested processors, e.g.,

create queue default
set queue default queue_type = Route
set queue default route_destinations = q1
set queue default route_destinations += qsmall
set queue default route_destinations += qlarge

create queue q1
set queue q1 queue_type = Execution
set queue q1 resources_max.procct = 1

create queue qsmall
set queue qsmall queue_type = Execution 
set queue qsmall resources_max.procct = 128
set queue qsmall resources_min.procct = 2

create queue qlarge
set queue qlarge queue_type = Execution 
set queue qlarge resources_min.procct = 129

set server default_queue = default
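
One way to read the configuration above is as a first-match rule over the route destinations. The following is a minimal sketch of that reading, not TORQUE's actual routing code; the `QUEUES` table and the `route` function are hypothetical names introduced only for illustration, and it assumes destinations are tried in the order listed.

```python
# Hypothetical model of the procct limits from the example configuration.
# Each entry is (queue name, resources_min.procct, resources_max.procct);
# None means the limit is not set on that queue.
QUEUES = [
    ("q1", None, 1),
    ("qsmall", 2, 128),
    ("qlarge", 129, None),
]


def route(procct):
    """Return the first destination whose procct limits admit the job."""
    for name, lo, hi in QUEUES:
        if lo is not None and procct < lo:
            continue  # job is below the queue minimum
        if hi is not None and procct > hi:
            continue  # job is above the queue maximum
        return name
    return None  # no destination accepts the job


for n in (1, 2, 128, 129):
    print(n, route(n))
```

Under these assumptions a serial job lands in q1, jobs with 2 to 128 processors land in qsmall, and anything larger lands in qlarge.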

For requests of the form -l nodes=x:ppn=y -l procs=z, procct is set to x*y+z.
The value is unset after the job has been assigned to a queue; otherwise Moab
does not run the job (I have not tested Maui), because Moab does not know how
to handle the procct resource.
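
The x*y+z rule can be sketched as follows. This is an illustration in Python, not the C code in the attached patch; the function name `procct` and its parsing rules are assumptions. In particular, it assumes that a chunk without ppn counts one processor per node and that a chunk starting with a hostname rather than a number counts as a single node.

```python
def procct(nodes=None, procs=0):
    """Count requested processors from a nodes spec and a procs value.

    Illustrative only: parsing rules here are assumptions, not the patch.
    """
    total = procs
    if nodes:
        # A nodes spec is a '+'-separated list of chunks such as "4:ppn=4".
        for chunk in nodes.split("+"):
            fields = chunk.split(":")
            # A leading integer is a node count; anything else (for example
            # a hostname) is assumed to stand for exactly one node.
            count = int(fields[0]) if fields[0].isdigit() else 1
            ppn = 1  # assumed default when ppn is not given
            for field in fields[1:]:
                if field.startswith("ppn="):
                    ppn = int(field[len("ppn="):])
            total += count * ppn
    return total


print(procct("2:ppn=4", procs=3))  # 2*4 + 3
```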

Furthermore, the environment variable PBS_NP is set to the number of requested
processors for use in submission scripts.
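
A job script written in Python could pick up the value like this. It is a sketch only: the helper name `requested_procs` and the fallback of 1 when PBS_NP is unset are assumptions, not part of the patch.

```python
import os


def requested_procs(default=1):
    """Return PBS_NP as an int, or `default` when it is unset or malformed."""
    value = os.environ.get("PBS_NP", "")
    return int(value) if value.isdigit() else default


print("running with", requested_procs(), "processes")
```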

- Martin
Comment 1 Simon Toth 2011-03-09 23:25:47 MST
Why create a new resource if we already have procs? And by the way this was
already implemented (but not accepted into Torque).
Comment 2 Martin Siegert 2011-03-10 12:56:53 MST
procct is not the same as procs. In fact, its main purpose is to handle nodes
requests correctly, which is currently not possible. For example, a request of
the form b1:ppn=12+4:ppn=4 results in procct being set to 12+16=28. Also,
when setting
resources_min.nodes=1:ppn=2
torque uses strcmp to decide whether the min setting for the queue is larger
than the job request. That works as long as you have nodes with up to 9 cores.
If you have more than that, strcmp causes problems, e.g.,
1:ppn=1 < 1:ppn=12 < 1:ppn=2
Instead of changing the rs_comp function for the nodes resource, this patch
introduces the procct resource, which has the additional advantage that it
handles the procs resource and combinations of nodes and procs resources as
well.
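
The ordering problem is easy to reproduce. The sketch below uses Python string comparison, which for these ASCII strings orders character by character the same way strcmp does; the helper `ppn_of` is a hypothetical name used only to show what a numeric comparison would give.

```python
def ppn_of(spec):
    """Extract the ppn value from a single-chunk spec such as '1:ppn=12'."""
    return int(spec.split("ppn=")[1])


queue_min = "1:ppn=2"
job = "1:ppn=12"

# Character-by-character comparison: '1' sorts before '2' at the first
# differing position, so the 12-core request looks smaller than the minimum.
print(job < queue_min)

# Numeric comparison of the ppn values gives the intended answer.
print(ppn_of(job) < ppn_of(queue_min))

print(sorted(["1:ppn=2", "1:ppn=12", "1:ppn=1"]))
```

With a string comparison, a job requesting 1:ppn=12 would be treated as falling below a queue minimum of 1:ppn=2, which is the misrouting described above.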

(And if bug 67 ever gets implemented, I am sure that procct can be adapted to
simply use total_resources. However, I cannot wait for that: we will receive
12-core nodes in a few weeks and I need to be able to route serial jobs
reliably to their own queue, q1 in the example.)

- Martin
Comment 3 Ken Nielson 2011-03-30 16:15:32 MDT
This patch has been merged into the 2.5-fixes branch and will be available in
the next TORQUE release.