Bugzilla – Bug 116
enable routing depending on number of requested processors
Last modified: 2011-03-30 16:15:32 MDT
Created an attachment (id=74)
torque-2.5.5-procct.patch

The attached patch creates a new resource, "procct", that counts the number of processors requested through nodes and/or procs requests. This allows routing queues to be configured according to the number of requested processors, e.g.:

    create queue default
    set queue default queue_type = Route
    set queue default route_destinations = q1
    set queue default route_destinations += qsmall
    set queue default route_destinations += qlarge

    create queue q1
    set queue q1 queue_type = Execution
    set queue q1 resources_max.procct = 1

    create queue qsmall
    set queue qsmall queue_type = Execution
    set queue qsmall resources_max.procct = 128
    set queue qsmall resources_min.procct = 2

    create queue qlarge
    set queue qlarge queue_type = Execution
    set queue qlarge resources_min.procct = 129

    set server default_queue = default

For requests of the form

    -l nodes=x:ppn=y -l procs=z

procct is set to x*y+z. The value is unset after the job has been assigned to a queue; otherwise moab does not run the job, because it does not know how to handle the procct resource (I have not tested maui).

Furthermore, the environment variable PBS_NP is set to the number of requested processors, for use in submission scripts.

- Martin
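For readers who want to check the arithmetic, here is a rough Python sketch of how a procct value could be derived and matched against the example queues. It is not the code from the patch (which is C inside torque). The function names `procct` and `route`, the `QUEUES` table, and the assumptions that a missing ppn counts as 1 and that a hostname chunk counts as one node are all illustrative. The case where a job requests neither nodes nor procs is not covered.

```python
# Illustrative sketch only; not the logic from torque-2.5.5-procct.patch.

def procct(nodes=None, procs=0):
    """Count requested processors for '-l nodes=... -l procs=...'."""
    total = 0
    if nodes:
        # A nodes spec is a '+'-separated list of chunks such as '4:ppn=4'.
        for chunk in nodes.split("+"):
            fields = chunk.split(":")
            # Assumption: a numeric first field is a node count,
            # anything else (e.g. a hostname like 'b1') is one node.
            count = int(fields[0]) if fields[0].isdigit() else 1
            ppn = 1  # assumption: ppn defaults to 1 when omitted
            for field in fields[1:]:
                if field.startswith("ppn="):
                    ppn = int(field[len("ppn="):])
            total += count * ppn
    return total + procs

# (name, resources_min.procct, resources_max.procct) from the example config,
# listed in route_destinations order.
QUEUES = [("q1", None, 1), ("qsmall", 2, 128), ("qlarge", 129, None)]

def route(n):
    """Return the first destination whose min/max limits accept n."""
    for name, lo, hi in QUEUES:
        if (lo is None or n >= lo) and (hi is None or n <= hi):
            return name
    return None

if __name__ == "__main__":
    n = procct(nodes="2:ppn=8", procs=4)
    print(n, route(n))
```

With these assumptions, `-l nodes=2:ppn=8 -l procs=4` gives 2*8+4 = 20 and lands in qsmall.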
Why create a new resource if we already have procs? And by the way this was already implemented (but not accepted into Torque).
procct is not the same as procs. In fact, its main purpose is to handle nodes requests correctly, which is currently not possible. For example, a request of the form

    b1:ppn=12+4:ppn=4

results in procct being set to 12+16=28.

Also, when resources_min.nodes=1:ppn=2 is set, torque uses strcmp to decide whether the queue's min setting is larger than the job request. That works as long as your nodes have at most 9 cores. With more than that, strcmp causes problems, e.g.,

    1:ppn=1 < 1:ppn=12 < 1:ppn=2

Instead of changing the rs_comp function for the nodes resource, this patch introduces the procct resource, which has the additional advantage that it also handles the procs resource and combinations of nodes and procs resources.

(If bug 67 ever gets implemented, I am sure that procct can be adapted to simply use total_resources. However, I cannot wait for that: we will receive 12-core nodes in a few weeks, and I need to be able to route serial jobs reliably to their own queue, q1 in the example.)

- Martin
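The strcmp ordering problem is easy to reproduce outside torque. The short Python sketch below relies on the fact that Python compares ASCII strings character by character, which gives the same ordering as strcmp for these specs. The helper `ppn_of` is hypothetical and only exists for this illustration.

```python
# Illustration of the ordering problem; nothing here is torque code.

specs = ["1:ppn=2", "1:ppn=12", "1:ppn=1"]

# Character-wise comparison, as strcmp would do it.
by_string = sorted(specs)

def ppn_of(spec):
    """Hypothetical helper: extract the ppn value from 'N:ppn=M'."""
    return int(spec.split("ppn=")[1])

# Numeric comparison on the processor count.
by_number = sorted(specs, key=ppn_of)

print(by_string)  # ['1:ppn=1', '1:ppn=12', '1:ppn=2']
print(by_number)  # ['1:ppn=1', '1:ppn=2', '1:ppn=12']

# A 12-core request compares as "smaller" than a resources_min of 1:ppn=2.
print("1:ppn=12" < "1:ppn=2")  # True
```

The last line shows why a string-based min check misjudges a 1:ppn=12 job against resources_min.nodes=1:ppn=2, while an integer count like procct compares as expected.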
This patch has been merged into the 2.5-fixes branch and will be available in the next TORQUE release.