[torqueusers] queue routing based on mem resource not working properly...
lnieroda at gmail.com
Wed Jul 28 06:17:29 MDT 2010
We have a cluster with three groups of machines: some have 24GB, some
have 48GB, and another group has 96GB. Our Maui version is 3.2.6p21.
The general idea is to keep the larger nodes free for jobs that
actually need that much RAM and thus route jobs with >48GB
automatically to the 96GB nodes, >24GB to the 48GB nodes and the rest
to the 24GB nodes.
How this was implemented:
- each node has been given a "property" according to its available
memory, i.e. ram96gb, ram48gb, ram24gb
- there is a queue for each memory size, with appropriate "neednodes"
and "resources_min.mem" statements, e.g.
set queue qram48gb resources_default.neednodes = ram48gb
set queue qram48gb resources_min.mem = 24gb
- finally, there is a routing queue, which routes the jobs:
set queue default queue_type = Route
set queue default route_destinations = qram96gb
set queue default route_destinations += qram48gb
set queue default route_destinations += qram24gb
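
Spelled out for all three execution queues, the setup would look roughly
like the sketch below. Only the qram48gb lines are quoted from the real
configuration; the qram96gb and qram24gb values are assumptions inferred
from the routing goal, and the create/enabled/started lines are ordinary
qmgr boilerplate added for completeness.

# 96GB nodes: meant for jobs asking for more than 48gb (assumed values)
create queue qram96gb queue_type = Execution
set queue qram96gb resources_default.neednodes = ram96gb
set queue qram96gb resources_min.mem = 48gb
set queue qram96gb enabled = True
set queue qram96gb started = True
# 48GB nodes: the two "set" lines quoted above
create queue qram48gb queue_type = Execution
set queue qram48gb resources_default.neednodes = ram48gb
set queue qram48gb resources_min.mem = 24gb
set queue qram48gb enabled = True
set queue qram48gb started = True
# 24GB nodes: no minimum, catches everything else (assumed values)
create queue qram24gb queue_type = Execution
set queue qram24gb resources_default.neednodes = ram24gb
set queue qram24gb enabled = True
set queue qram24gb started = True

The routing queue tries its destinations in the order listed, so a job
ends up in the first queue whose limits it satisfies.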
However, this isn't working properly: jobs are routed according to
their total mem requirement, not the per-node value. For example, a
job with "-l nodes=2:ppn=4,mem=50gb" needs only 25gb per node
(50gb / 2 nodes), but it is routed to qram96gb since 50gb > 48gb.
Supplying more resource limits in the queue setup, like pmem and
pvmem, doesn't change this behavior: the jobs are still routed to the
larger nodes even though smaller ones would suffice.
Any ideas, experiences with such routing?