[torquedev] [torqueusers] Problem with ppn and routing
"Mgr. Šimon Tóth"
SimonT at mail.muni.cz
Sat Dec 4 02:47:04 MST 2010
>> At some point (I believe 2.5) we added the ability to use resources_max.nodes
>> in queue limitations, but it only sorts based on the number of nodes, not
>> ppn. We couldn't sort based on ppn because of the inherent ambiguities -
>> which is larger, nodes=1:ppn=2 or nodes=2:ppn=1 - so we only sort based on
>> the first number there. This means that a job requesting nodes=1:ppn=2 will
>> be accepted by the batch queue.
> I think this is only partially correct:
> 1) the code in svr_jobfunc.c, lines 1041ff, indeed only checks the first
> number in the nodes specification: isdigit(...)
> 2) However, further down, line 1094, the full strings are compared:
> rc = jbrc->rs_defin->rs_comp(
> e.g., if you set
> set queue serial resources_max.nodes = 1
> in qmgr and then submit a job that requests nodes=1:ppn=1, that job
> will get rejected since rs_comp does a strcmp of "1" and "1:ppn=1".
> 3) Similarly, the comp_resc2 routine uses rs_comp as well, i.e., does
> a strcmp of the nodes resource string.
> This means that 1:ppn=1 < 1:ppn=12 < 1:ppn=2 which does not make any
> sense.
> Frankly, I think that the routing code based on the nodes string is
> broken. It is probably correct to fix it such that it compares only
> the first digit. However, we (and probably many others) actually use
> the broken code to do the routing:
> we have a need to route serial jobs and multiprocessor jobs differently:
> set queue default queue_type = Route
> set queue default route_destinations = parallel
> set queue default route_destinations += serial
> set queue parallel resources_min.nodes = 1:ppn=2
> set queue serial resources_max.nodes = 1:ppn=1
> This kind of works because so far we only have up to 8 core nodes.
> Thus, only 1:ppn=1 is smaller than 1:ppn=2. However, we will soon
> get our first cluster with 12 core nodes and then the whole scheme
> will break because 1:ppn=12 < 1:ppn=2 .
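The ordering described above can be reproduced outside Torque. This is a
minimal sketch, assuming the limit check is a plain byte-wise string
comparison as the quoted text describes for rs_comp. The helpers strcmp and
accepted_by are hypothetical names for illustration, not Torque functions.

```python
# Sketch only: mimics a strcmp-style comparison of nodes strings.
# Assumption: resources_min/resources_max on "nodes" are enforced by a
# plain string comparison, as described for rs_comp in the quoted message.

def strcmp(a, b):
    """Return -1, 0 or 1, matching the sign of C strcmp for ASCII strings."""
    return (a > b) - (a < b)

def accepted_by(request, res_min=None, res_max=None):
    """Hypothetical helper: apply min/max limits as string comparisons."""
    if res_min is not None and strcmp(request, res_min) < 0:
        return False
    if res_max is not None and strcmp(request, res_max) > 0:
        return False
    return True

requests = ["1:ppn=1", "1:ppn=2", "1:ppn=8", "1:ppn=12"]
print(sorted(requests))

# serial queue:   resources_max.nodes = 1:ppn=1
# parallel queue: resources_min.nodes = 1:ppn=2
for r in requests:
    serial = accepted_by(r, res_max="1:ppn=1")
    parallel = accepted_by(r, res_min="1:ppn=2")
    print(r, "serial" if serial else "-", "parallel" if parallel else "-")
```

Under these assumptions a request for 1:ppn=12 sorts above 1:ppn=1 and
below 1:ppn=2, so it is accepted by neither queue.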
The problem is that nodes is a special type of resource that itself
contains a list of resources. Comparing it as an ordinary resource
therefore doesn't make any sense. How would you handle 1:ppn=1+100:ppn=8?
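To make the nesting concrete, here is a rough sketch of a simplified
nodespec parser. It is not Torque's parser: it assumes numeric node counts
only, ignores hostnames and node properties, and assumes ppn is 1 when not
given. All function names are hypothetical.

```python
# Sketch only: a simplified nodespec parser.
# Assumptions: numeric node counts, ppn defaults to 1, other properties ignored.

def parse_nodespec(spec):
    """Split 'N:ppn=P+M:ppn=Q' into a list of (nodes, ppn) chunks."""
    chunks = []
    for part in spec.split("+"):
        fields = part.split(":")
        nodes = int(fields[0])
        ppn = 1  # assumed default when ppn is not specified
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        chunks.append((nodes, ppn))
    return chunks

def node_count(spec):
    """Total number of nodes across all chunks."""
    return sum(n for n, _ in parse_nodespec(spec))

def proc_count(spec):
    """Total number of processors across all chunks."""
    return sum(n * p for n, p in parse_nodespec(spec))

for spec in ["1:ppn=2", "2:ppn=1", "1:ppn=1+100:ppn=8"]:
    print(spec, parse_nodespec(spec), node_count(spec), proc_count(spec))
```

By node count 1:ppn=2 is smaller than 2:ppn=1, by processor count they are
equal, so any single ordering of nodespecs has to pick one of several
possible measures.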
What is waiting in Bugzilla is support for a new job attribute,
"total_resources", which sums up resources from both the resource list and
the nodespec. This is then checked against limits instead of the resource
list.
You need to sum up the resources, because otherwise the limits can be
easily circumvented (e.g. by using -l procs=8 instead of -l nodes=1:ppn=8).
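A rough sketch of the idea, not the actual patch: compute one processor
total from whichever form the job used and check the limit against that.
The names total_procs and within_limit are hypothetical, ppn is assumed to
default to 1, and simply adding nodespec and procs when both are given is
only an assumption made here for illustration.

```python
import re

# Sketch only. Assumptions: numeric node counts, ppn defaults to 1, and
# nodespec and procs are added together when both are present.

def total_procs(nodes=None, procs=0):
    """Sum processors requested via a nodespec string and/or procs."""
    total = procs
    if nodes:
        for chunk in nodes.split("+"):
            count = int(chunk.split(":")[0])
            match = re.search(r"ppn=(\d+)", chunk)
            total += count * (int(match.group(1)) if match else 1)
    return total

def within_limit(max_procs, nodes=None, procs=0):
    """Check the summed total against a hypothetical processor limit."""
    return total_procs(nodes, procs) <= max_procs

print(within_limit(4, nodes="1:ppn=8"))  # request via nodespec
print(within_limit(4, procs=8))          # same size request via procs
print(within_limit(4, nodes="1:ppn=2"))  # small request
```

With a summed total, both 8-processor requests are treated the same way
regardless of which -l form was used.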
The patch doesn't propagate ppn (although ppn is counted) because it is
unclear how the case where both the nodespec and procs are specified should
be handled.
Plus, the Torque developers still haven't decided on clear resource
semantics, so -l nodes=2 -l mem=2G could mean either 2GB spread
across 2 nodes or 2GB on each node (4GB total).
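The two readings of the memory request work out as follows. This is only
arithmetic for illustration; total_mem_gb and its parameters are
hypothetical names.

```python
# Sketch only: the two interpretations of "-l nodes=2 -l mem=2G".

def total_mem_gb(nodes, mem_gb, per_node):
    """Total memory implied by the request under one interpretation."""
    return nodes * mem_gb if per_node else mem_gb

print(total_mem_gb(2, 2, per_node=False))  # mem is a job-wide total
print(total_mem_gb(2, 2, per_node=True))   # mem applies to each node
```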
Mgr. Šimon Tóth