Bugzilla – Bug 78
fix routing of jobs based on nodes spec
Last modified: 2010-08-31 16:23:16 MDT
You need to log in before you can comment on or make changes to this bug.
Created an attachment (id=52) [details] patch for comparing nodes specifications As explained in http://www.supercluster.org/pipermail/torquedev/2010-August/002724.html routing of jobs based on a nodes specification currently does not work. The attached patch changes this behaviour such that routing for a nodes specification -l nodes=x1:ppn=y1+x2:ppn=y2+... is based on the processor count x1*y1+x2+y2+...
This should be probably harmonized with http://www.clusterresources.com/bugzilla/show_bug.cgi?id=67 to work with any resource (the patch already counts all resources in both the nodespec and resource requests).
I am fully aware that this patch is incomplete as far as future plans are concerned, e.g., bug 67 or combined requests like nodes=...+procs=... Thus, the patch repairs only the routing code that is in the current version (and probably almost all previous versions) that is clearly broken with respect to nodes specifications. As such it is needed - we are running into problems with "misrouted" jobs right now. With respect to future plans: I would like to see a "standard" for the grammar first. E.g., the problem I ran into when writing the patch is the a nodes specification like -l nodes=4:ppn=8+n15+procs=15 is actually quite ugly: currently the code uses isdigit to decide whether the entry after the + sign is a node name (like n15) or specifies a number of nodes (like 4). This logic obviously fails when we add something like "+procs=15" to the mix. Currently "procs" would actually be interpreted as a node name. If we do want to allow this we need to parse the nodes specification for procs first and then split the string appropriately. However, the string above is supposed to be equivalent to -l nodes=4:ppn=8+n15 -l procs=15 which actually works with the current code. Should this be the only way of specifying a combined nodes+procs resource? With respect to bug 67: despite the explanations given I am still not clear what the proposed syntax actually means. E.g., -l nodes=1:ppn=2:ncpus=3 Currently, nodes=1:ppn=2 means 1 node, 2 processors per node; ncpus=3 is basically equivalent to nodes=1:ppn=3 (even though this is not handled by torque, schedulers treat it that way and users are used to it). Thus, to me bug 67 changes the meaning of these resources requests. Is that what we want? And if yes, what does nodes=1:ppn=2:ncpus=3 mean? How does the procs resource get incorporated into this scheme? Frankly, I mostly care about nodes=x:ppn=y and procs=z and any combination of those when it comes to requesting processors. I need that to work. I am open to other changes, but currently I do not understand the meaning of the type of requests introduced in bug 67 and therefore have a hard time dealing with them. - Martin
> I am fully aware that this patch is incomplete as far as future plans are > concerned, e.g., bug 67 or combined requests like nodes=...+procs=... > Thus, the patch repairs only the routing code that is in the current > version (and probably almost all previous versions) that is clearly > broken with respect to nodes specifications. As such it is needed - we are > running into problems with "misrouted" jobs right now. My comment wasn't about the "incompleteness" of this patch, but rather about the fact, that the 67 patch already calculates all resources (into the ResourcesTotal job attribute). Which can be then checked when routing. Much better then hacking existing functions to "sort of" support nodespec IMHO. > With respect to future plans: I would like to see a "standard" for the > grammar first. E.g., the problem I ran into when writing the patch is > the a nodes specification like > > -l nodes=4:ppn=8+n15+procs=15 > > is actually quite ugly: currently the code uses isdigit to decide whether > the entry after the + sign is a node name (like n15) or specifies a number > of nodes (like 4). > This logic obviously fails when we add something like "+procs=15" to the > mix. Currently "procs" would actually be interpreted as a node name. If we > do want to allow this we need to parse the nodes specification for procs first > and then split the string appropriately. However, the string above is supposed > to be equivalent to > > -l nodes=4:ppn=8+n15 > -l procs=15 > > which actually works with the current code. Should this be the only way of > specifying a combined nodes+procs resource? Well, yes this is one of the issues that the 67 bug solves. > With respect to bug 67: despite the explanations given I am still not clear > what the proposed syntax actually means. E.g., > -l nodes=1:ppn=2:ncpus=3 That is a decision to be made. It's set using resource flags in the resc_def_all. Current meaning is 1 node, 2 processes, 3 cpus per process. You can easily set it to mean 1 node 2 processes 3 cpus total. There are two approaches I have seen, when it came to ppn vs. ncpus. One is to keep the logic of processes (which is odd and hard to grasp), the second one is to make ppn and ncpus synonymous (which is a bit hard to enforce). > Currently, nodes=1:ppn=2 means 1 node, 2 processors per node; ncpus=3 is > basically equivalent to nodes=1:ppn=3 (even though this is not handled by > torque, schedulers treat it that way and users are used to it). Thus, to me > bug 67 changes the meaning of these resources requests. Is that what we want? > And if yes, what does nodes=1:ppn=2:ncpus=3 mean? How does the procs resource > get incorporated into this scheme? Torques features are a bit behind in this area. Simply not using ppn and ncpus together would be the good approach right now. Either set ppn=1 on each node and set the cpus using ncpus or set ppn=cpus and don't set ncpus on the node. > Frankly, I mostly care about nodes=x:ppn=y and procs=z and any combination > of those when it comes to requesting processors. I need that to work. I am > open to other changes, but currently I do not understand the meaning of the > type of requests introduced in bug 67 and therefore have a hard time dealing > with them. I just realized that nodect and procs are actually not propagated into the resources total attribute (but are still counted). I will look into this. The correct result for -l nodes=x:ppn=y+procs=z would be resources total: nodect=x, procs=x*y+z
Martin, We are not going to accept this patch for a couple of reasons. The first reason is we are planning to address the syntax issue soon. Instead of adding an additional syntax that will likely be overridden shortly we will be better off addressing the global issue. The second reason is that allowing a nodes=x:ppn=y on the resources_max.nodes and resources_min.nodes creates and ambiguity. The solution you have proposed changes the meaning of nodes to processors. While schedulers like Moab will allow nodes and processors to mean the same thing, Moab handles the interpretation. TORQUE still looks at everything as nodes first and processors as a resource on a node. It might be useful to have a way to set queue limits with the nodes=x:ppn=y concept. But we will address that later. Thanks for the patch. The help from the community is vital to the success of the TORQUE project.