[torqueusers] Fwd: ncpus anyone?
dbeer at adaptivecomputing.com
Wed Mar 3 13:51:13 MST 2010
If you set resources_max.nodes=1:ppn=32, does that route things correctly?
I want to assure everyone that as TORQUE changes, every reasonable effort will be made to include support for existing constructs. We're looking to expand the functionality of TORQUE, not abandon existing functionality. Backwards compatibility will be a major concern as we move forward, and we'll continue to be in touch as our ideas become more concrete.
----- "Michel Béland" <michel.beland at rqchp.qc.ca> wrote:
> Gareth.Williams at csiro.au wrote:
> > I agree that ncpus=X should be completely synonymous with
> > nodes=1:ppn=X - that would make it worth keeping ncpus. This would be
> > complementary to specifying procs (meaning a given core count on any
> > available hosts), and provide a more intuitive way of describing SMP
> > jobs. In principle you could change the names nodes-ppn/ncpus/procs
> > but in practice that would be a major pain for everybody.
> > BTW, I realised far later than I'd like that specifying both ncpus
> > and nodes conflicts but does not give an error. We had set a queue
> > default resource nodes=1 (we no longer do so), which confounded my
> > understanding of the issue.
> > I think it would be good if torque would give errors if you specify
> > more than one of nodes, ncpus or procs, as the specifications clearly
> > conflict. Silently choosing one (nodes - yet keeping info on the
> > others) is confusing.
> As I explained earlier, we make sure on our Altix machines that all jobs
> have both -lncpus=n and -lnodes=1:ppn=n, as -lnodes is needed for
> cpusets to work correctly and -lncpus was needed to have qstat show the
> number of processors correctly. David announced that qstat will show
> processors asked for through -lnodes. That is good, but I realize that
> there is another reason to use -lncpus: it is needed to route jobs to the
> right execution queue, since we have queue attributes like
> resources_max.ncpus and resources_min.ncpus. There is no way to
> accomplish that with -lnodes on an SMP machine, unless Torque is
> modified so that -lnodes=1:ppn=32 routes to a queue with
> resources_max.ncpus=32 and -lnodes=1:ppn=64 (or even -lnodes=2:ppn=32
> on a cluster of SMP machines) routes to a queue with
> resources_min.ncpus=33. In the end, this is pretty similar to having
> both -lncpus and -lnodes defined.
> Michel Béland, scientific computing analyst
> michel.beland at rqchp.qc.ca
> bureau S-250, pavillon Roger-Gaudry (principal), Université de
> telephone: 514 343-6111 ext. 3892, fax: 514 343-2155
> RQCHP (Réseau québécois de calcul de haute performance)
David Beer | Senior Software Engineer
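
The routing rule Michel proposes in the quoted message (derive a core count from -lnodes and compare it against a queue's resources_min.ncpus / resources_max.ncpus) can be sketched roughly as below. This is only an illustration of the idea, not TORQUE code: the queue names "small" and "large", the QUEUES table, and the functions total_cores() and route() are all hypothetical, and the parser assumes only the simple N:ppn=P form of a nodes request (no host names, properties, or "+"-joined specs).

```python
import re

# Hypothetical queue table. Names and limits are illustrative only; they
# mirror the resources_min.ncpus / resources_max.ncpus values in the thread.
QUEUES = [
    {"name": "small", "min_ncpus": 1, "max_ncpus": 32},
    {"name": "large", "min_ncpus": 33, "max_ncpus": None},  # None = no upper limit
]


def total_cores(nodes_spec):
    """Return the core count implied by a simple 'N' or 'N:ppn=P' request."""
    match = re.fullmatch(r"(\d+)(?::ppn=(\d+))?", nodes_spec)
    if match is None:
        raise ValueError(f"unsupported nodes spec: {nodes_spec!r}")
    nodes = int(match.group(1))
    # Assumption for this sketch: a missing ppn means one processor per node.
    ppn = int(match.group(2)) if match.group(2) else 1
    return nodes * ppn


def route(nodes_spec, queues=QUEUES):
    """Pick the first queue whose ncpus limits admit the request, else None."""
    cores = total_cores(nodes_spec)
    for queue in queues:
        if cores < queue["min_ncpus"]:
            continue
        if queue["max_ncpus"] is not None and cores > queue["max_ncpus"]:
            continue
        return queue["name"]
    return None


if __name__ == "__main__":
    for spec in ("1:ppn=32", "1:ppn=64", "2:ppn=32"):
        print(spec, "->", route(spec))
```

With this table, a 1:ppn=32 request lands in the queue capped at 32 cores, while 1:ppn=64 and 2:ppn=32 both land in the queue whose minimum is 33, which is the behaviour described in the quoted message.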