[torqueusers] ncpus anyone?

David Beer dbeer at adaptivecomputing.com
Mon Mar 1 15:19:32 MST 2010


So, if I understand correctly, ncpus really only works for people who are running SMP or similar systems? We definitely need to update our documentation, as I feel it is misleading on this point. Among other things, it should state clearly that ncpus isn't compatible with the nodes attribute.

On a related note, in the qstat -a output we have the TSK field, which I believe stands for tasks (I couldn't find anything about it in the man page, but the variable in the code is named tasks). I noticed that the implementation simply writes whatever value is stored in ncpus to this field. It seems this could be made more accurate by also checking the nodes attribute and using that value where it is defined, since nodes seems to override ncpus when both are present. What are your thoughts on this?
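To make the TSK idea concrete, here is a rough sketch of the calculation I have in mind. It is written in Python only for brevity and is not the actual qstat code; the function names and the dictionary standing in for the job's Resource_List are made up for illustration. It assumes the usual nodes syntax of "+"-separated chunks, each starting with either a node count or a host/property name and optionally carrying ppn=N, and it assumes that the nodes value should win when both attributes are present.

```python
def count_procs(nodes_spec):
    """Total processors requested by a nodes spec such as '2:ppn=4+node01'."""
    total = 0
    for chunk in nodes_spec.split("+"):
        count = 1  # a host or property name counts as a single node
        ppn = 1    # assumed default when ppn is not given
        for i, part in enumerate(chunk.split(":")):
            if i == 0 and part.isdigit():
                count = int(part)
            elif part.startswith("ppn="):
                ppn = int(part[len("ppn="):])
            # anything else is treated as a node property and ignored
        total += count * ppn
    return total


def tasks_for_display(resource_list):
    """Value for the TSK column: prefer nodes, fall back to ncpus."""
    nodes = resource_list.get("nodes")
    if nodes:
        return count_procs(nodes)
    ncpus = resource_list.get("ncpus")
    return int(ncpus) if ncpus is not None else None
```

With this, a job carrying only ncpus=10 would still show 10, while a job carrying nodes=2:ppn=4 would show 8 regardless of what ncpus holds.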

Thanks for all of your input/explanations. I appreciate you sharing your knowledge with me.

David

----- "Martin Siegert" <siegert at sfu.ca> wrote:

> Hi Si, David,
> 
> On Mon, Mar 01, 2010 at 09:21:42PM +0000, Si Hammond wrote:
> > I have to admit to being pretty confused by the ncpus resource because
> > I don't seem to be able to get it to run in the way I imagine it would.
> > 
> > What I'd really like it to mean is just pick me, say, 10 cores and get
> > the job run. I don't care about the number of nodes or processors per
> > node, job placement etc. 
> 
> That's exactly what ncpus does not do: ncpus is a relic (as far as I know)
> from old mainframe days - anyway, it only works when requesting resources
> on a single node (SMP).
> 
> What you want to use is -l procs=N, which requests N cores with arbitrary
> distribution across nodes. However, you need a scheduler (e.g., moab)
> that supports the procs resource.
> 
> > On 1 Mar 2010, at 21:18, David Beer wrote:
> > 
> > > Hi all,
> > > 
> > > I'm wondering if anyone uses ncpus for TORQUE (in the 2.3 and beyond
> > > versions). From looking through the users list's old entries, it seems
> > > that a lot of people are confused about this attribute and sometimes
> > > just decide to avoid it. From my testing, which I admit isn't extensive,
> > > it seems that this attribute is almost completely meaningless. For
> > > example, if a job is submitted with -l ncpus=10, it still appears to
> > > only run in one place. I see this in pbsnodes -a:
> > > 
> > >    jobs = 0/83.napali
> > > 
> > > despite the fact that:
> > > 
> > >    Resource_List.ncpus = 10
> > > 
> > > appears in qstat -f's output. I'm wondering if there's anyone out
> > > there successfully using this feature, because it looks to me that
> > > TORQUE doesn't do anything with ncpus in its current state.
> > > 
> > > Thanks for any light you can shed on this,
> 
> We still use ncpus on a bunch of SMP systems. However, it is a complete
> nuisance, as users do not understand that it works on SMPs
> exclusively. Hence we commonly have users submit job scripts on clusters
> requesting ncpus, which consequently fail.
> 
> I believe that -l ncpus=N can be completely replaced with -l nodes=1:ppn=N.
> Hence, in my opinion it would be good to have a configure option
> 
> --enable-legacy-ncpus
> 
> that would allow building torque without ncpus support.
> 
> Cheers,
> Martin
> 
> -- 
> Martin Siegert
> Head, Research Computing
> WestGrid Site Lead
> IT Services                                phone: 778 782-4691
> Simon Fraser University                    fax:   778 782-4242
> Burnaby, British Columbia                  email: siegert at sfu.ca
> Canada  V5A 1S6

-- 
David Beer | Senior Software Engineer
Adaptive Computing


