[torqueusers] ncpus anyone?

Martin Siegert siegert at sfu.ca
Mon Mar 1 14:41:56 MST 2010


Hi Si, David,

On Mon, Mar 01, 2010 at 09:21:42PM +0000, Si Hammond wrote:
> I have to admit to being pretty confused by the ncpus resource because
> I don't seem to be able to get it to run in the way I imagine it would.
> 
> What I'd really like it to mean is just pick me, say, 10 cores and get
> the job run. I don't care about the number of nodes or processors per
> node, job placement etc. 

That's exactly what ncpus does not do: ncpus is a relic (as far as I know)
from the old mainframe days. In any case, it only works when requesting
resources on a single node (SMP).

What you want to use is -l procs=N, which requests N cores with arbitrary
distribution across nodes. However, you need a scheduler (e.g., Moab)
that supports the procs resource.
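
To make the difference concrete, here is a minimal sketch in Python of how
a submission wrapper might choose between the two request styles. The
helper names core_request and qsub_argv are hypothetical (they are not
part of TORQUE), and the sketch only builds the qsub argument list; it
does not submit anything. It also assumes the site scheduler supports
the procs resource.

```python
def core_request(cores, single_node=False):
    """Return a qsub -l resource string asking for `cores` cores."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    if single_node:
        # All cores on one (SMP) node.
        return f"nodes=1:ppn={cores}"
    # Any distribution across nodes; assumes the scheduler supports procs.
    return f"procs={cores}"


def qsub_argv(cores, script, single_node=False):
    """Build the qsub argument list (hypothetical helper, does not submit)."""
    return ["qsub", "-l", core_request(cores, single_node), script]


if __name__ == "__main__":
    # "job.sh" is a placeholder script name.
    print(" ".join(qsub_argv(10, "job.sh")))
```

Run as a script, this prints "qsub -l procs=10 job.sh"; passing
single_node=True yields the nodes=1:ppn=10 form instead.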

> On 1 Mar 2010, at 21:18, David Beer wrote:
> 
> > Hi all,
> > 
> > I'm wondering if anyone uses ncpus for TORQUE (in the 2.3 and beyond
> > versions). From looking through the users list's old entries, it seems
> > that a lot of people are confused about this attribute and sometimes
> > just decide to avoid it. From my testing, which I admit isn't extensive,
> > it seems that this attribute is almost completely meaningless. For
> > example, if a job is submitted with -l ncpus=10, it still appears to
> > only run in one place. I see this in pbsnodes -a:
> > 
> > jobs = 0/83.napali
> > 
> > despite the fact that:
> > 
> >    Resource_List.ncpus = 10
> > 
> > appears in qstat -f's output. I'm wondering if there's anyone out
> > there successfully using this feature, because it looks to me that
> > TORQUE doesn't do anything with ncpus in its current state.
> > 
> > Thanks for any light you can shed on this,

We still use ncpus on a bunch of SMP systems. However, it is a complete
nuisance, as users do not understand that it works on SMPs exclusively.
Hence we commonly have users submit job scripts on clusters requesting
ncpus, and those jobs consequently fail.

I believe that -l ncpus=N can be completely replaced with -l nodes=1:ppn=N.
Hence, in my opinion it would be good to have a configure option

--enable-legacy-ncpus

so that torque is built without ncpus support unless this option is given.
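
As an illustration of the replacement I have in mind, here is a minimal
Python sketch that rewrites ncpus=N as nodes=1:ppn=N in a comma-separated
-l resource list. The function name replace_ncpus is hypothetical, and the
sketch assumes the list does not already contain a nodes= request (it
makes no attempt to merge the two).

```python
import re

# Matches a single resource item of the form ncpus=<integer>.
_NCPUS = re.compile(r"^ncpus=(\d+)$")


def replace_ncpus(resource_list):
    """Rewrite ncpus=N as nodes=1:ppn=N in a comma-separated -l list.

    Assumption: the list has no existing nodes= entry; other items
    are passed through unchanged.
    """
    items = []
    for item in resource_list.split(","):
        item = item.strip()
        match = _NCPUS.match(item)
        if match:
            items.append(f"nodes=1:ppn={match.group(1)}")
        else:
            items.append(item)
    return ",".join(items)


if __name__ == "__main__":
    print(replace_ncpus("ncpus=10,walltime=01:00:00"))
```

A site could apply something like this in a submit wrapper to catch job
scripts that still request ncpus on a cluster.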

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

