[torqueusers] Do I have to define the ncpus for a compute node?

Ryan Golhar ngsbioinformatics at gmail.com
Sat Jan 14 11:12:10 MST 2012


I only did it as a test.  I'm using Torque and nothing else... I can submit
jobs requesting 1, 2, and 3 cores, but 4 cores doesn't work...
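
Roughly, the test looks like this (a plain sleep job is enough to show the
behaviour; the exact script doesn't matter):

    echo "sleep 60" | qsub -l nodes=1:ppn=3    # accepted and runs
    echo "sleep 60" | qsub -l nodes=1:ppn=4    # rejected by qsub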

2012/1/14 André Gemünd <andre.gemuend at scai.fraunhofer.de>

> Are you by any chance using Maui or some other external scheduler? I find
> it suspicious that you can run ppn=3, which equals your node count. Perhaps
> your scheduler is allocating separate nodes.
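>
> (If you're not sure, a quick way to check is to look at which scheduler
> processes are running on the pbs_server host, for example:
>
>     ps -ef | grep -E 'pbs_sched|maui|moab'
>
> If only pbs_sched turns up, you are on Torque's built-in scheduler.)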
>
> Greetings
> André
>
> ----- Original Message -----
> >
> > Thanks Gareth. I removed that setting, using
> >
> >
> > qmgr -c 'unset queue batch resources_default.nodes'
> >
> >
> > but I'm still getting the same error. I can submit jobs that request
> > 1-3 ppn, but not 4 ppn.
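> >
> > After the unset, printing the queue should confirm which settings are
> > still in place (if I'm reading qmgr right):
> >
> >     qmgr -c 'print queue batch'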
> >
> > On Sat, Jan 14, 2012 at 5:08 AM, <Gareth.Williams at csiro.au> wrote:
> >
> > Hi Ryan,
> >
> > Unset queue batch resources_default.nodes – you don’t need that.
> >
> > The nodes resource conflicts with the procs resource. Set only one or the
> > other for a given job (for serial tasks it is fine to set neither).
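> >
> > As a rough sketch (adjust the counts to the job at hand), a parallel
> > request would carry one or the other of these directives, not both:
> >
> >     #PBS -l nodes=1:ppn=4    # 4 cores, all on one node
> >     #PBS -l procs=4          # 4 cores, wherever the scheduler places them
> >
> > A serial job can simply omit both lines.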
> >
> > Gareth
> >
> > From: Ryan Golhar [mailto: ngsbioinformatics at gmail.com ]
> > Sent: Saturday, 14 January 2012 4:31 AM
> > To: Torque Users Mailing List
> > Subject: Re: [torqueusers] Do I have to define the ncpus for a
> > compute node?
> >
> > So that's what's throwing me off. I already configured the queue
> > using:
> >
> > [root at bic database]# qmgr -c 'create queue batch'
> >
> > [root at bic database]# qmgr -c 'set queue batch queue_type = execution'
> >
> > [root at bic database]# qmgr -c 'set queue batch started = true'
> >
> > [root at bic database]# qmgr -c 'set queue batch enabled = true'
> >
> > [root at bic database]# qmgr -c 'set queue batch resources_default.nodes=1:ppn=1'
> >
> > [root at bic database]# qmgr -c "set queue batch keep_completed=120"
> >
> > [root at bic database]# qmgr -c "set server default_queue=batch"
> >
> > [root at bic database]# qmgr -c "set server query_other_jobs = true"
> >
> > I assumed, by default, if the user doesn't specify any resources, a
> > job would consume 1 core on 1 node. My nodes file shows:
> >
> > [root at bic hg19]# cat /var/spool/torque/server_priv/nodes
> > compute-0-0 np=8
> > compute-0-1 np=8
> > compute-0-2 np=8
> >
> > So Torque knows there are 8 cpus per node, and I haven't set a maximum
> > limit on how many resources a job can use. To me, requesting 2 cpus on 1
> > node should have succeeded.
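> >
> > (Listing the queue attributes, e.g. with
> >
> >     qmgr -c 'list queue batch'
> >
> > should confirm that; as far as I know no resources_max entries are set
> > on it.)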
> >
> > On Fri, Jan 13, 2012 at 11:18 AM, Axel Kohlmeyer <
> > akohlmey at cmm.chem.upenn.edu > wrote:
> >
> > On Fri, Jan 13, 2012 at 10:59 AM, Ryan Golhar
> > < ngsbioinformatics at gmail.com > wrote:
> > > Hi - I have a ROCKS cluster running and installed Torque. I'm able to
> > > submit 1 core, 1 cpu jobs without problem. I tried submitting a job that
> > > requested 4 cpus on 1 node using
> > >
> > > #PBS -l nodes=1:ppn=4
> > >
> > > in my job submission script. When I submit the job however, I get the
> > > error:
> > >
> > > qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
> > > (nodes file is empty or requested nodes exceed all systems)
> > >
> > > If I run pbsnodes, I see:
> > >
> > > compute-0-0
> > > state = free
> > > np = 8
> > > ntype = cluster
> > > status = rectime=1326469800,varattr=,jobs=,state=free,netload=1720539412488,gres=,loadave=0.01,ncpus=8,physmem=16431248kb,availmem=17311704kb,totmem=17451364kb,idletime=339141,nusers=0,nsessions=? 15201,sessions=? 15201,uname=Linux compute-0-0.local 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011 x86_64,opsys=linux
> > > gpus = 0
> > >
> > >
> > > All my compute nodes have 8 cpus. Do I need to tell Torque this? I
> > > thought Torque could figure this out from np=8 or ncpus=8.
> >
> > The error message says that the request exceeds the queue configuration.
> > That is checked before any nodes are looked at, so you probably have to
> > adjust the queue configuration.
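> >
> > (Something along the lines of
> >
> >     qmgr -c 'print server'
> >     qmgr -c 'print queue batch'
> >
> > will dump the current configuration, so you can see which default or
> > limit is getting in the way.)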
> >
> > axel.
> >
> >
> > >
> > > Ryan
> >
> > --
> > Dr. Axel Kohlmeyer akohlmey at gmail.com
> > http://sites.google.com/site/akohlmey/
> >
> > Institute for Computational Molecular Science
> > Temple University, Philadelphia PA, USA.
>
> --
> André Gemünd
> Fraunhofer-Institute for Algorithms and Scientific Computing
> andre.gemuend at scai.fraunhofer.de
> Tel: +49 2241 14-2193
> /C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>