[torqueusers] Do I have to define the ncpus for a compute node?

André Gemünd andre.gemuend at scai.fraunhofer.de
Sat Jan 14 08:47:07 MST 2012


Are you by any chance using Maui or some other external scheduler? I find it suspicious that you can run ppn=3, which equals your node count. Perhaps your scheduler is allocating separate nodes instead.
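
To check, you could look at whether the placement is being done by pbs_sched or by an external scheduler such as Maui, e.g. (assuming you are on the pbs_server host and qmgr is in your PATH):

qmgr -c 'print server' | grep -i scheduling
ps aux | grep -E 'pbs_sched|maui'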

Greetings
André

----- Original Message -----
> 
> Thanks Gareth. I removed that setting, using
> 
> 
> qmgr -c 'unset queue batch resources_default.nodes'
> 
> 
> but I'm still getting the same error. I can submit jobs that request
> 1-3 ppn, but not 4 ppn.
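> 
> For reference, the full queue definition can be dumped with:
> 
> qmgr -c 'print queue batch'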
> 
> On Sat, Jan 14, 2012 at 5:08 AM, <Gareth.Williams at csiro.au> wrote:
> 
> Hi Ryan,
> 
> Unset queue batch resources_default.nodes – you don’t need that.
> 
> The nodes resource is fighting with the procs resource. You should set
> only one or the other for a given job (for serial tasks it is fine to
> set neither).
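> 
> For example, a 4-way job would request either
> 
> #PBS -l nodes=1:ppn=4
> 
> or (if your Torque version and scheduler support the procs syntax)
> 
> #PBS -l procs=4
> 
> but not both in the same job.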
> 
> Gareth
> 
> From: Ryan Golhar [mailto: ngsbioinformatics at gmail.com ]
> Sent: Saturday, 14 January 2012 4:31 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Do I have to define the ncpus for a
> compute node?
> 
> So that's what's throwing me off. I already configured the queue using:
> 
> [root at bic database]# qmgr -c 'create queue batch'
> [root at bic database]# qmgr -c 'set queue batch queue_type = execution'
> [root at bic database]# qmgr -c 'set queue batch started = true'
> [root at bic database]# qmgr -c 'set queue batch enabled = true'
> [root at bic database]# qmgr -c 'set queue batch resources_default.nodes=1:ppn=1'
> [root at bic database]# qmgr -c "set queue batch keep_completed=120"
> [root at bic database]# qmgr -c "set server default_queue=batch"
> [root at bic database]# qmgr -c "set server query_other_jobs = true"
> 
> I assumed, by default, if the user doesn't specify any resources, a
> job would consume 1 core on 1 node. My nodes file shows:
> 
> [root at bic hg19]# cat /var/spool/torque/server_priv/nodes
> compute-0-0 np=8
> compute-0-1 np=8
> compute-0-2 np=8
> 
> So Torque knows there are 8 cpus per node, and I haven't set a maximum
> limit on how many resources a job could use. To me, requesting 2 cpus
> on 1 node should have succeeded.
> 
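> For completeness, an explicit cap would look something like the following
> (illustrative values only; I have not set anything like this):
> 
> qmgr -c 'set queue batch resources_max.nodect = 1'
> qmgr -c 'set queue batch resources_max.ncpus = 8'
> 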
> On Fri, Jan 13, 2012 at 11:18 AM, Axel Kohlmeyer <
> akohlmey at cmm.chem.upenn.edu > wrote:
> 
> On Fri, Jan 13, 2012 at 10:59 AM, Ryan Golhar
> < ngsbioinformatics at gmail.com > wrote:
> > Hi - I have a ROCKS cluster running and installed Torque. I'm able to
> > submit 1 core, 1 cpu jobs without problem. I tried submitting a job
> > that requested 4 cpus on 1 node using
> > 
> > #PBS -l nodes=1:ppn=4
> > 
> > in my job submission script. When I submit the job, however, I get the
> > error:
> > 
> > qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
> > (nodes file is empty or requested nodes exceed all systems)
> > 
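> > For context, a minimal submission script with that request would look
> > something like this (the body here is just a placeholder):
> > 
> > #!/bin/bash
> > #PBS -l nodes=1:ppn=4
> > hostname
> > 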
> > If I run pbsnodes, I see:
> > 
> > compute-0-0
> > state = free
> > np = 8
> > ntype = cluster
> > status =
> > rectime=1326469800,varattr=,jobs=,state=free,netload=1720539412488,gres=,loadave=0.01,ncpus=8,physmem=16431248kb,availmem=17311704kb,totmem=17451364kb,idletime=339141,nusers=0,nsessions=?
> > 15201,sessions=? 15201,uname=Linux compute-0-0.local
> > 2.6.18-238.19.1.el5 #1
> > SMP Fri Jul 15 07:31:24 EDT 2011 x86_64,opsys=linux
> > gpus = 0
> > 
> > 
> > All my compute nodes have 8 cpus. Do I need to tell Torque this? I
> > thought Torque could figure this out from np=8 or ncpus=8.
> 
> the error message says that the request exceeds the queue configuration.
> that is being checked before it looks at any nodes. thus you probably
> have to adjust the queue configuration.
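> 
> a quick way to see what limits the queue currently enforces is to dump
> its full definition, e.g.:
> 
> qstat -Qf batch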
> 
> axel.
> 
> 
> > 
> > Ryan
> > 
> 
> 
> 
> --
> Dr. Axel Kohlmeyer akohlmey at gmail.com
> http://sites.google.com/site/akohlmey/
> 
> Institute for Computational Molecular Science
> Temple University, Philadelphia PA, USA.
> 
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 

-- 
André Gemünd
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemuend at scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend

