[torqueusers] Weird queue behavior.

Garrick Staples garrick at usc.edu
Wed Apr 19 22:13:05 MDT 2006


On Wed, Apr 19, 2006 at 09:36:07PM -0600, John Hanks alleged:
> On Wed, 2006-04-19 at 17:31 -0700, Garrick Staples wrote:
> > On Tue, Apr 18, 2006 at 08:52:30AM -0600, John Hanks alleged:
> > > Hi,
> > > 
> > > I have two queues, parallel and dedicated. Parallel is supposed to catch
> > > any job requesting less than nodes=32:ppn=4 and dedicated gets any job
> > > larger than that. But I'm getting weird behavior like this:
> > > 
> > > # A nine node, 4 processor per node job works:
> > > griznog at uinta ~ $ qsub -I -l nodes=9:ppn=4
> > > qsub: waiting for job 10349.uinta.hpc.usu.edu to start
> > > 
> > > # A ten node, 4 ppn job doesn't.
> > > griznog at uinta ~ $ qsub -I -l nodes=10:ppn=4
> > > qsub: Job rejected by all possible destinations
> > > 
> > > # However, a 20 node 2 ppn job does
> > > griznog at uinta ~ $ qsub -I -l nodes=20:ppn=2
> > > qsub: waiting for job 10351.uinta.hpc.usu.edu to start
> > > 
> > > What am I doing wrong here that allows > ~36 CPU jobs unless I pack all
> > > the processors on each node?
> > > 
> > > Queue configuration follows.
> > > 
> > > Thanks,
> > > 
> > > jbh
> > > 
> > > Qmgr: p q parallel
> > > #
> > > # Create queues and set their attributes.
> > > #
> > > #
> > > # Create and define queue parallel
> > > #
> > > create queue parallel
> > > set queue parallel queue_type = Execution
> > > set queue parallel resources_max.nodect = 32
> > > set queue parallel resources_max.nodes = 32:ppn=4
> > > set queue parallel resources_max.walltime = 24:00:00
> > > set queue parallel resources_min.nodect = 1
> > > set queue parallel resources_min.nodes = 1:ppn=2
> > > set queue parallel resources_default.nodes = 1:ppn=2
> > > set queue parallel resources_default.walltime = 01:00:00
> > > set queue parallel resources_available.nodect = 62
> > > set queue parallel resources_available.nodes = 62:ppn=4
> > > set queue parallel max_user_run = 8
> > > set queue parallel enabled = True
> > > set queue parallel started = True
> > > Qmgr: p q dedicated
> > > #
> > > # Create queues and set their attributes.
> > > #
> > > #
> > > # Create and define queue dedicated
> > > #
> > > create queue dedicated
> > > set queue dedicated queue_type = Execution
> > > set queue dedicated resources_max.nodect = 62
> > > set queue dedicated resources_max.nodes = 62:ppn=4
> > > set queue dedicated resources_max.walltime = 08:00:00
> > > set queue dedicated resources_min.nodect = 33
> > > set queue dedicated resources_min.nodes = 33:ppn=4
> > > set queue dedicated resources_default.nodes = 33:ppn=4
> > > set queue dedicated resources_default.walltime = 01:00:00
> > > set queue dedicated resources_available.nodect = 62
> > > set queue dedicated resources_available.nodes = 62
> > > set queue dedicated enabled = True
> > > set queue dedicated started = True
> > 
> > "nodes" is a string, not an integer, therefore it is only useful as a
> > default.  min/max nodes doesn't have any meaning.
> 
> I'm not really clear on what nodes is used for. I have a fuzzy
> recollection of following a discussion here about nodect and nodes and
> arriving at the conclusion that nodes was the better way to specify
> these things. Thinking I'd cover all my bases is why I have them both in
> there for min/max.
> 
> > I don't see a routing queue or your server's default queue here.  Your
> > qsub examples above don't use -q, so I don't know which queue is being
> > used.
> 
> Sorry, there is a queue called batch which routes to these two queues
> and a queue for serial jobs. batch is the default queue.

I think it will work as you want if you remove the extra settings and
keep just the following:

set server resources_default.nodes = 1:ppn=2
set queue batch route_destinations = "parallel,dedicated"
set queue parallel resources_max.nodect = 32
set queue parallel resources_max.walltime = 24:00:00
set queue dedicated resources_max.nodect = 62
set queue dedicated resources_max.walltime = 08:00:00
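
The intended effect of these settings can be sketched with a small model.
It assumes the routing queue tries its destinations in the listed order and
that each execution queue accepts a job only when the job's node count fits
under its resources_max.nodect. The function names and the parsing of the
nodes string are illustrative assumptions, not TORQUE internals.

```python
# Simplified model of routing by node count; NOT TORQUE's actual code.
# The limits mirror the qmgr settings suggested above.
QUEUES = [
    ("parallel", 32),   # resources_max.nodect = 32
    ("dedicated", 62),  # resources_max.nodect = 62
]


def node_count(nodes_spec):
    """Count nodes in a spec such as '10:ppn=4' or '2:ppn=4+3:ppn=2'."""
    total = 0
    for chunk in nodes_spec.split("+"):
        head = chunk.split(":")[0]
        # Assumption: a leading integer is a node count,
        # anything else (e.g. a host name) counts as one node.
        total += int(head) if head.isdigit() else 1
    return total


def route(nodes_spec, queues=QUEUES):
    """Return the first destination whose nodect limit admits the job."""
    count = node_count(nodes_spec)
    for name, max_nodect in queues:
        if count <= max_nodect:
            return name
    return None  # rejected by all possible destinations


if __name__ == "__main__":
    for spec in ("9:ppn=4", "10:ppn=4", "20:ppn=2", "40:ppn=4", "63:ppn=4"):
        print(spec, "->", route(spec))
```

In this model the three requests from the original report (9:ppn=4,
10:ppn=4, 20:ppn=2) all land in parallel, because only the node count is
compared and ppn plays no part in the decision. A 40-node request falls
through to dedicated, and a 63-node request is rejected by both.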


-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California