[torqueusers] Weird queue behavior.

John Hanks griznog at engineering.usu.edu
Wed Apr 19 21:36:07 MDT 2006


On Wed, 2006-04-19 at 17:31 -0700, Garrick Staples wrote:
> On Tue, Apr 18, 2006 at 08:52:30AM -0600, John Hanks alleged:
> > Hi,
> > 
> > I have two queues, parallel and dedicated. Parallel is supposed to catch
> > any job requesting less than nodes=32:ppn=4 and dedicated gets any job
> > larger than that. But I'm getting weird behavior like this:
> > 
> > # A nine node, 4 processor per node job works:
> > griznog at uinta ~ $ qsub -I -l nodes=9:ppn=4
> > qsub: waiting for job 10349.uinta.hpc.usu.edu to start
> > 
> > # A ten node, 4 ppn job doesn't.
> > griznog at uinta ~ $ qsub -I -l nodes=10:ppn=4
> > qsub: Job rejected by all possible destinations
> > 
> > # However, a 20 node 2 ppn job does
> > griznog at uinta ~ $ qsub -I -l nodes=20:ppn=2
> > qsub: waiting for job 10351.uinta.hpc.usu.edu to start
> > 
> > What am I doing wrong here that allows > ~36 CPU jobs unless I pack all
> > the processors on each node?
> > 
> > Queue configuration follows.
> > 
> > Thanks,
> > 
> > jbh
> > 
> > Qmgr: p q parallel
> > #
> > # Create queues and set their attributes.
> > #
> > #
> > # Create and define queue parallel
> > #
> > create queue parallel
> > set queue parallel queue_type = Execution
> > set queue parallel resources_max.nodect = 32
> > set queue parallel resources_max.nodes = 32:ppn=4
> > set queue parallel resources_max.walltime = 24:00:00
> > set queue parallel resources_min.nodect = 1
> > set queue parallel resources_min.nodes = 1:ppn=2
> > set queue parallel resources_default.nodes = 1:ppn=2
> > set queue parallel resources_default.walltime = 01:00:00
> > set queue parallel resources_available.nodect = 62
> > set queue parallel resources_available.nodes = 62:ppn=4
> > set queue parallel max_user_run = 8
> > set queue parallel enabled = True
> > set queue parallel started = True
> > Qmgr: p q dedicated
> > #
> > # Create queues and set their attributes.
> > #
> > #
> > # Create and define queue dedicated
> > #
> > create queue dedicated
> > set queue dedicated queue_type = Execution
> > set queue dedicated resources_max.nodect = 62
> > set queue dedicated resources_max.nodes = 62:ppn=4
> > set queue dedicated resources_max.walltime = 08:00:00
> > set queue dedicated resources_min.nodect = 33
> > set queue dedicated resources_min.nodes = 33:ppn=4
> > set queue dedicated resources_default.nodes = 33:ppn=4
> > set queue dedicated resources_default.walltime = 01:00:00
> > set queue dedicated resources_available.nodect = 62
> > set queue dedicated resources_available.nodes = 62
> > set queue dedicated enabled = True
> > set queue dedicated started = True
> 
> "nodes" is a string, not an integer, therefore it is only useful as a
> default.  min/max nodes doesn't have any meaning.

I'm not really clear on what nodes is used for. I have a fuzzy
recollection of following a discussion here about nodect and nodes and
coming away with the conclusion that nodes was the better way to
specify these things. I set both of them for min/max to cover all my
bases.
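
If the min/max nodes limits really carry no meaning, one way to act on
that would be to drop them and leave only the nodect limits (nodect
being an integer count) to separate the two queues. A minimal sketch,
untested here, assuming qmgr's unset command accepts these resource
attributes as written:

```
# Remove the string-valued nodes limits from both queues.
# The nodect min/max limits and the nodes defaults are left untouched.
unset queue parallel resources_max.nodes
unset queue parallel resources_min.nodes
unset queue dedicated resources_max.nodes
unset queue dedicated resources_min.nodes
```

This only removes the limits described as meaningless; whether it
changes the rejection of the nodes=10:ppn=4 job would still have to be
checked by resubmitting it.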

> I don't see a routing queue or your server's default queue here.  Your
> qsub examples above don't use -q, so I don't know which queue is being
> used.

Sorry, I left that out: there is a routing queue called batch that
routes to these two queues and to a third queue for serial jobs. batch
is the server's default queue.
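
For reference, a routing queue of that shape would typically be
defined along the following lines. This is a sketch, not the actual
configuration on uinta: the queue name serial is hypothetical, and the
order of the destinations is an assumption.

```
# Hypothetical sketch of the routing queue; not the real configuration.
create queue batch
set queue batch queue_type = Route
# Destinations are tried in the order listed; "serial" is a made-up name.
set queue batch route_destinations = serial
set queue batch route_destinations += parallel
set queue batch route_destinations += dedicated
set queue batch enabled = True
set queue batch started = True
# Jobs submitted without -q land in batch.
set server default_queue = batch
```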

Thanks,

jbh

