[torqueusers] Problem with ppn and routing

Ken Nielson knielson at adaptivecomputing.com
Mon Nov 29 19:15:19 MST 2010



----- Original Message -----
From: "J.A. Magallón" <jamagallon at ono.com>
To: torqueusers at supercluster.org
Sent: Monday, November 29, 2010 6:16:56 PM
Subject: [torqueusers] Problem with ppn and routing

Hi all...

(I'm new to the list, so hello to everyone...)

I have a test system with a front-end and 2 nodes, just to play with and learn
torque/mpi. My setup is very standard:

#
# Create and define queue fast
#
create queue fast
set queue fast queue_type = execution
set queue fast priority = 80
set queue fast max_running = 10
set queue fast resources_min.walltime = 00:00:00
set queue fast resources_max.walltime = 01:00:00
set queue fast resources_max.nodes = 2:ppn=2
set queue fast resources_default.walltime = 01:00:00
set queue fast enabled = true
set queue fast started = true
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = execution
set queue batch priority = 20
set queue batch max_running = 10
set queue batch resources_min.walltime = 01:00:00
set queue batch resources_max.walltime = 48:00:00
set queue batch resources_max.nodes = 1:ppn=1
set queue batch resources_default.walltime = 48:00:00
set queue batch enabled = true
set queue batch started = true
#
# Create and define queue default
#
create queue default
set queue default queue_type = route
set queue default route_destinations = batch
set queue default route_destinations += fast
set queue default enabled = true
set queue default started = true

My idea is the typical 'long jobs can only use one processor'.

With torque 2.4.18, a submission like qsub -l nodes=2:ppn=2 sends the job
to the fast queue, and I know it is limited to 1 hr walltime.
If I need more time:

annwn:~/dev/mpi/tst> qsub -l nodes=2:ppn=2,walltime=10:00:00 k
qsub: Job rejected by all possible destinations

torque tells me I cannot use both boxes.
Problem 1: to fit into queue 'batch' I only have to lower nodes; I can
still leave ppn=2. Is it supposed to work that way? I thought it
would force me to lower ppn as well...
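
The routing behaviour described above for 2.4.18 can be reproduced with a toy
model. This is only a sketch, not TORQUE's actual code, and it rests on three
assumptions: destinations are tried in the order they were added to
route_destinations, a limit is only checked when the job explicitly requests
that resource, and for resources_max.nodes only the leading node count is
compared (ppn is ignored). The function names are made up for illustration.

```python
# Toy model of routing-queue admission. NOT TORQUE's implementation;
# it only mirrors the behaviour reported for 2.4.18 in this message.

def to_seconds(hhmmss):
    """Convert an HH:MM:SS walltime string to seconds."""
    h, m, s = (int(x) for x in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

def node_count(spec):
    """Assumption: only the leading count of 'N:ppn=M' is compared."""
    return int(spec.split(":")[0])

# Limits copied from the queue definitions in this message.
QUEUES = {
    "batch": {"min_wall": "01:00:00", "max_wall": "48:00:00", "max_nodes": "1:ppn=1"},
    "fast": {"min_wall": "00:00:00", "max_wall": "01:00:00", "max_nodes": "2:ppn=2"},
}
ROUTE = ["batch", "fast"]  # order of route_destinations

def route(nodes, walltime=None):
    """Return the first destination that accepts the request, or None."""
    for name in ROUTE:
        q = QUEUES[name]
        if node_count(nodes) > node_count(q["max_nodes"]):
            continue
        # Assumption: walltime limits apply only if the job requests walltime.
        if walltime is not None:
            w = to_seconds(walltime)
            if not to_seconds(q["min_wall"]) <= w <= to_seconds(q["max_wall"]):
                continue
        return name
    return None

print(route("2:ppn=2"))              # fast
print(route("2:ppn=2", "10:00:00"))  # None (rejected by all destinations)
print(route("1:ppn=2", "10:00:00"))  # batch, even though ppn=2 > 1
```

Under these assumptions the model explains Problem 1: ppn never takes part in
the comparison, so lowering the node count alone is enough to enter 'batch'.
It does not explain the 2.5.3 behaviour.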

If I upgrade to 2.5.3...
Problem 2: not even the simple job fits in any queue:

annwn:~/dev/mpi/tst> qsub -l nodes=2:ppn=2 k
qsub: Job rejected by all possible destinations

I expected it to behave like 2.4: put the job in the 'fast' queue and
limit it to 1 hr walltime. The same happens even if I ask for a short walltime:

annwn:~/dev/mpi/tst> qsub -l nodes=2:ppn=2,walltime=00:10:00 k
qsub: Job rejected by all possible destinations

Any ideas? What am I doing wrong?

TIA

-- 
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
_______________________________________________

What are the contents of your nodes file? What did you set np to for each node?
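
One plausible reason for asking: if a node has fewer virtual processors (np)
than the requested ppn, a nodes=2:ppn=2 request cannot be placed on it. The
sketch below checks that for a nodes file given as text. The host names node01
and node02 are hypothetical, the helper names are made up, and the default of
np=1 for entries without an np= field is an assumption. Whether the server
actually rejects the job at submission time for this reason is not confirmed
here.

```python
# Sketch: can a nodes=N:ppn=M request be satisfied by a given nodes file?
# Host names and helper names are hypothetical.

def parse_nodes_file(text):
    """Map each host name to its np value (assumed to default to 1)."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        np = 1  # assumption: np defaults to 1 when not given
        for field in fields[1:]:
            if field.startswith("np="):
                np = int(field[3:])
        slots[fields[0]] = np
    return slots

def can_satisfy(slots, nodes, ppn):
    """True if at least `nodes` hosts each offer `ppn` or more processors."""
    return sum(1 for np in slots.values() if np >= ppn) >= nodes

EXAMPLE = """\
node01 np=2
node02
"""

slots = parse_nodes_file(EXAMPLE)
print(slots)                     # {'node01': 2, 'node02': 1}
print(can_satisfy(slots, 2, 2))  # False: node02 only offers 1 processor
```

With np=2 on both entries the same check returns True.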

Ken Nielson
Adaptive Computing

