[Mauiusers] Multiple job request peculiarities

Angel de Vicente angelv at iac.es
Mon Apr 11 04:52:52 MDT 2011


Hi,

last month there was a thread about a problem with jobs requesting
multiple nodes:
http://www.supercluster.org/pipermail/mauiusers/2011-March/004608.html

I have just found that our setup also has a problem with this:

Maui version: 3.2.6p21
Torque version: 2.4.11

Our cluster has 16 nodes, each with 2 CPUs, and 3 nodes, each with 16 CPUs.

Ideally I would like to submit a job that uses the whole cluster, and if
no other jobs are running, the following works fine:

[angelv at diodo ~]$ qsub -lnodes=16:ppn=2:cpus2+3:ppn=16:cpus16 runparallel.sh

But if the cluster is busy (as it is right now), some multi-node jobs
go into the Deferred state instead of the Idle state, and never get
executed. For example:


A request for a 3-CPU job is accepted and goes into the Idle queue
(although showq wrongly reports that it requires only 1 CPU):

[angelv at diodo ~]$ qsub -lnodes=1:ppn=1:cpus2+1:ppn=2:cpus16 runparallel.sh
88313.diodo.ll.iac.es
[angelv at diodo ~]$ showq -i | grep angelv
              88313        4000      1.0  -    angelv   angelv      1    1:00:00   default  Mon Apr 11 11:49:30
[angelv at diodo ~]$ showq | grep angelv
88313                angelv       Idle     1     1:00:00  Mon Apr 11 11:49:30


If I ask for 17 CPUs, the job goes into the Deferred state:

[angelv at diodo ~]$ qsub -lnodes=1:ppn=1:cpus2+1:ppn=16:cpus16 runparallel.sh
88314.diodo.ll.iac.es
[angelv at diodo ~]$ showq | grep angelv
88313                angelv       Idle     1     1:00:00  Mon Apr 11 11:49:30
88314                angelv   Deferred     1     1:00:00  Mon Apr 11 11:50:07


But there are plenty of cpus16 resources (though all busy right now),
and I can submit a job requesting 48 CPUs without any issues, as long
as it is not a multi-node request like the ones above:

[angelv at diodo ~]$ qsub -lnodes=3:ppn=16:cpus16 runparallel.sh
88315.diodo.ll.iac.es
[angelv at diodo ~]$ showq | grep angelv
88315                angelv       Idle    48     1:00:00  Mon Apr 11 11:50:24
88313                angelv       Idle     1     1:00:00  Mon Apr 11 11:49:30
88314                angelv   Deferred     1     1:00:00  Mon Apr 11 11:50:07
[angelv at diodo ~]$
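For reference, the CPU counts quoted above come from summing node count x ppn over the '+'-separated parts of each -l nodes= request (so the whole-cluster request is 16x2 + 3x16 = 80 CPUs). The bash sketch below does that arithmetic. count_procs is a hypothetical helper written for illustration, not a Torque or Maui command, and it only understands the N:ppn=P:property form used in these examples.

```shell
#!/usr/bin/env bash
# count_procs (hypothetical helper, not part of Torque or Maui):
# prints the processor total of a Torque "-l nodes=" spec, computed as the
# sum of node_count * ppn over its '+'-separated parts.
# Assumes each part looks like N[:ppn=P][:property...]; a missing N or ppn
# counts as 1, and host names or other keywords are not interpreted.
count_procs() {
  local spec=$1 total=0 part field count ppn
  local -a parts fields
  IFS='+' read -ra parts <<< "$spec"
  for part in "${parts[@]}"; do
    IFS=':' read -ra fields <<< "$part"
    count=1
    ppn=1
    # A leading all-digit field is the node count; otherwise it is a property.
    if [[ ${fields[0]} =~ ^[0-9]+$ ]]; then
      count=${fields[0]}
    fi
    # Look for ppn=P among the remaining fields.
    for field in "${fields[@]:1}"; do
      if [[ $field =~ ^ppn=([0-9]+)$ ]]; then
        ppn=${BASH_REMATCH[1]}
      fi
    done
    # 10# forces base 10 so values such as "08" are not read as octal.
    total=$(( total + 10#$count * 10#$ppn ))
  done
  echo "$total"
}
```

Applied to the four requests above, this gives 80, 3, 17 and 48 CPUs respectively, whereas showq reports 1 processor for both 88313 and 88314.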


Any ideas?

Thanks,
Ángel de Vicente
-- 
http://www.iac.es/galeria/angelv/

High Performance Computing Support PostDoc
Instituto de Astrofísica de Canarias