[torqueusers] Wrong number of allocated nodes

Regina Guilabert Canals regina.guilabert at uib.es
Thu Jul 26 04:16:49 MDT 2007


Dear TORQUE users,

Without any apparent reason, PBS stopped allocating the correct number of
nodes yesterday. Now, when we request, for instance, 4 nodes, the job
only gets 1 node assigned.

Let me illustrate it with an example:

megacelula:~> echo "sleep 10" | qsub -l nodes=4:ppn=2
2124.megacelula
megacelula:~> qstat -n1

megacelula:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2124.megacelula      dfsvhs9  batch    STDIN        4209     4  --     --  01:00 R   --    cell2/1+cell2/0


However, when the nodes are requested explicitly, all four are allocated:

megacelula:~> echo "sleep 10" | qsub -l nodes=cell2:ppn=2+cell3:ppn=2+cell4:ppn=2+cell5:ppn=2
2125.megacelula
megacelula:~> qstat -n1

megacelula:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2125.megacelula      dfsvhs9  batch    STDIN         --      4  --     --  01:00 R   --    cell2/1+cell2/0+cell3/1+cell3/0+cell4/1+cell4/0+cell5/1+cell5/0
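
To make the mismatch explicit: in both jobs the NDS column reports 4, but
the host list at the end of the line contains one distinct host in the
first case and four in the second. A minimal sketch for counting them,
assuming the host list has the host/cpu+host/cpu form shown in the qstat
output (count_nodes is a hypothetical helper name of ours, not a TORQUE
command):

```shell
# count_nodes: hypothetical helper, not part of TORQUE.
# Prints the number of distinct hosts in an exec host string such as
# "cell2/1+cell2/0" (the last column of `qstat -n1`).
count_nodes() {
    printf '%s\n' "$1" |
        tr '+' '\n' |      # one host/cpu entry per line
        cut -d/ -f1 |      # keep only the host name
        sort -u |          # drop duplicate hosts
        wc -l |
        tr -d ' '          # some wc implementations pad with spaces
}

count_nodes "cell2/1+cell2/0"                                                   # prints 1
count_nodes "cell2/1+cell2/0+cell3/1+cell3/0+cell4/1+cell4/0+cell5/1+cell5/0"   # prints 4
```
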


We checked all the log files but could not find any trace of what is
wrong in the system.
Any ideas on how to further diagnose and correct the problem?


The TORQUE PBS server is configured as:

megacelula:~> qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue debug
#
create queue debug
set queue debug queue_type = Execution
set queue debug Priority = 5
set queue debug max_running = 1
set queue debug resources_max.walltime = 00:10:00
set queue debug resources_available.nodect = 2
set queue debug enabled = True
set queue debug started = True
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_max.walltime = 48:00:00
set queue batch resources_default.walltime = 01:00:00
set queue batch resources_available.nodect = 28
set queue batch max_user_run = 5
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = torque@megacelula
set server operators = torque@megacelula
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.nodect = 28
set server scheduler_iteration = 100
set server node_check_rate = 150
set server tcp_timeout = 6
set server pbs_version = 2.1.8

Many thanks in advance,

Regina and Víctor.


Regina Guilabert Canals
Grup de Meteorologia

Edif. Mateu Orfila					Tel: +34 971 17 3213
Universitat de les Illes Balears		Fax: +34 971 17 3426
07122 Palma de Mallorca (Spain) 	email: regina.guilabert at uib.es


