[torqueusers] Wrong number of allocated nodes
Regina Guilabert Canals
regina.guilabert at uib.es
Thu Jul 26 04:16:49 MDT 2007
Dear TORQUE users,
For no apparent reason, PBS stopped allocating the correct number of
nodes yesterday. Now, when we request, for instance, 4 nodes, the job
is assigned only 1 node.
Let me illustrate it with an example:
megacelula:~> echo "sleep 10" | qsub -l nodes=4:ppn=2
2124.megacelula
megacelula:~> qstat -n1
megacelula:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2124.megacelula      dfsvhs9  batch    STDIN        4209     4  --     -- 01:00 R    --   cell2/1+cell2/0
However, when the nodes are requested explicitly, all four are allocated:
megacelula:~> echo "sleep 10" | qsub -l nodes=cell2:ppn=2+cell3:ppn=2+cell4:ppn=2+cell5:ppn=2
2125.megacelula
megacelula:~> qstat -n1
megacelula:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2125.megacelula      dfsvhs9  batch    STDIN          --     4  --     -- 01:00 R    --   cell2/1+cell2/0+cell3/1+cell3/0+cell4/1+cell4/0+cell5/1+cell5/0
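The mismatch between the two jobs can also be checked mechanically. The following is only a minimal sketch, assuming the allocated-host string has the host/slot+host/slot form shown in the qstat output above; count_nodes is a hypothetical helper name, not a TORQUE command.

```shell
#!/bin/sh
# count_nodes (hypothetical helper): print the number of distinct hosts
# in an allocation string of the form host/slot+host/slot+...
count_nodes() {
    # Split on '+', keep the host part before '/', and count unique hosts.
    printf '%s\n' "$1" | tr '+' '\n' | cut -d/ -f1 | sort -u | wc -l | tr -d ' '
}

# Job 2124 (nodes=4:ppn=2): both slots are on cell2, so this prints 1.
count_nodes "cell2/1+cell2/0"

# Job 2125 (nodes named explicitly): this prints 4.
count_nodes "cell2/1+cell2/0+cell3/1+cell3/0+cell4/1+cell4/0+cell5/1+cell5/0"
```

Comparing this count with the NDS column (4 in both cases) shows that only the first job received fewer nodes than it requested.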
We checked all the log files but could not find any trace of what is
wrong with the system.
Any ideas on how to further diagnose and correct the problem?
The TORQUE PBS server is configured as:
megacelula:~> qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue debug
#
create queue debug
set queue debug queue_type = Execution
set queue debug Priority = 5
set queue debug max_running = 1
set queue debug resources_max.walltime = 00:10:00
set queue debug resources_available.nodect = 2
set queue debug enabled = True
set queue debug started = True
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_max.walltime = 48:00:00
set queue batch resources_default.walltime = 01:00:00
set queue batch resources_available.nodect = 28
set queue batch max_user_run = 5
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = torque@megacelula
set server operators = torque@megacelula
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.nodect = 28
set server scheduler_iteration = 100
set server node_check_rate = 150
set server tcp_timeout = 6
set server pbs_version = 2.1.8
Many thanks in advance,
Regina and Víctor.
Regina Guilabert Canals
Grup de Meteorologia
Edif. Mateu Orfila Tel: +34 971 17 3213
Universitat de les Illes Balears Fax: +34 971 17 3426
07122 Palma de Mallorca (Spain) email: regina.guilabert at uib.es