[torqueusers] Job with high proc count will not schedule
Jonathan K Shelley
Jonathan.Shelley at inl.gov
Tue Mar 2 17:32:30 MST 2010
I have a 5-node cluster with 112 cores. I just installed TORQUE 2.4.6. It
seems to be working, but when I submit the following interactive job, it
just waits:
qsub -I -l nodes=32
qsub: waiting for job 551.eos.inel.gov to start
I try a qrun and I get the following:
eos:/opt/torque/sbin # qrun 551
qrun: Resource temporarily unavailable MSG=job allocation request exceeds
currently available cluster nodes, 32 requested, 5 available
551.eos.inel.gov
The job never schedules. I saw in the documentation that I needed to set
resources_available.nodect to a high number, so I did.
When I run printserverdb I get:
eos:/opt/torque/sbin # printserverdb
---------------------------------------------------
numjobs: 0
numque: 1
jobidnumber: 552
sametm: 1267574146
--attributes--
total_jobs = 1
state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0 Exiting:0
default_queue = all
log_events = 511
mail_from = adm
query_other_jobs = True
resources_available.nodect = 2048
scheduler_iteration = 600
node_check_rate = 150
tcp_timeout = 6
pbs_version = 2.4.6
next_job_number = 551
net_counter = 3 0 0
eos:/opt/torque/sbin # qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue all
#
create queue all
set queue all queue_type = Execution
set queue all resources_max.walltime = 672:00:00
set queue all resources_available.nodect = 2048
set queue all enabled = True
set queue all started = True
#
# Set server attributes.
#
set server acl_hosts = eos
set server managers = awm at eos.inel.gov
set server managers += lucads2 at eos.inel.gov
set server managers += poolrl at eos.inel.gov
set server managers += sheljk at eos.inel.gov
set server default_queue = all
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.nodect = 2048
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 552
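For reference, the two nodect lines in the output above correspond to qmgr
directives like the ones below. This is only a sketch: the file name
nodect.qmgr is made up for illustration, and applying the directives assumes
you are a TORQUE manager on the server host.

```shell
# Write the two nodect directives (values copied from the "p s" output above)
# to a scratch file. The file name is hypothetical.
cat > nodect.qmgr <<'EOF'
set server resources_available.nodect = 2048
set queue all resources_available.nodect = 2048
EOF

# Feed the directives to qmgr only where it is installed.
# This needs manager privileges on the pbs_server host.
if command -v qmgr >/dev/null 2>&1; then
    qmgr < nodect.qmgr
fi
```

Running qmgr -c "p s" afterwards should list both settings, as in the
output above.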
Any ideas what I need to do to get this working?
Thanks,
Jon Shelley
HPC Software Consultant
Idaho National Lab
Phone (208) 526-9834
Fax (208) 526-0122