[torqueusers] Job with high proc count will not schedule

Roman Baranowski roman at chem.ubc.ca
Tue Mar 2 18:48:52 MST 2010


 	Dear Jonathan,

You have only 5 nodes, so bumping up resources_available.nodect with
qmgr will never work. Have you tried
 	qsub -I -l procs=112
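
To spell out the difference, here is a rough sketch, not anything shipped with torque. The helper name pick_request is made up for illustration. The assumption behind it is that a plain nodes=N request is counted against the number of nodes the server knows about (that is what the qrun message reports: 32 requested, 5 available), while procs=N asks for a number of processors without saying how they are spread over the nodes.

```shell
#!/bin/sh
# Hypothetical helper, not a torque command: choose a "-l" request string.
#   $1 = number of processors wanted
#   $2 = number of physical nodes in the cluster
pick_request() {
    wanted=$1
    nodes_available=$2
    if [ "$wanted" -le "$nodes_available" ]; then
        # The count fits within the node count, so a nodes= request can be met
        echo "nodes=$wanted"
    else
        # More processors than nodes: ask for processors rather than nodes
        echo "procs=$wanted"
    fi
}

# Values from this thread: 32 processors wanted, 5 nodes in the cluster
echo "qsub -I -l $(pick_request 32 5)"
```

Whether procs= is honoured may also depend on which scheduler is running, so treat this as a starting point rather than a guarantee.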

 	All the best
 	Roman


On Tue, 2 Mar 2010, Jonathan K Shelley wrote:

> I have a 5-node cluster with 112 cores. I just installed torque 2.4.6. It seems to be working, but when
> I submit the following, the job never starts:
> 
> qsub -I -l nodes=32
> qsub: waiting for job 551.eos.inel.gov to start
> 
> I try a qrun and I get the following:
> 
> eos:/opt/torque/sbin # qrun 551
> qrun: Resource temporarily unavailable MSG=job allocation request exceeds currently available cluster
> nodes, 32 requested, 5 available 551.eos.inel.gov
> 
> The job never schedules. I saw in the documentation that I needed to set resources_available.nodect
> to a high number, so I did.
> 
> When I run printserverdb, I get:
> 
> eos:/opt/torque/sbin # printserverdb
> ---------------------------------------------------
> numjobs:                0
> numque:         1
> jobidnumber:            552
> sametm:         1267574146
> --attributes--
> total_jobs = 1
> state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0 Exiting:0
> default_queue = all
> log_events = 511
> mail_from = adm
> query_other_jobs = True
> resources_available.nodect = 2048
> scheduler_iteration = 600
> node_check_rate = 150
> tcp_timeout = 6
> pbs_version = 2.4.6
> next_job_number = 551
> net_counter = 3 0 0
> 
> eos:/opt/torque/sbin # qmgr -c "p s"
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue all
> #
> create queue all
> set queue all queue_type = Execution
> set queue all resources_max.walltime = 672:00:00
> set queue all resources_available.nodect = 2048
> set queue all enabled = True
> set queue all started = True
> #
> # Set server attributes.
> #
> set server acl_hosts = eos
> set server managers = awm at eos.inel.gov
> set server managers += lucads2 at eos.inel.gov
> set server managers += poolrl at eos.inel.gov
> set server managers += sheljk at eos.inel.gov
> set server default_queue = all
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server resources_available.nodect = 2048
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server next_job_number = 552
> 
> Any ideas what I need to do to get this working?
> 
> Thanks,
> 
> Jon Shelley
> HPC Software Consultant
> Idaho National Lab
> Phone (208) 526-9834
> Fax (208) 526-0122
> 
>

