[torqueusers] Job with high proc count will not schedule
Roman Baranowski
roman at chem.ubc.ca
Tue Mar 2 18:48:52 MST 2010
Dear Jonathan,
You have only 5 nodes, so bumping up resources_available.nodect with
qmgr will never work. Have you tried the following?
qsub -I -l procs=112
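A minimal Python sketch of why the two requests behave differently. It assumes, as the qrun message quoted below indicates, that nodes=32 is checked against the 5 physical nodes, while procs=112 only has to fit within the 112 cores. The function names are hypothetical and this is not Torque's actual code. Whether procs= is honored also depends on which scheduler is running.

```python
# Hypothetical model of the two checks; NOT Torque source code.

def fits_node_request(requested_nodes: int, physical_nodes: int) -> bool:
    """nodes=N is compared with the number of physical nodes (per the qrun error)."""
    return requested_nodes <= physical_nodes


def fits_procs_request(requested_procs: int, total_cores: int) -> bool:
    """procs=N only needs N cores in total, spread over any number of nodes."""
    return requested_procs <= total_cores


PHYSICAL_NODES = 5  # from the original post
TOTAL_CORES = 112   # from the original post

print("nodes=32 fits:", fits_node_request(32, PHYSICAL_NODES))
print("procs=112 fits:", fits_procs_request(112, TOTAL_CORES))
```

Raising resources_available.nodect does not change PHYSICAL_NODES in this model, which is the point of the reply: the request has to be expressed in processors, not in more nodes than exist.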
All the best
Roman
On Tue, 2 Mar 2010, Jonathan K Shelley wrote:
> I have a 5-node cluster with 112 cores. I just installed Torque 2.4.6. It seems to be working, but when
> I submit the following, the job just waits:
>
> qsub -I -l nodes=32
> qsub: waiting for job 551.eos.inel.gov to start
>
> When I try qrun, I get the following:
>
> eos:/opt/torque/sbin # qrun 551
> qrun: Resource temporarily unavailable MSG=job allocation request exceeds currently available cluster
> nodes, 32 requested, 5 available 551.eos.inel.gov
>
> The job never schedules. I saw in the documentation that I needed to set resources_available.nodect
> to a high number, so I did.
>
> When I run printserverdb, I get:
>
> eos:/opt/torque/sbin # printserverdb
> ---------------------------------------------------
> numjobs: 0
> numque: 1
> jobidnumber: 552
> sametm: 1267574146
> --attributes--
> total_jobs = 1
> state_count = Transit:0 Queued:1 Held:0 Waiting:0 Running:0 Exiting:0
> default_queue = all
> log_events = 511
> mail_from = adm
> query_other_jobs = True
> resources_available.nodect = 2048
> scheduler_iteration = 600
> node_check_rate = 150
> tcp_timeout = 6
> pbs_version = 2.4.6
> next_job_number = 551
> net_counter = 3 0 0
>
> eos:/opt/torque/sbin # qmgr -c "p s"
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue all
> #
> create queue all
> set queue all queue_type = Execution
> set queue all resources_max.walltime = 672:00:00
> set queue all resources_available.nodect = 2048
> set queue all enabled = True
> set queue all started = True
> #
> # Set server attributes.
> #
> set server acl_hosts = eos
> set server managers = awm at eos.inel.gov
> set server managers += lucads2 at eos.inel.gov
> set server managers += poolrl at eos.inel.gov
> set server managers += sheljk at eos.inel.gov
> set server default_queue = all
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server resources_available.nodect = 2048
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server next_job_number = 552
>
> Any ideas what I need to do to get this working?
>
> Thanks,
>
> Jon Shelley
> HPC Software Consultant
> Idaho National Lab
> Phone (208) 526-9834
> Fax (208) 526-0122