[Mauiusers] priority job failing to get reservation

Roy Dragseth roy.dragseth at cc.uit.no
Wed May 16 14:38:46 MDT 2012


We have used a reservation depth of 3 for years without any noticeable 
problems.  Could this be related to a problem with preemption?  We do not use 
preemption in our setup.  As a test, it could be worth turning preemption off 
to see whether the reservations start working.
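
A minimal sketch of that test in maui.cfg, assuming preemption is enabled
through a QFLAGS setting on the job's QOS (the checkjob output quoted below
shows qos:dedicated with the PREEMPTEE flag).  The QOSCFG line is a guess at
the poster's configuration, not taken from it:

```
# maui.cfg -- illustrative fragment; the QOSCFG line is an assumption
RESERVATIONDEPTH[0]  3

# Comment out the preemption flag for the duration of the test:
# QOSCFG[dedicated]  QFLAGS=PREEMPTEE
```

After restarting Maui, "checkjob 213152" and "showres" should show whether
the top-priority job now receives a reservation.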

r.

On Friday 20. April 2012 14.27.57 Naveed Near-Ansari wrote:
> I know this isn't technically torque, but I haven't seen any activity on
> the maui list and I thought there might be some overlap in users here.
> 
> I am having an issue with a priority job not getting a reservation. When
> I set reservation depth to 2, the second priority job does get a
> reservation though.
> 
> The cluster has 3552 cores available for the queue it is submitted to; at
> the moment they are all in use.  Since the job has the highest
> priority, it should start reserving nodes (and it does try).  When I
> change the RESERVATIONDEPTH to 2, the second highest priority job does
> get a reservation, though this is a much smaller job.  Perhaps I am
> misunderstanding how these reservations work.  Is there a timeframe in
> which it has to reserve nodes?
> 
> We don't have a size limit on jobs and the cluster does have the
> resources for this job.
> 
> Does anyone know what may be going on here?  We have this type of
> workflow where some people send in very large jobs, and some small, so I
> would like to figure out what is happening. Do you have any good
> strategies to deal with this type of workflow?
> 
> Here is the checkjob output and as you can see, it isn't requesting any
> resources other than cores.  I have no idea where it is getting the
> idle procs from since none are actually idle. Perhaps it has to do with
> reservable nodes?  The idle procs count tends to fluctuate over time.
> 
> checking job 213152
> 
> State: Idle
> Creds:  user:user  group:group  class:default  qos:dedicated
> WallTime: 00:00:00 of 1:12:00:00
> SubmitTime: Fri Apr  6 03:35:23
>   (Time Queued  Total: 7:45:59  Eligible: 1:30:06)
> 
> Total Tasks: 1501
> 
> Req[0]  TaskCount: 1501  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [default]
> 
> 
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> Flags:       RESTARTABLE PREEMPTEE DEDICATEDNODE
> Attr:        PREEMPTEE
> 
> PE:  1501.00  StartPriority:  144235
> job cannot run in partition DEFAULT (insufficient idle procs available:
> 1056 < 1501)
> 
> 
> Here are the relevant log entries:
> 
> 04/06 03:35:24 MJobPReserve(213152,DEFAULT,ResCount,ResCountRej)
> 04/06 03:35:24 INFO:     3552 feasible tasks found for job 213152:0 in
> partition DEFAULT (1501 Needed)
> 04/06 03:35:24 ALERT:    job 213152 cannot run in any partition
> 04/06 03:35:24 ALERT:    cannot create new reservation for job 213152
> (shape[1] 1501)
> 04/06 03:35:24 ALERT:    cannot create new reservation for job 213152
> 04/06 03:35:24 ALERT:    job '213152' cannot run (deferring job for 3600
> seconds)
> 04/06 03:35:24 WARNING:  cannot reserve priority job '213152'
-- 

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
	      phone:+47 77 64 41 07, fax:+47 77 64 41 00
        Roy Dragseth, Team Leader, High Performance Computing
	 Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no


