[torqueusers] blocked jobs

Burkhard Bunk bunk at physik.hu-berlin.de
Fri Feb 22 11:18:49 MST 2013


Hi,

this is not an issue with TORQUE - it looks like a deferral problem with
Maui, right?
I can't see why Maui decides to defer (and block) array jobs at all,
just because of the slot limit.

Anyway, Maui's default settings w.r.t. deferring/blocking are pretty
conservative: 24 retries at 1-hour intervals before blocking, i.e.
       DEFERSTARTCOUNT   1
       DEFERTIME         1:00:00
       DEFERCOUNT        24
I had more luck with more generous settings like
       DEFERSTARTCOUNT   3
       DEFERTIME         0:05:00
       DEFERCOUNT        1440
i.e. retry every 5 min, up to 1440 times, for a total of 5 days.
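A rough back-of-the-envelope sketch in Python of the arithmetic above: the window before Maui blocks a job is about DEFERTIME x DEFERCOUNT, and for a slot-limited array like the -t 0-15000%50 in the quoted question (15001 elements, at most 50 running) the last element waits roughly (waves - 1) x runtime. This assumes every retry fails, ignores DEFERSTARTCOUNT, and assumes elements run in full waves; the 10-minute per-element runtime is purely hypothetical.

```python
import math
from datetime import timedelta


def defer_window(defer_time: timedelta, defer_count: int) -> timedelta:
    """Approximate time a job can keep being deferred before Maui blocks it,
    assuming every retry fails: DEFERCOUNT deferrals of DEFERTIME each."""
    return defer_time * defer_count


def last_element_wait(n_elements: int, slot_limit: int, runtime: timedelta) -> timedelta:
    """Rough wait before the last array element can start, assuming elements
    run in full waves of `slot_limit` and each one takes `runtime`."""
    waves = math.ceil(n_elements / slot_limit)
    return runtime * (waves - 1)


default = defer_window(timedelta(hours=1), 24)        # Maui defaults
generous = defer_window(timedelta(minutes=5), 1440)   # the settings suggested above
# Hypothetical 10-minute runtime per element, -t 0-15000%50 -> 15001 elements
wait = last_element_wait(15001, 50, timedelta(minutes=10))

print(default)                           # 1 day, 0:00:00
print(generous)                          # 5 days, 0:00:00
print(wait)                              # 2 days, 2:00:00
print(wait > default, wait > generous)   # True False
```

Under these assumptions the tail of the array outlives the default one-day window but fits inside the 5-day one; longer element runtimes would shift that balance.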

Hope this helps to work around your problem.

Regards,
Burkhard Bunk.
----------------------------------------------------------------------
  bunk at physik.hu-berlin.de      Physics Institute, Humboldt University
  fax:    ++49-30 2093 7628     Newtonstr. 15
  phone:  ++49-30 2093 7980     12489 Berlin, Germany
----------------------------------------------------------------------

On Fri, 22 Feb 2013, Andrus, Brian Contractor wrote:

> All,
>
> We have a user that submits very sizeable array jobs, but also limits the flow
> on them (-t 0-15000%50).
>
> This is fine for a while, but eventually most of his jobs end up blocked. They
> cannot run because the slot limit is reached, so they are deferred. But then
> they are deferred beyond the max defer time and get blocked.
>
> Is there a way to reset the state such that the array elements are back in the
> queue and neither deferred nor blocked? So far I have been iterating through
> them and using qrun to get them going when the resources are available.
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
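A minimal dry-run sketch, in Python, of the qrun loop described in the question. It assumes TORQUE array element IDs of the form BASE[INDEX].SERVER; the job ID 1234, the server name headnode and the index list are hypothetical, and working out which elements are actually blocked (e.g. from Maui's showq output) is not covered. The commands are only built and printed, not executed, so they can be reviewed first.

```python
import shlex


def array_element_ids(base_id: str, server: str, indices) -> list[str]:
    """Build array element IDs in the assumed BASE[INDEX].SERVER form."""
    return [f"{base_id}[{i}].{server}" for i in indices]


def qrun_commands(element_ids) -> list[str]:
    """Return (not execute) one shell-quoted qrun command per element ID."""
    return [shlex.join(["qrun", eid]) for eid in element_ids]


# Hypothetical array job 1234 on server "headnode", elements 7 and 8 blocked
for cmd in qrun_commands(array_element_ids("1234", "headnode", [7, 8])):
    print(cmd)
# qrun '1234[7].headnode'
# qrun '1234[8].headnode'
```

Square brackets are shell glob characters, which is why shlex.join() wraps each ID in single quotes before the commands are pasted into a shell.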

