[torqueusers] blocked jobs
Burkhard Bunk
bunk at physik.hu-berlin.de
Fri Feb 22 11:18:49 MST 2013
Hi,
This is not an issue with TORQUE; it looks like a deferral problem with
Maui, right?
I can't see why Maui decides to defer (and block) array jobs at all,
just because of the slot limit.
Anyway, Maui's default settings w.r.t. deferring/blocking are pretty
conservative: 24 retries at 1 h intervals before blocking, i.e.

    DEFERSTARTCOUNT 1
    DEFERTIME       1:00:00
    DEFERCOUNT      24

I had more luck with more generous settings like

    DEFERSTARTCOUNT 3
    DEFERTIME       0:05:00
    DEFERCOUNT      1440

i.e. retry every 5min, for a total of 5 days.
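
To double-check the arithmetic, here is a minimal Python sketch. It assumes, as described above, that a job is blocked once it has been deferred DEFERCOUNT times for DEFERTIME each, so the total window is simply the product of the two. The helper names parse_hms and defer_window_hours are made up for this illustration and are not part of Maui.

```python
def parse_hms(text):
    """Convert a colon-separated time such as '0:05:00' to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def defer_window_hours(defer_time, defer_count):
    """Total time in hours before blocking, assuming DEFERTIME * DEFERCOUNT."""
    return parse_hms(defer_time) * defer_count / 3600


# Default settings quoted above: one day in total.
print(defer_window_hours("1:00:00", 24))    # 24.0
# The more generous settings: 120 hours, i.e. 5 days.
print(defer_window_hours("0:05:00", 1440))  # 120.0
```

The same two numbers can be used to pick a DEFERCOUNT for any desired window: divide the window by DEFERTIME.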
Hope this helps to work around your problem.
Regards,
Burkhard Bunk.
----------------------------------------------------------------------
bunk at physik.hu-berlin.de Physics Institute, Humboldt University
fax: ++49-30 2093 7628 Newtonstr. 15
phone: ++49-30 2093 7980 12489 Berlin, Germany
----------------------------------------------------------------------
On Fri, 22 Feb 2013, Andrus, Brian Contractor wrote:
>
> All,
>
> We have a user that submits very sizeable array jobs, but also limits the flow
> on them (-t 0-15000%50).
>
> This is fine for a while, but eventually most of his jobs end up blocked. They
> cannot run because the slot limit is reached, so they are deferred. But then
> they are deferred beyond the max defer time and get blocked.
>
> Is there a way to reset the state such that the array elements are back in the
> queue and neither deferred nor blocked? So far I have been iterating through
> them and using qrun to get them going when the resources are available.
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238