[torqueusers] Re-executing a queued job

David Beer dbeer at adaptivecomputing.com
Thu Dec 26 07:27:06 MST 2013


If you are using Moab or Maui then they will 'defer' jobs that aren't able
to run after a few retries. You probably need to do something like

releasehold <jobid>

to let the scheduler know it's okay to retry job execution. There is
also a parameter, DEFERTIME, that controls how long jobs stay deferred
before they are retried. It defaults to 1 hour.
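
If a script generated many jobs, releasing them one at a time gets tedious.
Below is a minimal sketch of a wrapper that calls releasehold once per job
ID. It assumes the Maui/Moab releasehold command is on your PATH; the
function name release_jobs and the DRY_RUN variable are made up for this
illustration and are not part of Maui, Moab, or TORQUE.

```shell
#!/bin/sh
# Sketch only: release the scheduler hold on every job ID passed as an argument.
# Assumes the Maui/Moab `releasehold` command is installed and on PATH.
# release_jobs and DRY_RUN are hypothetical names used for illustration.
release_jobs() {
    for jobid in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command instead of executing it.
            echo "releasehold $jobid"
        else
            releasehold "$jobid"
        fi
    done
}
```

For example, after sourcing the function, `DRY_RUN=1 release_jobs 1234 1235`
(placeholder job IDs) only prints the commands that would be run. Unset
DRY_RUN to actually release the jobs.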


On Thu, Dec 26, 2013 at 7:18 AM, Mahmood Naderan <nt_mahmood at yahoo.com> wrote:

> Hi,
> I have submitted some jobs. However, ever since I submitted them they have
> been (and still are) in the Q state, with this reason:
>
> Messages:  cannot start job - RM failure, rc: 15046, msg: 'Resource
> temporarily unavailable MSG=job allocation request exceeds currently
> available cluster nodes, 1 requested, 0 available'
>
> How can I re-execute the jobs? Maybe the resource was not available at that
> time. I cannot delete the jobs and resubmit them because a script
> generated them.
>
> Any way to *retry* the queued job?
>
> Regards,
> Mahmood
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>


-- 
David Beer | Senior Software Engineer
Adaptive Computing