[torqueusers] stage in failures

Garrick Staples garrick at usc.edu
Sun Feb 19 23:45:15 MST 2006


On Mon, Feb 13, 2006 at 03:16:23PM -0500, nathaniel.x.woody at gsk.com alleged:
> Hi,
> 
> When a file stage-in fails for some reason (such as an ssh failure or
> timeout), Torque puts a wait on that job by resetting its Execution Time
> to 30 minutes in the future.  Does anybody know if this is configurable
> in any way, i.e., the number of retries and the amount of time waited
> before each retry?

It's not configurable at runtime.  You can change the 1800-second wait by
editing PBS_STAGEFAIL_WAIT in src/include/server_limits.h and rebuilding.
There is no retry limit; it retries forever.
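
To illustrate what that constant controls, here is a rough sketch.  The
macro name and the 1800 value come from the note above; the exact form of
the definition in the header is an assumption, and next_stagein_attempt()
is a hypothetical helper for illustration, not a function in the Torque
source.

```c
#include <assert.h>
#include <time.h>

/* Assumed to mirror the definition in src/include/server_limits.h.
 * 1800 seconds is the 30-minute wait described above. */
#define PBS_STAGEFAIL_WAIT 1800

/* Hypothetical helper (not part of Torque): given the time at which a
 * stage-in failed, return the earliest time the job is eligible to be
 * tried again. */
static time_t next_stagein_attempt(time_t failed_at)
{
    /* Guard against an edit that sets the wait to zero or negative. */
    assert(PBS_STAGEFAIL_WAIT > 0);
    return failed_at + PBS_STAGEFAIL_WAIT;
}
```

With the value set to 600, for example, a job whose stage-in fails would
become eligible again 10 minutes later instead of 30.  Since there is no
retry limit, a shorter wait just means more frequent attempts.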

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California