[torqueusers] stage in failures
garrick at usc.edu
Sun Feb 19 23:45:15 MST 2006
On Mon, Feb 13, 2006 at 03:16:23PM -0500, nathaniel.x.woody at gsk.com alleged:
> When a file stage-in fails for some reason (like an ssh failure/timeout),
> Torque puts a Wait on that job by resetting the Execution Time for that
> job 30 minutes in the future. Does anybody know if this is configurable
> in any way? I.e., the number of retries and the amount of time waited
> before it is retried?
It's not configurable at runtime, but you can change the 1800-second wait
by editing PBS_STAGEFAIL_WAIT in src/include/server_limits.h and
rebuilding. There is no retry limit; it retries forever.
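As a rough illustration of what that constant amounts to, here is a minimal sketch. Only the name PBS_STAGEFAIL_WAIT and its 1800-second default come from this message; the helper next_stagein_attempt is hypothetical and is not taken from the Torque source, so treat it as a model of the behavior rather than the actual server code.

```c
#include <time.h>

/* Compile-time constant named in this message; 1800 seconds = 30 minutes.
 * In Torque it lives in src/include/server_limits.h. */
#define PBS_STAGEFAIL_WAIT 1800

/* Hypothetical helper (not a Torque function): given the time at which a
 * stage-in failed, return the earliest time the job becomes eligible to
 * run again. There is no retry counter, matching "it retries forever". */
static time_t next_stagein_attempt(time_t failed_at)
{
    return failed_at + PBS_STAGEFAIL_WAIT;
}
```

To shorten the wait to, say, five minutes, you would change the value to 300 in the header and rebuild; the delay applies to every job, since it is a single server-wide constant.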
Garrick Staples, Linux/HPCC Administrator
University of Southern California