[torqueusers] Automatically resubmitting a failed job
Prakash Velayutham
prakash.velayutham at cchmc.org
Tue Dec 8 14:17:57 MST 2009
Hello,
Some of my jobs fail for unknown reasons (my guess is an application
bug). From Torque's perspective, I can see that the exit_status is
non-zero. I would like to retry these jobs automatically, up to a
maximum of 3 attempts. Is there an easy way to do that? Or do I have
to create a dummy job that sits idle until the real job finishes,
checks its exit status, and resubmits it if it failed?
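
For illustration, one way to get a bounded retry without a separate watcher job is to retry the application inside the job script itself. The following is only a minimal sketch under that assumption: it is not a Torque feature, it uses nothing but the Python standard library, and the name run_with_retries and the file name retry.py are hypothetical.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd until it exits with status 0 or max_attempts is reached.

    cmd is a list such as ["./my_app", "input.dat"] (hypothetical names).
    Returns a tuple (exit status of the last run, number of attempts used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(1, max_attempts + 1):
        # subprocess.run waits for the command to finish.
        status = subprocess.run(cmd).returncode
        if status == 0:
            break
        print(
            f"attempt {attempt}/{max_attempts} failed with exit status {status}",
            file=sys.stderr,
        )
    return status, attempt


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage from a job script: python retry.py ./my_app arg1 arg2
    final_status, _ = run_with_retries(sys.argv[1:])
    # Pass the last exit status on, so the job itself still reports a
    # non-zero status if every attempt failed.
    sys.exit(final_status)
```

With this approach the retries all happen within one job, so they stay on the same allocated nodes and count against the same walltime. It would not help if the failure is caused by the node itself.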
Thanks,
Prakash