[torqueusers] Automatically resubmitting a failed job

Prakash Velayutham prakash.velayutham at cchmc.org
Tue Dec 8 14:17:57 MST 2009


Some of my jobs fail for unknown reasons (my guess is an application
bug). Torque reports a non-zero exit_status for them. I would like to
retry these jobs automatically, up to a maximum of 3 attempts. Is there
an easy way to do that? Or do I have to create a dummy job that spins
idly waiting for the real job to finish, checks its exit status, and
resubmits it if it failed?
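
For illustration only, here is a minimal sketch of one alternative to the polling job: retrying the application inside the job script itself, so that Torque only ever sees the final exit status. This is not a Torque feature and does not resubmit anything to the scheduler. The file name retry.py and the helper run_with_retries are hypothetical names chosen for this example, and it assumes the application signals failure through a non-zero exit status.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd until it exits with status 0 or max_attempts is reached.

    Returns a tuple (last_exit_status, attempts_used).
    """
    status = 1  # reported if max_attempts is 0 and nothing was run
    for attempt in range(1, max_attempts + 1):
        # subprocess.call waits for the command and returns its exit status
        status = subprocess.call(cmd)
        if status == 0:
            return status, attempt
        print(f"attempt {attempt} failed with exit status {status}",
              file=sys.stderr)
    return status, max_attempts


if __name__ == "__main__":
    # Hypothetical usage inside a job script:
    #   python retry.py <command> [args...]
    if len(sys.argv) < 2:
        sys.exit("usage: retry.py <command> [args...]")
    final_status, _ = run_with_retries(sys.argv[1:])
    sys.exit(final_status)
```

With this approach all attempts run within the same job, so they share its walltime and node allocation. That may not be appropriate if the failures are tied to a particular node.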

