[torqueusers] Automatically resubmitting a failed job

Prakash Velayutham prakash.velayutham at cchmc.org
Tue Dec 8 14:17:57 MST 2009


Hello,

Some of my jobs fail for unknown reasons (an application bug, I
would guess). Torque reports a non-zero exit_status for them. I would
like these jobs to be retried automatically, up to 3 attempts. Is
there an easy way to do that, or do I have to create a dummy job that
spins idly waiting for the real job to finish, checks its exit status,
and resubmits it if it failed?
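
One alternative to a polling dummy job is to retry inside the job
itself. The sketch below assumes the failures are intermittent and
that rerunning the application within the same allocation is
acceptable. It wraps the application command in a retry loop and
exits with the last attempt's status, so Torque still sees a non-zero
exit_status if every attempt fails. The script name retry_wrapper.py
and the function run_with_retries are hypothetical, not part of
Torque, and the job's walltime request would have to cover up to
three runs.

```python
import subprocess
import sys

MAX_ATTEMPTS = 3  # retry limit from the question (assumption: fixed at 3)


def run_with_retries(cmd, max_attempts=MAX_ATTEMPTS):
    """Hypothetical helper: run cmd up to max_attempts times.

    Returns (exit_status, attempts_used). Stops at the first attempt
    that exits with status 0.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = subprocess.run(cmd).returncode
        if status == 0:
            return status, attempt
        print(f"attempt {attempt} failed with exit status {status}",
              file=sys.stderr)
    return status, max_attempts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage inside the job script (hypothetical):
    #   python retry_wrapper.py ./my_app arg1 arg2
    code, _ = run_with_retries(sys.argv[1:])
    # Propagate the final status so Torque records the real outcome.
    sys.exit(code)
```

In the job script, the application line would be replaced by the
wrapper call. This avoids a second job sitting in the queue, at the
cost of holding the allocation while retries run.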

Thanks,
Prakash

