[torqueusers] How to find out why a job failed?
Garrick Staples
garrick at clusterresources.com
Tue Jun 27 08:28:49 MDT 2006
On Mon, Jun 26, 2006 at 11:52:50AM -0700, Keenahn Jung alleged:
> Hello, I want to be able to trace the failure of a job. My idea is to
> have the script in the job have different return codes. How can I keep
> track of these return codes after the job fails? I have searched the
> documentation and previous emails and couldn't find anything. This must
> be a problem others have solved before. Thank you!
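One way to do what is described above is to give each failure point in the job script its own exit code. The sketch below assumes the job is a Python script with three stages. The stage layout and the code values (10, 20, 30) are hypothetical and purely illustrative.

```python
import sys

# Hypothetical exit codes, one per failure point in the job.
EXIT_OK = 0
EXIT_BAD_INPUT = 10
EXIT_COMPUTE_FAILED = 20
EXIT_WRITE_FAILED = 30


def run_job(input_path, output_path):
    """Run each stage and return a stage-specific exit code on failure."""
    try:
        with open(input_path) as fh:
            data = fh.read()
    except OSError:
        return EXIT_BAD_INPUT
    try:
        result = data.upper()  # stand-in for the real computation
    except Exception:
        return EXIT_COMPUTE_FAILED
    try:
        with open(output_path, "w") as fh:
            fh.write(result)
    except OSError:
        return EXIT_WRITE_FAILED
    return EXIT_OK


# Only exit when invoked as: python job.py INPUT OUTPUT
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(run_job(sys.argv[1], sys.argv[2]))
```

The process exit code becomes the job's exit code. Keeping the codes distinct per stage lets you tell where the job failed without reading its output.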
Enable "keep_completed" at the server or queue level; its value is the
number of seconds completed jobs are kept. When jobs exit, you'll be able
to read the "exit_status" job attribute, for example in the output of
qstat -f.
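The exit_status lookup can be scripted. The sketch below assumes keep_completed is already set (for example with qmgr -c "set server keep_completed = 300"). It also assumes that qstat -f prints the attribute on its own line as "exit_status = N". That line format is an assumption about the output, and the helper names are hypothetical.

```python
import re
import subprocess

# Matches lines such as "    exit_status = 20" (qstat -f format assumed).
_EXIT_RE = re.compile(r"^\s*exit_status\s*=\s*(-?\d+)\s*$", re.MULTILINE)


def parse_exit_status(qstat_text):
    """Return exit_status from `qstat -f` text, or None if it is not shown."""
    match = _EXIT_RE.search(qstat_text)
    return int(match.group(1)) if match else None


def job_exit_status(job_id):
    """Run `qstat -f JOB_ID` and return the job's exit_status, if any."""
    out = subprocess.run(
        ["qstat", "-f", job_id], capture_output=True, text=True, check=True
    ).stdout
    return parse_exit_status(out)
```

The regex allows negative values because the batch system itself may report negative codes when a job could not be started. Those codes should be kept apart from the script's own return codes.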
Or as Mr. Widyono says, just put the exit code in the job output.
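For the job-output approach, a small wrapper can run the real command and print its exit code before exiting with it. The marker text "JOB_EXIT_CODE=" and the wrapper name are hypothetical. Pick whatever is easy to grep for in the job's output file.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd, print its exit code to stdout, and return that code."""
    code = subprocess.run(cmd).returncode
    # This line ends up in the job's standard output file.
    print(f"JOB_EXIT_CODE={code}", flush=True)
    return code


# Only exit when invoked as: python wrapper.py COMMAND [ARGS...]
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_and_report(sys.argv[1:]))
```

Because the wrapper also exits with the same code, this combines with keep_completed: the code is visible both in the output file and in exit_status.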