[torqueusers] How to find out why a job failed?

Garrick Staples garrick at clusterresources.com
Tue Jun 27 08:28:49 MDT 2006


On Mon, Jun 26, 2006 at 11:52:50AM -0700, Keenahn Jung alleged:
> Hello, I want to be able to trace the failure of a job. My idea is to
> have the job script return different exit codes. How can I keep
> track of these exit codes after the job fails? I have searched the
> documentation and previous emails and couldn't find anything. This must
> be a problem others have solved before. Thank you!

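Set "keep_completed" (the number of seconds a finished job stays visible
in the queue) at the server or queue level.  After a job exits, you'll be
able to read its "exit_status" attribute with "qstat -f" until that time
runs out.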
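
A rough sketch of how that could look from the shell, assuming a working
TORQUE install and manager rights for qmgr.  The function names, the 300
second value, and the job ID 1234.myserver are made up for illustration;
only qmgr, qstat -f, keep_completed and exit_status come from TORQUE itself.

```shell
#!/bin/sh
# Sketch only: needs TORQUE on the PATH; qmgr needs manager privileges.

# Keep finished jobs visible to qstat for the given number of seconds.
# (enable_keep_completed is a hypothetical helper name.)
enable_keep_completed() {
    qmgr -c "set server keep_completed = $1"
}

# Pull the exit_status value out of `qstat -f` output read on stdin.
# (parse_exit_status is a hypothetical helper name.)
parse_exit_status() {
    sed -n 's/^[[:space:]]*exit_status[[:space:]]*=[[:space:]]*//p'
}

# Example usage (300 and 1234.myserver are placeholders):
#   enable_keep_completed 300
#   qstat -f 1234.myserver | parse_exit_status
```

The parsing step prints nothing if the job has already aged out of the
queue, so pick a keep_completed value long enough for whatever polls it.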

Or as Mr. Widyono says, just put the exit code in the job output.
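
One possible way to do that inside the job script, as a minimal sketch:
wrap the real command so its exit code is echoed to stdout (which ends up
in the job's output file) and then passed along.  run_and_report and
./my_program are hypothetical names, not anything TORQUE provides.

```shell
#!/bin/sh
# Run a command, print its exit code to stdout, and return the same code.
# (run_and_report is a hypothetical helper name.)
run_and_report() {
    rc=0
    "$@" || rc=$?          # capture the code without tripping `set -e`
    echo "exit code: $rc"
    return "$rc"
}

# Example usage in a job script (./my_program is a placeholder):
#   run_and_report ./my_program
#   exit $?
```

Exiting with the same code keeps the script's own exit status consistent
with what was written to the output file.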


