[torqueusers] How to find out why a job failed?

Keenahn Jung keenahn at tellme.com
Tue Jun 27 11:42:35 MDT 2006

Thank you for your quick replies! They were very helpful. However, I
want the scheduler to be smart enough to act on the different return
codes. For example, for an exit status of X, reschedule the job
immediately; for an exit status of Y, alert an admin; and so on.
Should I put this logic in the epilogue script?
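
To make the idea concrete, here is a minimal sketch of the dispatch logic, wherever it ends up living. The exit codes (75 and 2) and the action names are hypothetical placeholders for "X" and "Y"; nothing here is part of TORQUE itself, and the caller is assumed to carry out the returned action.

```python
# Hypothetical action names, chosen only for illustration.
RESCHEDULE = "reschedule"
ALERT_ADMIN = "alert_admin"
IGNORE = "ignore"

# Hypothetical mapping from exit status to action.
ACTIONS = {
    0: IGNORE,        # success, nothing to do
    75: RESCHEDULE,   # stands in for "exit status X" (assumed value)
    2: ALERT_ADMIN,   # stands in for "exit status Y" (assumed value)
}


def decide_action(exit_status, default=ALERT_ADMIN):
    """Return the action name for a job's exit status.

    Unknown codes fall back to `default`, so an unexpected failure
    is reported rather than silently dropped.
    """
    return ACTIONS.get(exit_status, default)
```

Keeping the mapping in one table means a new return code only needs one new entry, and the fallback decides what happens to codes nobody anticipated.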

Thanks, K

-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Garrick
Sent: Tuesday, June 27, 2006 7:29 AM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] How to find out why a job failed?

On Mon, Jun 26, 2006 at 11:52:50AM -0700, Keenahn Jung alleged:
> Hello, I want to be able to trace the failure of a job. My idea is to
> have the script in the job have different return codes. How can I keep
> track of these return codes after the job fails? I have searched the
> documentation and previous emails and couldn't find anything. This must
> be a problem others have solved before. Thank you!

Enable "keep_completed" at the server or queue level.  When jobs exit,
you'll be able to read the "exit_status" job attribute.
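
As a rough illustration of reading that attribute from a script: keep_completed can be set with something like `qmgr -c "set server keep_completed = 300"` (the value 300 is only an example), after which `qstat -f <jobid>` should list exit_status for a finished job. The sketch below assumes the attribute appears as a line of the form `exit_status = N` in that output; the helper names are made up for this example.

```python
import re
import subprocess

# Assumed output format: one attribute per line, e.g. "    exit_status = 3".
_EXIT_STATUS = re.compile(r"^\s*exit_status\s*=\s*(-?\d+)\s*$", re.MULTILINE)


def parse_exit_status(qstat_text):
    """Return exit_status from `qstat -f` text as an int, or None if absent."""
    match = _EXIT_STATUS.search(qstat_text)
    return int(match.group(1)) if match else None


def fetch_exit_status(job_id):
    """Run `qstat -f <job_id>` and parse its output (needs TORQUE on PATH)."""
    result = subprocess.run(
        ["qstat", "-f", job_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_exit_status(result.stdout)
```

A None result means the attribute was not in the output, for instance because the job is still running.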

Or as Mr. Widyono says, just put the exit code in the job output.
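
One possible way to do that is a small wrapper that runs the real command and writes its exit code as the last line of the job output. This is only a sketch; the `EXIT_CODE=` marker and the function name are hypothetical, not a TORQUE convention.

```python
import subprocess
import sys


def run_and_report(cmd, out=sys.stdout):
    """Run `cmd` (a list of arguments), record its exit code in `out`,
    and return the code so the caller can exit with it."""
    code = subprocess.call(cmd)
    # Marker line is an assumed format; pick anything easy to grep for.
    out.write("EXIT_CODE=%d\n" % code)
    out.flush()
    return code
```

A job script could end with `sys.exit(run_and_report(["./my_program"]))`, where `./my_program` is a placeholder, so the job still exits with the original code and the output file carries it as well.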
