[torqueusers] Exit_status always 0, a BUG??
Prabhakar R Gudla
gudlap at mail.nih.gov
Wed Nov 21 10:48:05 MST 2012
Hi David,
Thanks for your quick comment. I've gone through Torque's documentation
(sec 2.7) and I understand the exit codes.
I even tried the example C code, "error.c", listed here:
http://tinyurl.com/d5zebfh
with the same result (i.e., Exit_status=0).
I was looking at the changelogs, and I think we should try 4.1.3 because
of the fixes it contains. However, I don't see any bug fix or patch for
the Exit_status issue.
Could it be our set-up?
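
One way to narrow this down is to confirm that the script itself returns
11 when run directly under bash, with no Torque involved. The following is
only a minimal sketch: the temporary file stands in for simple_job.sh, the
sleep is shortened, and the variable names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for simple_job.sh (sleep shortened for the check).
job_script=$(mktemp)
cat > "$job_script" <<'EOF'
#!/bin/bash
date
sleep 1
date
exit 11
EOF

# Run the script in a child bash process, outside of Torque.
bash "$job_script"
status=$?   # exit status of the child bash, i.e. of the script
echo "local exit status: $status"

# Clean up the temporary script.
rm -f "$job_script"
```

If this reports 11 locally while the same script submitted with qsub still
shows Exit_status=0, the script itself is not the problem, and the cause is
more likely in how the job is run or reported on the cluster.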
Best,
-Prabhakar
On 11/21/12 12:29, David Beer wrote:
> PRG,
>
> TORQUE follows standard unix exit codes, so 0 means success. I will say
> that if you experience problems with 4.0.0, I'd strongly recommend that
> you upgrade to 4.1.3, it is far more stable and has a lot of fixes from
> 4.0.0. This, however, doesn't seem like an issue.
>
> David
>
> On Wed, Nov 21, 2012 at 10:21 AM, Prabhakar R Gudla <gudlap at mail.nih.gov>
> wrote:
>
> Hi,
>
> My apologies if this message gets posted twice.
>
>
> Issue:
> We are trying to get the "exit_status" of our jobs on our cluster
> (CentOS 6.3, x86_64 with Torque 4.0.0 and pbs_sched).
>
> Everything looks fine, except that the "exit_status" is not what we
> expect. For instance, take the test PBS job (simple_job.sh), which exits
> with code 11. The job executes fine and we get the expected STDOUT and
> STDERR. However, the reported "exit_status" is always "0", whether we
> check with "qstat" or "tracejob".
>
> What could be wrong?
>
>
> $ cat simple_job.sh
> ------------------------------------------------------------------
> #!/bin/bash
> #PBS -N TorqueTest
> #PBS -l nodes=1,walltime=00:01:00
> #PBS -M xx at yyy.com
> #PBS -m abe
> #print the time and date
> date
> #wait 10 seconds
> sleep 10
> #print the time and date again
> date
> # Exit code
> exit 11
> ------------------------------------------------------------------
>
> $ qstat -f <jobid>
>
> See qstat_out.txt
>
>
> $ tracejob <jobid>
>
> See tracejob_out.txt
>
> What could be going wrong?
>
> Thanks,
>
> PRG
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
>
>
> --
> David Beer | Senior Software Engineer
> Adaptive Computing
>