[torqueusers] Interpreting Exit_status in server accounting files

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Jan 10 05:35:30 MST 2006


Hi Jeroen,

Thanks a lot.  Signals 1-31 are defined in /usr/include/asm/signal.h,
but then I don't understand "Exit_status" values of 126 and 127,
since there are no signals with those numbers.  Maybe exit statuses
126 and 127 have some special meaning within Torque?
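Applying the bit layout from Jeroen's reply literally to the values listed in the original question makes the puzzle concrete. The following is a minimal Python sketch. It assumes, as the reply does without having checked, that Exit_status is a raw wait(2)-style status word. `decode_wait_status` is a hypothetical helper name, and the `& 0xFF` mask on the exit value is added to match the usual 8-bit exit value.

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw wait(2)-style status word into its fields.

    Assumption (unverified in the thread): Exit_status is such a raw status.
    """
    return {
        "exit_value": (status >> 8) & 0xFF,  # bits 8-15: exit value if not signalled
        "signal": status & 0x7F,             # bits 0-6: terminating signal, 0 if none
        "core_dumped": bool(status & 0x80),  # bit 7: set if a core dump was written
    }


# Values reported in the original question.
for value in (1, 126, 127, 139, 143, 265, 271):
    print(value, decode_wait_status(value))
```

Read this way, 126 and 127 decode as "signals" 126 and 127, which do not exist. 265 and 271 decode as an exit value of 1 plus a signal, and the wait status of a terminated process does not carry both at once. This suggests that Exit_status is not a raw wait status.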

Thanks,
Ole

Jeroen van den Muyzenberg wrote:
> The exit status should be (haven't checked) the return from the exec'd
> job. We've had a look at them recently and they do seem to conform to:
> 
>     Exit_status >> 8 # Actual exit value
>     Exit_status & 127 # Signal number if thus killed
>     Exit_status & 128 # True if a core dump happened
> 
> Jeroen
> 
> On Tue, 10 Jan 2006, Ole Holm Nielsen wrote:
> 
>> I'm working on the "pbsacct" accounting package for Torque/PBS
>> and would like to understand the meaning of the "Exit_status"
>> numbers in the server accounting files.  Unfortunately, I
>> haven't been able to find a list of exit status values in the
>> Torque source tree.  Going through some of our accounting files,
>> I find a number of jobs with non-zero "Exit_status" values
>> such as: 1, 126, 127, 139, 143, 265, 271.
>>
>> Question: How do I assign a meaning to these "Exit_status" values
>> so that I can decide whether to flag a job termination as OK
>> (or just sort of OK) or as "failed" in the accounting output?
>> It would also be nice to know if a job exited because its wall-clock
>> or CPU time limit was exceeded.
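An alternative reading fits the listed values better. This is an assumption, not something confirmed in this thread:

- 126 and 127 are the standard shell codes for "command found but not executable" and "command not found".
- A shell reports a child killed by signal N as 128 + N, so 139 = 128 + 11 (SIGSEGV) and 143 = 128 + 15 (SIGTERM).
- Values above 256 would then be 256 + N for jobs that Torque itself killed with signal N, so 265 = 256 + 9 (SIGKILL) and 271 = 256 + 15 (SIGTERM).

The following sketch classifies Exit_status values for an accounting report under those assumptions. `classify_exit_status`, `signal_name` and the verdict labels are hypothetical. Signal names are looked up on the host running the script, which may differ from the compute node.

```python
import signal
from typing import Tuple


def signal_name(num: int) -> str:
    """Name of signal `num` on this host (e.g. "SIGTERM"), or a fallback."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def classify_exit_status(value: int) -> Tuple[str, str]:
    """Return (verdict, explanation) for an accounting-file Exit_status.

    Assumptions: shell codes 126/127, the shell's 128 + N for a child killed
    by signal N, and 256 + N for a job Torque killed with signal N.
    """
    if value == 0:
        return ("ok", "job script exited with 0")
    if value == 126:
        return ("failed", "shell: command found but not executable")
    if value == 127:
        return ("failed", "shell: command not found")
    if 1 <= value <= 128:
        # The job script's own exit value; whether it means failure is job-specific.
        return ("nonzero", f"job script exited with {value}")
    if 128 < value < 256:
        # Ambiguous: the script may also have exited with this value itself.
        return ("failed", f"child process killed by {signal_name(value - 128)}")
    if value > 256:
        # Assumption: Torque adds 256 to the signal number it killed the job with.
        return ("failed", f"job killed by {signal_name(value - 256)}")
    # Negative values and exactly 256 are not covered by these assumptions.
    return ("unknown", f"unrecognized Exit_status {value}")


for value in (0, 1, 126, 127, 139, 143, 265, 271):
    verdict, why = classify_exit_status(value)
    print(f"{value:>4}  {verdict:<8} {why}")
```

Values from 129 to 255 remain ambiguous, because a job script can also exit with such a value on its own. The exit status alone also cannot show whether a job hit its wall-clock or CPU time limit. A SIGTERM or SIGKILL from Torque is consistent with that, but other causes, such as a user deleting the job, may produce the same signals.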

