[torqueusers] Trouble with prologue script
Joshua Bernstein
jbernstein at penguincomputing.com
Mon Feb 9 14:41:26 MST 2009
Tim Miller wrote:
> Hello All,
>
> I am experimenting with job prologue scripts and am having trouble. On
> one of my nodes I created a file /var/spool/torque/mom_priv/prologue
> as follows:
>
> [root@h218 ~]# ls -l /var/spool/torque/mom_priv/prologue
> -r-x------ 1 root root 38 Feb 9 04:04 /var/spool/torque/mom_priv/prologue
>
> This script is very simple (just prints out a debug message). However,
> when I submit a job to the test node (-l nodes=h218) it just goes into
> an infinite requeue loop. Looking at the logs, I get the following:
>
> Feb 9 04:05:31 h218 pbs_mom: run_pelog, prolog/epilog failed, file:
> /var/spool/torque/mom_priv/prologue, exit: 255, nonzero p/e exit status
>
> I am guessing that the 255 exit status corresponds with exit -1, which
> according to the manual indicates that permissions on the prologue
> script are not set correctly. I do not understand it since the
> permission (owner/group of root and permissions 0500) match with what
> the manual says they ought to be. I am not sure how to debug from here,
> so any help would be much appreciated.
Does the script exit with a proper return code? Just printing a debug
message isn't always sufficient: without an explicit exit, a shell script
returns the status of its last command. The safest thing to do is end the
script with an explicit "exit 0".
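
For illustration, here is a minimal sketch of what such a prologue might look
like, and a way to check its exit status by hand before installing it. It is
not taken from Tim's script, which was not posted. The scratch file from
mktemp, the job id "123.h218" and the user name "tim" are made-up test values,
and the argument order (job id first, then user name) is how I understand the
TORQUE documentation, so treat that as an assumption.

```shell
#!/bin/sh
# Sketch only: write a candidate prologue to a scratch file and test it by hand.
prologue=$(mktemp)

cat > "$prologue" <<'EOF'
#!/bin/sh
# Assumed argument order: $1 = job id, $2 = job owner's user name.
echo "prologue: starting job $1 for user $2"
# Explicit success status, so the result does not depend on the last command.
exit 0
EOF

# Same mode as in the original post: read and execute for the owner only.
chmod 500 "$prologue"

# Run it with hypothetical arguments and capture both output and exit status.
output=$(sh "$prologue" 123.h218 tim)
status=$?

echo "$output"
echo "exit status: $status"
```

If the status printed here is anything other than 0, pbs_mom would see a
nonzero prologue exit as well, so this is a quick check to run before copying
the script into mom_priv.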
-Joshua Bernstein
Senior Software Engineer
Penguin Computing