[torqueusers] Trouble with prologue script

Joshua Bernstein jbernstein at penguincomputing.com
Mon Feb 9 14:41:26 MST 2009



Tim Miller wrote:
> Hello All,
> 
> I am experimenting with job prologue scripts and am having trouble. On 
> one of my nodes I created a file /var/spool/torque/mom_priv/prologue
> as follows:
> 
> [root@h218 ~]# ls -l /var/spool/torque/mom_priv/prologue
> -r-x------ 1 root root 38 Feb  9 04:04 /var/spool/torque/mom_priv/prologue
> 
> This script is very simple (just prints out a debug message). However, 
> when I submit a job to the test node (-l nodes=h218) it just goes into 
> an infinite requeue loop. Looking at the logs, I get the following:
> 
> Feb  9 04:05:31 h218 pbs_mom: run_pelog, prolog/epilog failed, file: 
> /var/spool/torque/mom_priv/prologue, exit: 255, nonzero p/e exit status
> 
> I am guessing that the 255 exit status corresponds with exit -1, which 
> according to the manual indicates that permissions on the prologue 
> script are not set correctly. I do not understand it since the 
> permission (owner/group of root and permissions 0500) match with what 
> the manual says they ought to be. I am not sure how to debug from here, 
> so any help would be much appreciated.

Does the script exit with a proper return code? Just printing a debug 
message may not be sufficient: without an explicit exit, a shell script 
returns the status of its last command. The safest thing to do is end the 
script with an explicit "exit 0".
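
As a rough sketch of what I mean, the snippet below writes a minimal prologue 
to a scratch directory, sets mode 0500 on it, runs it by hand, and prints the 
exit status. The scratch directory, job id, and user name are made up for 
testing; on a real node the file would live at 
/var/spool/torque/mom_priv/prologue and be owned by root. As far as I recall, 
pbs_mom passes the job id and the job owner's user name as the first two 
arguments, but check that against your version's documentation.

```shell
#!/bin/sh
# Scratch location for testing only (hypothetical); not the real mom_priv path.
workdir=$(mktemp -d)
prologue="$workdir/prologue"

# Write a minimal prologue: one debug message, then an explicit success status.
cat > "$prologue" <<'EOF'
#!/bin/sh
# Assumption: $1 is the job id and $2 is the job owner's user name.
echo "prologue: starting job $1 for user $2"
# Return success explicitly so the exit status never depends on the last command.
exit 0
EOF

# Read and execute for the owner only, matching the 0500 mode from the manual.
chmod 500 "$prologue"

# Run it by hand with placeholder arguments and capture the exit status.
"$prologue" 123.h218 testuser
status=$?
echo "prologue exit status: $status"
```

If the status printed here is anything other than 0, the script itself is the 
problem rather than its ownership or mode.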

-Joshua Bernstein
Senior Software Engineer
Penguin Computing


