[torqueusers] PBS unable to execute Job.

Ashley Wright a2.wright at qut.edu.au
Wed Sep 7 20:39:15 MDT 2005


Thanks Chris,

I have increased the loglevel to 3, and one of the messages I get is:

09/08/2005 12:33:50;0001;   pbs_mom;Job;914.auriga.qut.edu.au;phase 2 of job launch successfully completed
09/08/2005 12:33:50;0001;   pbs_mom;Job;TMomFinalizeJob3;read start return code=-1 session=127
09/08/2005 12:33:50;0001;   pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec failure, before files staged, no retry
09/08/2005 12:33:50;0001;   pbs_mom;Job;914.auriga.qut.edu.au;ALERT:  job failed phase 3 start, server will retry
09/08/2005 12:33:50;0008;   pbs_mom;Req;send_sisters;sending ABORT to sisters


What is 'phase 3'?  The log seems to say the failure happens before the
files are staged. A little further on, it looks like the files are copied
and the job is forked:

09/08/2005 12:33:50;0100;   pbs_mom;Req;;Type CopyFiles request received from PBS_Server at mgt, sock=10
09/08/2005 12:33:50;0008;   pbs_mom;Job;process_request;request type CopyFiles from host mgt allowed
09/08/2005 12:33:50;0004;   pbs_mom;Fil;914.auriga.qut.edu.au;forking to user, uid: 1001  gid: 100  homedir: '/home/wright4'
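
For picking records like these out of a longer mom_log, a minimal Python
sketch follows. It assumes the semicolon-separated layout seen in the
excerpts above (timestamp;event code;daemon;class;object;message), which is
inferred from this thread only and not from the TORQUE documentation. The
function name parse_mom_log and the field names are hypothetical.

```python
import re

# Assumption: every record starts with "MM/DD/YYYY HH:MM:SS;" as in the
# excerpts quoted in this thread. Lines that do not start this way are
# treated as wrapped continuations of the previous record.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2};")


def parse_mom_log(lines):
    """Rejoin wrapped lines and split each record into named fields."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if RECORD_START.match(line):
            records.append(line)
        elif records and line.strip():
            # Continuation of a record that was wrapped by the mailer.
            records[-1] = records[-1].rstrip() + " " + line.strip()

    parsed = []
    for rec in records:
        # Split on the first five semicolons only, so a message that
        # itself contains ";" stays in one piece.
        parts = rec.split(";", 5)
        if len(parts) != 6:
            continue  # not in the assumed layout; skip it
        stamp, code, daemon, cls, obj, message = parts
        parsed.append({
            "timestamp": stamp,
            "code": code,
            "daemon": daemon.strip(),
            "class": cls,
            "object": obj,
            "message": message.strip(),
        })
    return parsed


def records_for(parsed, text):
    """Return records whose object or message mentions the given text."""
    return [r for r in parsed if text in r["object"] or text in r["message"]]
```

Passing the open log file to parse_mom_log and then calling
records_for(parsed, "TMomFinalizeJob3") would list only the records that
mention that name.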


Do you have any other suggestions? I am still new to TORQUE, and this
problem is strange because it was working fine on Monday.

Thanks,
Ashley


Chris Samuel wrote:

>On Wed, 7 Sep 2005 09:13 am, Ashley Wright wrote:
>
>>I get the following line in mom_log on node010:
>>09/07/2005 08:59:49;0001;   pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec failure, before files staged, no retry
>>09/07/2005 08:59:49;0001;   pbs_mom;Job;889.auriga.qut.edu.au;ALERT:  job failed phase 3 start, server will retry
>
>I guess you could try bumping up the log level of the mom with SIGUSR1 a few 
>levels (you can see what the current level is by doing momctl -d 1) and then 
>retrying the job.
>
>If that doesn't help then try:
>
> strace -o mom.out -fp <pid-of-mom>
>
>and see what strace shows you.
>
>Good luck!
>Chris


-- 
Ashley Wright
3864 9264
a2.wright at qut.edu.au
HPC and Research Support Group
Queensland University of Technology (QUT)


