[torqueusers] PBS unable to execute Job.

Chris Samuel csamuel at vpac.org
Wed Sep 7 20:06:13 MDT 2005


On Wed, 7 Sep 2005 09:13 am, Ashley Wright wrote:

> I get the following line in mom_log on node010:
> 09/07/2005 08:59:49;0001;   pbs_mom;Job;TMomFinalizeJob3;job not
> started, Failure job exec failure, before files staged, no retry
> 09/07/2005 08:59:49;0001;   pbs_mom;Job;889.auriga.qut.edu.au;ALERT:  
> job failed phase 3 start, server will retry

You could try raising the mom's log level a few steps: each SIGUSR1 sent to 
pbs_mom bumps it up by one (you can see the current level with momctl -d 1). 
Then retry the job and check mom_log again.
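
As a rough sketch of the signal approach, something like the following could 
be run as root on the node. The helper name bump_loglevel is made up for 
illustration, and the usage line assumes pgrep is installed and that exactly 
one pbs_mom is running; adjust to however you normally find the mom's PID.

```shell
# Hypothetical helper (not part of TORQUE): send SIGUSR1 to a PID N times.
# Each SIGUSR1 asks pbs_mom to raise its log level by one step.
bump_loglevel() {
    pid=$1
    count=${2:-3}          # default: three steps
    i=0
    while [ "$i" -lt "$count" ]; do
        kill -USR1 "$pid" || return 1
        i=$((i + 1))
        sleep 1            # let the daemon handle each signal separately
    done
}

# Example usage (assumptions: pgrep exists, a single pbs_mom is running):
#   bump_loglevel "$(pgrep -x pbs_mom)" 3
#   momctl -d 1            # confirm the level afterwards
```

The one-second pause is just a precaution so that back-to-back signals are 
not merged into one before the mom has seen them.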

If that doesn't help then try:

 strace -o mom.out -fp <pid-of-mom>

and see what strace shows you.
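
For reference, -p attaches to the running mom, -f follows the children it 
forks to start the job, and -o writes the trace to mom.out. The trace can get 
long, so one way to narrow it down is to look only at system calls that 
returned an error. This is a sketch: failed_calls is a made-up helper name, 
and the pattern assumes strace's usual " = -1 ERRNO (message)" line ending.

```shell
# Hypothetical helper: print only the traced calls that failed.
# strace reports a failed call as "... = -1 ENOENT (No such file or directory)".
failed_calls() {
    grep -E ' = -1 E[A-Z0-9]+ ' "$1"
}

# Example usage, after resubmitting the job while strace is attached:
#   strace -o mom.out -fp <pid-of-mom>
#   failed_calls mom.out | less
```

Plenty of failed calls are harmless (for example searching a path for a 
file), so the ones worth reading are those just before the job is abandoned.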

Good luck!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
