[torqueusers] PBS unable to execute Job.
Chris Samuel
csamuel at vpac.org
Wed Sep 7 20:06:13 MDT 2005
On Wed, 7 Sep 2005 09:13 am, Ashley Wright wrote:
> I get the following line in mom_log on node010:
> 09/07/2005 08:59:49;0001; pbs_mom;Job;TMomFinalizeJob3;job not
> started, Failure job exec failure, before files staged, no retry
> 09/07/2005 08:59:49;0001; pbs_mom;Job;889.auriga.qut.edu.au;ALERT:
> job failed phase 3 start, server will retry
I guess you could try raising the mom's log level a few steps by sending it
SIGUSR1 (you can see what the current level is by doing momctl -d 1) and then
retrying the job.
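
A minimal sketch of that step, assuming each SIGUSR1 raises pbs_mom's log level by one and that the daemon can be found with pgrep. The function name bump_loglevel, the default of 3 steps, and the one-second pause are illustrative choices, not part of TORQUE:

```shell
#!/bin/sh
# Send SIGUSR1 to a process a given number of times.
# Assumption: pbs_mom raises its log level by one per SIGUSR1
# (and lowers it again per SIGUSR2).
bump_loglevel() {
    pid=$1
    steps=${2:-3}      # hypothetical default: three levels
    i=0
    while [ "$i" -lt "$steps" ]; do
        kill -USR1 "$pid" || return 1   # stop if the signal cannot be delivered
        sleep 1                         # give the daemon time to handle each signal
        i=$((i + 1))
    done
}

# Example, run as root on the compute node (node010 in this thread):
#   momctl -d 1                             # check the current level first
#   bump_loglevel "$(pgrep -x pbs_mom)" 3   # then resubmit the job and re-read mom_log
```

Sending the same number of SIGUSR2 signals afterwards should bring the level back down once the failing job has been captured in the log.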
If that doesn't help then try:
strace -o mom.out -fp <pid-of-mom>
and see what strace shows you.
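
One way to narrow down the strace output, sketched here on the assumption that the job start fails on a system call that returns an error. The helper name failed_syscalls is made up for illustration, and the syscall names in the example filter are only guesses at likely suspects:

```shell
#!/bin/sh
# Print only the lines of an strace output file where a system call
# returned -1 together with an errno name (ENOENT, EACCES, ...).
failed_syscalls() {
    grep -E '= -1 E[A-Z0-9]+' "$1"
}

# Example:
#   strace -o mom.out -fp <pid-of-mom>   # resubmit the job, then stop strace with Ctrl-C
#   failed_syscalls mom.out | grep -E 'execve|open|chdir|setuid'
```

Many failed calls are harmless (for example, probing for optional files), so the interesting entries are usually the ones closest to the time of the "job exec failure" message in mom_log.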
Good luck!
Chris
--
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia