[torqueusers] Removing the "exec_host" attribute from a queued job ?

Chris Samuel csamuel at vpac.org
Tue Sep 20 19:47:24 MDT 2005


On Wed, 21 Sep 2005 10:22 am, Garrick Staples wrote:

> Sounds like the initially started job got to the point where it had
> copied the input files for the job before failing.  It's worth
> discovering if the original MS node got a job start commit.

This is what happened on the node when it went bad:

pbs_mom;Svr;pbs_mom;Success (0) in TMomFinalizeJob3, read of pipe for sid failed for job 46533.edda-m.vpac.org (0 of 8 bytes)
pbs_mom;Job;TMomFinalizeJob3;start failed, improper sid
pbs_mom;Job;46533.edda-m.vpac.org;ALERT:  job failed phase 3 start, server will retry
pbs_mom;Req;send_sisters;sending ABORT to sisters

momctl -d 1 doesn't show the mom as still thinking the job is there, but
I've done a momctl -c 46533 to clear things just in case.

When the server tried to restart the job last time, the mom didn't log
anything. :-(

cheers,
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

