[torqueusers] safely restarting pbs_mom without killing jobs

Garrick Staples garrick at usc.edu
Sun Oct 31 18:14:47 MST 2004


[subject change for the benefit of future list searches]

On Mon, Nov 01, 2004 at 11:54:52AM +1100, Chris Samuel alleged:
> On Mon, 1 Nov 2004 11:47 am, Garrick Staples wrote:
> 
> > On Mon, Nov 01, 2004 at 10:16:50AM +1100, Chris Samuel alleged:
> >
> > > On Mon, 1 Nov 2004 10:08 am, Garrick Staples wrote:
> > >
> > > > I must admit, I haven't tested this with kill -9!
> > > >
> > > > Recipe for success: never let pbs_mom die gracefully and always start
> > > > with -p?
> > >
> > > That's what *seems* to work here, caveat emptor.
> > >
> > > Don't blame us if it eats your dog.. :-)
> >
> > Unbelievable!  That seems to work perfectly!
> 
> Phew, I can start breathing again now..  :-)
> 
> Garrick, thanks for the confirmation that it's not just a fluke here at VPAC!
> 
> Hopefully this will help other Torquies too..

This is terrific.  I've got initscripts that start pbs_mom with -p on boot
(this *will* clear the job if a node reboots during a job), do a normal kill on
machine shutdown (again, clear the job if the machine is going down), but
always use kill -9 and -p any other time.
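
For anyone searching the archives later, the policy above could be sketched
roughly as the shell functions below. This is only an illustration, not the
actual initscripts: the function names, the PBS_MOM and DRY_RUN variables, and
passing the pid as an argument are all assumptions made for the example. It
defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: PBS_MOM, DRY_RUN, and the function names are hypothetical.
PBS_MOM=${PBS_MOM:-pbs_mom}
DRY_RUN=${DRY_RUN:-1}    # 1 = print commands instead of executing them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Boot (or any start): always start pbs_mom with -p.
mom_start() {
    run "$PBS_MOM" -p
}

# Machine shutdown: a normal kill (SIGTERM), so pbs_mom exits gracefully
# and the job is cleared.
mom_shutdown() {
    run kill "$1"
}

# Any other time: kill -9 so pbs_mom never dies gracefully, then start
# it again with -p.
mom_restart() {
    run kill -9 "$1"
    mom_start
}
```

With the defaults, "mom_restart 1234" just prints the kill -9 and the pbs_mom -p
command lines; set DRY_RUN=0 only after checking them against your own setup.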

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California