[torqueusers] downing a node via qmgr

Garrick Staples garrick at usc.edu
Thu Sep 22 14:03:49 MDT 2005


On Thu, Sep 22, 2005 at 09:11:09AM +1000, Chris Samuel alleged:
> On Thu, 22 Sep 2005 02:22 am, Stewart.Samuels at sanofi-aventis.com wrote:
> 
> > We currently have a node which is rebooting itself constantly.
> 
> I would strongly suggest that you do not start the pbs_mom automatically on a 
> reboot via init scripts.
> 
> If you've rebooted the node yourself then you should restart it by hand 
> whereas if the node dies and reboots you're probably going to want to 
> investigate.  We do this, and only restart the mom when we've got a better 
> handle on things and think it safe to do so.

Really?  That's what you guys do on your cluster?  That sounds like a
major hassle.

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California

