[torqueusers] pbs_mom request, was Re: PBS_MOM kills running jobs when restarted
garrick at usc.edu
Mon Dec 14 11:37:07 MST 2009
On Mon, Dec 14, 2009 at 12:23:14PM +0100, Bogdan Costescu alleged:
> > Thirdly, if a node does go bad and reboot then it
> > makes diagnosis and troubleshooting a lot easier if
> > the node has no jobs on it.
> If the node is offline-d upon unexpected reboot, it would still remain
> empty and ready for testing.
That's what I do. The init script leaves a /.autopbserror file around (just
like the autofsck mechanism).
If found on boot, the node has been ungracefully rebooted and the node is
marked offline.
I like making sure that pbs_mom is at least _started_ on boot to allow old jobs
the chance to exit.
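
For anyone wanting to try the same flag-file idea, here is a rough Python sketch of the mechanism described above. It is not the actual init script. The function names, the injectable `run` parameter, and the choice to call `pbsnodes -o` from the node itself (which assumes the node is allowed to issue operator commands to pbs_server) are all assumptions made for illustration. Only the /.autopbserror path comes from the post.

```python
import os
import socket
import subprocess

# Flag path taken from the post; everything else here is hypothetical.
FLAG_FILE = "/.autopbserror"


def on_boot(flag_file=FLAG_FILE, run=subprocess.call, node=None):
    """Boot-time hook: detect an unclean reboot, then start pbs_mom.

    Returns True if the previous shutdown was unclean.
    """
    node = node or socket.gethostname()
    unclean = os.path.exists(flag_file)
    if unclean:
        # The flag survived, so the clean-shutdown hook never ran.
        # Assumption: this host may mark itself offline with pbsnodes -o.
        run(["pbsnodes", "-o", node])
    else:
        # Arm the flag for this boot; a clean shutdown removes it again.
        open(flag_file, "w").close()
    # Start pbs_mom in both cases so old jobs get the chance to exit.
    run(["pbs_mom"])
    return unclean


def on_clean_shutdown(flag_file=FLAG_FILE):
    """Shutdown-time hook: remove the flag so the next boot counts as clean."""
    if os.path.exists(flag_file):
        os.remove(flag_file)
```

Passing a different `run` callable makes it possible to dry-run the logic without touching a real pbs_mom or pbs_server. After an unclean reboot the flag is left in place, so the node stays armed until the next clean shutdown removes it.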
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Life is Good!