[torqueusers] Re: mom segfault in new diag code
garrick at usc.edu
Sun Oct 31 16:08:12 MST 2004
On Mon, Nov 01, 2004 at 09:58:32AM +1100, Chris Samuel alleged:
> On Mon, 1 Nov 2004 09:47 am, Garrick Staples wrote:
> > > We *always* run the moms with the -p flag for just this reason. :-)
> > And you can reliably not break jobs? If I went through the entire cluster
> > and restarted every mom, I know I'd lose half the jobs.
> Ahh, our PBS script's method for stopping the mom on compute nodes is:
> kill -9
> That way it doesn't get time to think about what it's going to do to the jobs
> running on its node.. ;-)
I must admit, I haven't tested this with kill -9!
Recipe for success: never let pbs_mom die gracefully and always start with -p?
Garrick Staples, Linux/HPCC Administrator
University of Southern California