On Tue, 18 Oct 2005, Garrick Staples wrote:
> On Tue, Oct 18, 2005 at 03:58:35PM +0100, gianfranco sciacca alleged:
> > We have been running Torque with its stock scheduler for about 7 months
> > with few problems. All of a sudden, over the past few days, the scheduler
> > keeps dying, which is seriously disrupting our cluster operation. I should
>
> Can you get a gdb backtrace of it dying? Or maybe run it under
> valgrind?
After the last "death" I restarted it under valgrind. I will let you
know how it goes.
Ronny wrote:
> I had the same problem - the scheduler was dying during operation.
> Try starting pbs_sched with option -a <TIMEOUT>.
> This will increase the alarm time for one scheduler run to <TIMEOUT>
> seconds. I use -a 600.
I'll roll this in if it dies again and will let you know.
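For the archive, here is a rough sketch of how a per-run alarm of this kind
generally behaves. It is Python, not Torque's actual code, and the names
run_cycle and CycleTimeout are made up for illustration. The point is only
that a scheduling run which takes longer than the alarm time gets
interrupted, so a larger value such as -a 600 gives a slow run more room to
finish. It assumes a Unix system and must run in the main thread.

```python
import signal
import time


class CycleTimeout(Exception):
    """Raised when one (hypothetical) scheduling cycle outlives its alarm."""


def _on_alarm(signum, frame):
    # The SIGALRM handler aborts whatever the cycle was doing.
    raise CycleTimeout("scheduling cycle exceeded its alarm time")


def run_cycle(work, alarm_seconds):
    """Run one cycle under an alarm; return True if it finished in time."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    # Arm a one-shot real-time timer (fractional seconds are allowed).
    signal.setitimer(signal.ITIMER_REAL, alarm_seconds)
    try:
        work()
        return True
    except CycleTimeout:
        return False
    finally:
        # Disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    # A short cycle completes; a long one is cut off by the alarm.
    print(run_cycle(lambda: time.sleep(0.01), 1.0))
    print(run_cycle(lambda: time.sleep(1.0), 0.05))
```

In this sketch a timed-out cycle simply returns False; what the real
scheduler does when its alarm fires is a separate question.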
cheers,
gianfranco