[torqueusers] pbs_scheduler keeps dying

gianfranco sciacca gs at hep.ucl.ac.uk
Wed Oct 19 10:57:51 MDT 2005

On Tue, 18 Oct 2005, Garrick Staples wrote:
> On Tue, Oct 18, 2005 at 03:58:35PM +0100, gianfranco sciacca alleged:
> > We have been running torque with its stock scheduler for about 7 months
> > with few problems. All of a sudden, over the last few days, the scheduler
> > keeps dying, which is seriously disrupting our cluster operation. I should
> Can you get a gdb backtrace of it dying?  Or maybe run it under
> valgrind?

After the last "death" I restarted it under valgrind. Will let you
know how it goes.

Ronny wrote:
> I had the same problem - the scheduler was dying within its operation.
> Try starting pbs_sched with option -a <TIMEOUT>.
> This will increase the alarm time for one scheduler run to <TIMEOUT>
> seconds.  I use -a 600.

I'll roll this in if it dies again and will let you know.
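
For reference, a minimal sketch of the restart Ronny describes. Only the
-a option and the value 600 come from this thread; the helper name
sched_command is made up for illustration, and I am assuming pbs_sched is
on the PATH and that the old scheduler has been stopped first. The sketch
only prints the command so it can be checked before being run as root.

```shell
#!/bin/sh
# Alarm time in seconds for one scheduler run; 600 is the value quoted above.
ALARM_TIMEOUT=600

# Hypothetical helper: builds the pbs_sched command line for a given timeout.
sched_command() {
    echo "pbs_sched -a $1"
}

# Print the command instead of executing it, so it can be reviewed first.
sched_command "$ALARM_TIMEOUT"
```

Once the printed line looks right, it can be run by hand as root.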
