[torqueusers] pbs_scheduler keeps dying

gianfranco sciacca gs at hep.ucl.ac.uk
Tue Oct 18 08:58:35 MDT 2005


We have been running torque with its stock scheduler for about 7 months 
with few problems. A few days ago, all of a sudden, the scheduler started 
dying repeatedly, which is seriously disrupting our cluster operation. I 
should mention that we have been running maui in TEST mode for some weeks. 
Everything was fine for at least 4 weeks before the problems started, so 
I wonder whether this could be related to maui. We have since switched 
maui off, but the scheduler's health hasn't improved. I have appended 
the content of the sched_out file at the end of this mail, hoping that 
someone can point me in the right direction for resolving this problem.

cheers,
gianfranco

[root@pc72 sched_priv]# service pbs_server status
pbs_server (pid 16583) is running...
pbs_sched dead but subsys locked


[root@pc72 sched_priv]# cat sched_out
pbs_selstat failed: 15031
pbs_selstat failed: 15031
pbs_selstat failed: 15031
pbs_selstat failed: 15031
alarm call
alarm call
alarm call
pbs_selstat failed: 15031
pbs_selstat failed: 15031
pbs_selstat failed: 15031
pbs_selstat failed: 15031
pbs_selstat failed: 15031
pbs_selstat failed: 15031
Statque failed: 15031
Problem with creating server data strucutre
pbs_selstat failed: 15031
pbs_selstat failed: 15031
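
In case it helps with triage, here is a minimal sketch for tallying how often each distinct message shows up in sched_out. The function name tally_messages is made up for illustration and is not part of torque; the only assumption is a POSIX shell with awk and sort available.

```shell
# Hypothetical helper (not part of torque): count how often each distinct,
# non-blank line occurs in a log file and list the most frequent first.
tally_messages() {
    # $1: path to the log file, e.g. sched_priv/sched_out
    awk 'NF { count[$0]++ }
         END { for (msg in count) printf "%d %s\n", count[msg], msg }' "$1" |
        sort -rn
}
```

Running `tally_messages sched_out` on the excerpt above would report 12 occurrences of "pbs_selstat failed: 15031", 3 of "alarm call", and one each of the remaining two messages, so nearly everything in the file is the same pbs_selstat failure.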


