[torqueusers] Scheduler keeps restarting!

Leandro Tavares Carneiro leandro at ep.petrobras.com.br
Mon Oct 25 10:49:38 MDT 2004


Hi,

	I have about 500 dual Xeon nodes. I was running OpenPBS, but I ran
into several problems with large node counts, using about 400 CPUs, so I
switched to TORQUE, currently 1.1.0p3. Everything is going well now,
except that the scheduler restarts a lot when jobs enter the queue. When
this happens, I sometimes have to run 'qrun' manually to get the jobs to
run.

	When I migrated from OpenPBS I did not touch anything in my
configuration, and it is working, but maybe I have missed something.
Note that I am using the FIFO scheduler. An extract of the scheduler log
follows:

10/25/2004 11:49:55;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 24693
10/25/2004 11:53:13;0002; pbs_sched;Svr;toolong;alarm call
10/25/2004 11:53:13;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 11:53:13;0002; pbs_sched;Svr;toolong;restart dir / object /usr/local/sbin/pbs_sched
10/25/2004 11:53:13;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 11:53:13;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 25198
10/25/2004 11:53:16;0040; pbs_sched;Job;1366.pbsserver;Job Run
10/25/2004 11:53:16;0080; pbs_sched;Svr;main;brk point 135532544
10/25/2004 11:54:20;0080; pbs_sched;Svr;main;brk point 135589888
10/25/2004 12:26:03;0002; pbs_sched;Svr;die;caught signal 15
10/25/2004 12:26:03;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 12:26:03;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 12:26:03;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 28812
10/25/2004 12:26:18;0080; pbs_sched;Svr;main;brk point 135561216
10/25/2004 13:11:28;0040; pbs_sched;Job;1367.pbsserver;Job Run
10/25/2004 13:12:00;0080; pbs_sched;Svr;main;brk point 135569408
10/25/2004 13:27:51;0002; pbs_sched;Svr;toolong;alarm call
10/25/2004 13:27:51;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 13:27:51;0002; pbs_sched;Svr;toolong;restart dir / object /usr/local/sbin/pbs_sched
10/25/2004 13:27:51;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 13:27:51;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 2350
10/25/2004 13:31:21;0002; pbs_sched;Svr;toolong;alarm call
10/25/2004 13:31:21;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 13:31:21;0002; pbs_sched;Svr;toolong;restart dir / object /usr/local/sbin/pbs_sched
10/25/2004 13:31:21;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 13:31:21;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 2845
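
To make the restart pattern in a log like this easier to see, here is a
minimal Python sketch that tallies scheduler startups, "alarm call"
restarts, and "caught signal" exits. It is only an illustration: the
function names (join_wrapped, summarize) are hypothetical, and the
semicolon-separated field layout is assumed from the extract above, not
taken from any TORQUE documentation. It also tolerates entries that a
mail client has wrapped onto a second line.

```python
import re
from collections import Counter

# Assumption: every log entry starts with "MM/DD/YYYY HH:MM:SS;".
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2};")


def join_wrapped(lines):
    """Rejoin entries whose tail was wrapped onto the following line."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            # No timestamp: treat as a continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def summarize(lines):
    """Count startups, alarm-triggered restarts, and signal exits."""
    counts = Counter()
    for entry in join_wrapped(lines):
        fields = [f.strip() for f in entry.split(";")]
        if len(fields) < 6:
            continue  # not in the assumed six-field layout
        routine, message = fields[4], ";".join(fields[5:])
        if routine == "toolong" and message == "alarm call":
            counts["alarm_restarts"] += 1
        elif routine == "die" and message.startswith("caught signal"):
            counts["signal_exits"] += 1
        elif routine == "main" and "startup pid" in message:
            counts["startups"] += 1
    return counts


# A few entries copied from the extract, including wrapped ones.
SAMPLE = """\
10/25/2004 11:49:55;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched
startup pid 24693
10/25/2004 11:53:13;0002; pbs_sched;Svr;toolong;alarm call
10/25/2004 11:53:13;0002; pbs_sched;Svr;toolong;restart dir / object
/usr/local/sbin/pbs_sched
10/25/2004 12:26:03;0002; pbs_sched;Svr;die;caught signal 15
""".splitlines()

if __name__ == "__main__":
    print(dict(summarize(SAMPLE)))
```

Running summarize() over the lines of the full scheduler log would give
the same three counters for the whole day, which should show whether the
restarts come mostly from the alarm or from an external signal.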


-- 

Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 2534-1427

