[torqueusers] Scheduler keeps restarting!
Leandro Tavares Carneiro
leandro at ep.petrobras.com.br
Mon Oct 25 10:49:38 MDT 2004
Hi,
I have about 500 dual Xeon nodes and was running OpenPBS. I ran into
several problems at large node counts, using about 400 CPUs, so I
switched to torque (now 1.1.0p3). Everything is going well now, except
that the scheduler restarts a lot when jobs enter the queue. When this
happens, I sometimes have to run 'qrun' manually to get the jobs to run.
When I migrated from OpenPBS I did not change anything in my
configuration, and it is working, but maybe I have missed something.
Note that I'm using the FIFO scheduler. Here is an extract of the
scheduler log:
10/25/2004 11:49:55;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 24693
10/25/2004 11:53:13;0002; pbs_sched;Svr;toolong;alarm call
10/25/2004 11:53:13;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 11:53:13;0002; pbs_sched;Svr;toolong;restart dir / object /usr/local/sbin/pbs_sched
10/25/2004 11:53:13;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 11:53:13;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 25198
10/25/2004 11:53:16;0040; pbs_sched;Job;1366.pbsserver;Job Run
10/25/2004 11:53:16;0080; pbs_sched;Svr;main;brk point 135532544
10/25/2004 11:54:20;0080; pbs_sched;Svr;main;brk point 135589888
10/25/2004 12:26:03;0002; pbs_sched;Svr;die;caught signal 15
10/25/2004 12:26:03;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 12:26:03;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 12:26:03;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 28812
10/25/2004 12:26:18;0080; pbs_sched;Svr;main;brk point 135561216
10/25/2004 13:11:28;0040; pbs_sched;Job;1367.pbsserver;Job Run
10/25/2004 13:12:00;0080; pbs_sched;Svr;main;brk point 135569408
10/25/2004 13:27:51;0002; pbs_sched;Svr;toolong;alarm call
10/25/2004 13:27:51;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 13:27:51;0002; pbs_sched;Svr;toolong;restart dir / object /usr/local/sbin/pbs_sched
10/25/2004 13:27:51;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 13:27:51;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 2350
10/25/2004 13:31:21;0002; pbs_sched;Svr;toolong;alarm call
10/25/2004 13:31:21;0002; pbs_sched;Svr;Log;Log closed
10/25/2004 13:31:21;0002; pbs_sched;Svr;toolong;restart dir / object /usr/local/sbin/pbs_sched
10/25/2004 13:31:21;0002; pbs_sched;Svr;Log;Log opened
10/25/2004 13:31:21;0002; pbs_sched;Svr;main;/usr/local/sbin/pbs_sched startup pid 2845
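
To make an extract like this easier to read, the restarts can be tallied
with a small script. The Python sketch below is only an illustration and is
not part of torque: the function names parse_records and summarize are
made up for this example, and the field layout
(timestamp;code;daemon;class;name;message) is assumed from the lines shown
above. Lines that do not start with a timestamp are treated as wrapped
continuations of the previous record.

```python
import re
from collections import Counter
from datetime import datetime

# A record starts with "MM/DD/YYYY HH:MM:SS;" (layout assumed from the extract).
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2};")


def parse_records(text):
    """Return (timestamp, name, message) tuples, joining wrapped lines."""
    merged = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line):
            merged.append(line)
        elif merged:
            # Continuation of a record that was wrapped by the mailer.
            merged[-1] += " " + line
    records = []
    for rec in merged:
        parts = [p.strip() for p in rec.split(";", 5)]
        if len(parts) != 6:
            continue  # not in the assumed six-field layout
        when = datetime.strptime(parts[0], "%m/%d/%Y %H:%M:%S")
        records.append((when, parts[4], parts[5]))
    return records


def summarize(records):
    """Count startups, alarm calls, signals and job runs.

    Also returns the seconds between each alarm call and the most
    recent scheduler startup seen before it.
    """
    counts = Counter()
    alarm_delays = []
    last_start = None
    for when, name, message in records:
        if name == "main" and "startup pid" in message:
            counts["startup"] += 1
            last_start = when
        elif name == "toolong" and message == "alarm call":
            counts["alarm"] += 1
            if last_start is not None:
                alarm_delays.append(int((when - last_start).total_seconds()))
        elif name == "die" and message.startswith("caught signal"):
            counts["signal"] += 1
        elif message == "Job Run":
            counts["job_run"] += 1
    return counts, alarm_delays
```

Run over the extract above, summarize(parse_records(text)) counts 5
startups, 3 alarm calls, 1 caught signal and 2 job runs, with alarm calls
arriving 198, 3708 and 210 seconds after the preceding startup.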
--
Leandro Tavares Carneiro
Petrobras TI/TI-E&P/STEP Suporte Tecnico de E&P
Av Chile, 65 sala 1501 EDISE - Rio de Janeiro / RJ
Tel: (0xx21) 2534-1427