Hi,

We are using torque-1.2 with a site-specific Tcl scheduling algorithm. The number of jobs in the queue has grown significantly since we implemented it (to several thousand), and the scheduler now takes a long time to make a decision and uses a lot of CPU time.

Part of the problem appears to be that on every cycle the scheduler needs to reread the entire state, instead of being able to find out just the change that caused it to be invoked (for example, that job 1234 exited).

I had a look through the source code, and it looks like this information is not available in the protocol, but my C is rather rusty.

Can someone confirm that this information is not available to the scheduler, and is it available in the 2.0 version? More importantly, is anyone running a scheduler that works 'efficiently' with thousands of jobs in the queue?

Thanks,

-- 
Franc