[torqueusers] Scheduler efficiency

Franc Carter franc.carter at gmail.com
Thu Jun 8 20:22:34 MDT 2006


Hi,

We are using torque-1.2 with a site-specific TCL scheduling algorithm.
The number of jobs in the queue has grown significantly since we
implemented it (to several thousand), and the scheduler now takes a long
time to make a decision and uses a lot of CPU time.

Part of the problem appears to be that on every cycle the scheduler needs
to completely reread the entire state, instead of being able to find out
just the change that caused the scheduler to be invoked, e.g. job 1234
exited.
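
As a rough illustration of the difference between the two approaches, here
is a sketch in Python (not TCL, and not the Torque scheduler API). Every
name in it is hypothetical; it only contrasts rebuilding the whole view on
each cycle with applying the single event that triggered the cycle.

```python
# Hypothetical sketch: none of these names come from Torque or its
# scheduler interface. Job states are plain strings such as "Q" or "R".

def full_rescan(fetch_all_jobs):
    """Rebuild the scheduler's view from scratch.

    The cost of this grows with the number of jobs in the queue.
    """
    return {job_id: state for job_id, state in fetch_all_jobs()}


def apply_event(view, event):
    """Update a cached view from one (kind, job_id, state) event.

    The cost of this does not depend on the number of queued jobs.
    """
    kind, job_id, state = event
    if kind == "exit":
        view.pop(job_id, None)   # the job has left the system
    else:                        # assumed kinds: "queue" or "modify"
        view[job_id] = state
    return view


# A three-job queue in which job 1234 exits.
server_jobs = {"1233": "Q", "1234": "R", "1235": "Q"}
view = full_rescan(lambda: server_jobs.items())
apply_event(view, ("exit", "1234", None))
```

The second function is only possible if the server tells the scheduler
which job changed, which is the information in question here.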

I had a look through the source code and it looks like this information is
not available in the protocol - but my C is rather rusty.

Can someone confirm that this information is not available to the
scheduler, and say whether it is available in the 2.0 version? More
importantly, is anyone running a scheduler that works 'efficiently' in
the thousands-of-jobs range?

thanks

-- 
Franc