[torqueusers] Scheduler efficiency
franc.carter at gmail.com
Mon Jun 12 19:44:12 MDT 2006
On 6/13/06, garrick at speculation.org <garrick at speculation.org> wrote:
> On Fri, Jun 09, 2006 at 12:22:34PM +1000, Franc Carter alleged:
> > Hi,
> > We are using torque-1.2 with a site-specific TCL scheduling algorithm.
> > number
> > of jobs in the queue has grown significantly since we implemented
> > thousand)
> > and the scheduler takes a long time to make a decision and uses lots of
> > time.
> > Part of the problem appears to be that on every cycle the scheduler
> > needs to completely reread the entire state instead of being able to
> > find out the change that caused the scheduler to be invoked - i.e. job
> > 1234 exited.
> > I had a look through the source code and it looks like this information
> > is not available in the protocol - but my C is rather rusty.
> > Can someone confirm that this information is not available to the
> > scheduler, and is it available in the 2.0 version? More importantly, is
> > anyone running a scheduler that works 'efficiently' in the 1000's of
> > jobs range?
> Unfortunately, that is just how it works. Each scheduling iteration
> must call pbs_statjob() and "download" all job info.
> I've been thinking that it would be nice to have a second version of the
> pbs_stat*() functions that save their own state inside of pbs_server and
> only return changes (as long as the connection is maintained.)
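To make the "only return changes" idea concrete, here is a minimal sketch of the bookkeeping involved, done on the scheduler side. It assumes each cycle's full job listing has already been reduced to a mapping of job id to state letter; the function name diff_snapshots, the event labels, and the sample job ids are all hypothetical and not part of TORQUE. Note that this does not avoid the full download described above, it only shows what a change list would look like. The server-side version suggested in the quoted message would need new support in pbs_server.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot type: job id -> job state letter (e.g. "Q" queued, "R" running).
Snapshot = Dict[str, str]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Tuple[str, str]]:
    """Return (event, job_id) pairs describing what changed between two polls."""
    events: List[Tuple[str, str]] = []
    # Jobs present last cycle but missing now are treated as having left the queue.
    for job_id in sorted(previous.keys() - current.keys()):
        events.append(("exited", job_id))
    # Jobs seen for the first time this cycle.
    for job_id in sorted(current.keys() - previous.keys()):
        events.append(("submitted", job_id))
    # Jobs present in both cycles whose state letter differs.
    for job_id in sorted(previous.keys() & current.keys()):
        if previous[job_id] != current[job_id]:
            events.append(("state_changed", job_id))
    return events


# Illustrative data only: job 1234 leaves, 1236 arrives, 1235 starts running.
before = {"1233": "R", "1234": "R", "1235": "Q"}
after = {"1233": "R", "1235": "R", "1236": "Q"}
print(diff_snapshots(before, after))
```

With a change list like this, a scheduler could limit its per-cycle work to the jobs that actually changed, although the cost of fetching the full listing each cycle remains.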