[torqueusers] Scheduler efficiency

Franc Carter franc.carter at gmail.com
Mon Jun 12 19:44:12 MDT 2006


Oh well,

thanks

On 6/13/06, garrick at speculation.org <garrick at speculation.org> wrote:
>
> On Fri, Jun 09, 2006 at 12:22:34PM +1000, Franc Carter alleged:
> > Hi,
> >
> > We are using torque-1.2 with a site-specific TCL scheduling algorithm.
> > The number of jobs in the queue has grown significantly since we
> > implemented it (to several thousand), and the scheduler takes a long
> > time to make a decision and uses lots of CPU time.
> >
> > Part of the problem appears to be that on every cycle the scheduler
> > needs to completely reread the entire state instead of being able to
> > find out just the change that caused the scheduler to be invoked -
> > i.e. job 1234 exited.
> >
> > I had a look through the source code and it looks like this information
> > is not available in the protocol - but my C is rather rusty.
> >
> > Can someone confirm that this information is not available to the
> > scheduler, and is it available in the 2.0 version? More importantly, is
> > anyone running a scheduler that works 'efficiently' in the thousands of
> > jobs range?
>
> Unfortunately, that is just how it works.  Each scheduling iteration
> must call pbs_statjob() and "download" all job info.
>
> I've been thinking that it would be nice to have a second version of the
> pbs_stat*() functions that save their own state inside of pbs_server and
> only return changes (as long as the connection is maintained.)
>
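
To make the "only return changes" idea concrete, here is a minimal sketch of
the delta such a call might report. It is not TORQUE code: diff_snapshots()
and the dictionary layout are hypothetical names made up for illustration, and
the comparison runs on the scheduler side against two full snapshots, so it
does not remove the cost of the pbs_statjob() download itself. It only shows
what a server-side version could send back instead of the whole job list.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot layout: job id -> {attribute name -> value}.
JobState = Dict[str, Dict[str, str]]


def diff_snapshots(old: JobState, new: JobState) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) job ids between two full snapshots."""
    old_ids, new_ids = set(old), set(new)
    added = new_ids - old_ids            # jobs submitted since the last cycle
    removed = old_ids - new_ids          # jobs that exited or were deleted
    changed = {jid for jid in old_ids & new_ids if old[jid] != new[jid]}
    return added, removed, changed


# Illustrative data only: job 1234 exits and job 1235 is queued.
previous = {
    "1233": {"job_state": "R"},
    "1234": {"job_state": "R"},
}
current = {
    "1233": {"job_state": "R"},
    "1235": {"job_state": "Q"},
}

added, removed, changed = diff_snapshots(previous, current)
print(sorted(added), sorted(removed), sorted(changed))
```

With a delta like this, a scheduler could update its own cached view of the
queue rather than rebuilding it from scratch on every cycle.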



-- 
Franc