[torquedev] [Bug 16] New pbs_sched.c main loop

Simon Toth simont at mail.muni.cz
Tue Aug 4 09:36:26 MDT 2009


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=16

I'm currently working on a scheduler based on the FIFO scheduler. Its
final purpose is to run on an M:N architecture (many servers connected
to many schedulers).

While implementing it, I ran into a problem with the pbs_sched.c main
loop. Currently the scheduler processes commands one by one. The problem
is that the server can generate a lot of commands in a very short time
period, while the scheduler takes considerable time to run one
scheduling cycle.

The result is that although the scheduler has finished processing all
changes on the server in the first two loops, there can be 10 or even
100 more commands still waiting.

The change I implemented is that instead of running the scheduling cycle
for each command, all connections are accepted and all commands are
fetched beforehand.

After this is done, the commands are processed, either the old way, by
running the scheduling process for each distinct command (duplicates
are ignored), or by passing the commands as a set. This improves
response times significantly.
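
To illustrate the "duplicates are ignored" step, here is a minimal
sketch, not the code from the patch. The command codes and the
dedupe_commands() helper are hypothetical names made up for this
example; the real command values live in the TORQUE headers. It assumes
the commands have already been fetched from all connections into an
array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command codes, for illustration only. */
enum sched_cmd {
    CMD_NULL     = 0,
    CMD_NEW_JOB  = 1,
    CMD_JOB_TERM = 2,
    CMD_TIME     = 3,
    CMD_MAX      = 4   /* number of known codes */
};

/*
 * Collapse a batch of already-fetched commands into the distinct ones,
 * keeping the order in which each command first arrived.
 * 'out' must have room for CMD_MAX entries.
 * Returns the number of distinct commands written to 'out'.
 */
size_t dedupe_commands(const int *pending, size_t n, int *out)
{
    int seen[CMD_MAX] = {0};
    size_t count = 0;

    assert(pending != NULL || n == 0);
    assert(out != NULL);

    for (size_t i = 0; i < n; i++) {
        int cmd = pending[i];

        if (cmd < 0 || cmd >= CMD_MAX)
            continue;               /* skip codes this sketch does not know */

        if (!seen[cmd]) {
            seen[cmd] = 1;
            out[count++] = cmd;     /* first occurrence: keep it */
        }
    }
    return count;
}
```

With this in place, the main loop would run one scheduling cycle per
entry in 'out' (or hand the whole array to the scheduler as a set)
instead of one cycle per received command.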

The original implementation had a very bad habit of starving servers
(when more than one server was connected to one scheduler).

The main changes are in pbs_sched.c.

Support for the new scheduler invocation for the FIFO scheduler is
included in the patch.

-- 
Mgr. Simon Toth
CESNET z.s.p.o.
Zikova 4
160 00 Praha 6
Czech Republic
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pbs_sched-main-loop.patch
Type: text/x-patch
Size: 17890 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torquedev/attachments/20090804/bb5463cf/attachment.bin 

