[torqueusers] stability observations

Alexander Saydakov saydakov at yahoo-inc.com
Thu Apr 6 13:57:47 MDT 2006


After a few months of running TORQUE 2.0.0p7 and 2.0.0p8 on FreeBSD 4.10, I
observed the following:

1.	pbs_sched has a memory leak. Its memory footprint grows every day, so
after a fresh start it reaches 300 MB within a few days.
2.	pbs_sched has a bug in its job-selection algorithm. Quite often it
picks seemingly random jobs from lower-priority queues even though many jobs
are waiting in higher-priority queues.
3.	pbs_server becomes unstable after some configuration changes.
Strangely, it may crash a few minutes or more after the change rather than
immediately. Not all changes are harmful: adding nodes and queues, or
adjusting their parameters, is fine. After deleting nodes, for instance, it
died within a few hours, and that is with the patch applied; without the
patch it died immediately. If you don't touch it, it runs forever.
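To make the expectation behind item 2 concrete, here is a minimal Python model of strict priority-ordered selection: always take the oldest job from the highest-priority non-empty queue. This is not pbs_sched's actual code, and the real scheduler's policy may depend on its configuration. The `Queue` class, the `next_job` function, and the queue names and priority values are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    priority: int  # hypothetical: a higher value means higher priority
    jobs: list = field(default_factory=list)  # FIFO order, oldest first


def next_job(queues):
    """Return (queue_name, job) from the highest-priority non-empty queue.

    Within a queue, the oldest job (index 0) is chosen. Returns None when
    every queue is empty. Lower-priority queues are only reached when all
    higher-priority queues are empty.
    """
    for q in sorted(queues, key=lambda q: q.priority, reverse=True):
        if q.jobs:
            return q.name, q.jobs[0]
    return None


# Hypothetical queues: under this model, "low" is never chosen while
# "high" still has jobs waiting.
queues = [
    Queue("low", 10, ["job-l1", "job-l2"]),
    Queue("high", 100, ["job-h1", "job-h2"]),
]
print(next_job(queues))
```

Under this model, a job from `low` would only be selected once `high` is empty; the behavior described in item 2 is a departure from that ordering.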
