[torqueusers] stability observations

Garrick Staples garrick at usc.edu
Thu Apr 6 16:02:33 MDT 2006

On Thu, Apr 06, 2006 at 12:57:47PM -0700, Alexander Saydakov alleged:
> After a few months of running 2.0.0p7 and 2.0.0p8 on FreeBSD 4.10 I observed
> the following:
> 1.	pbs_sched has a memory leak. Its memory footprint keeps growing every
> day; after a fresh start it reaches 300 MB within a few days.

Can you capture this in valgrind? (or whatever FreeBSD has)

> 2.	pbs_sched has a bug in its scheduling algorithm. Quite often it picks
> random jobs from lower-priority queues even though many jobs are waiting
> in higher-priority queues.

I don't know how much support you are going to get for this. No one is
maintaining pbs_sched.

> 3.	pbs_server is unstable after some configuration changes.
> Strangely, it can crash a few minutes after a change. Not all
> changes are bad: adding nodes and queues, or adjusting their parameters, is
> fine. After deleting nodes, for instance (with the patch! Without the patch
> it died immediately), it died within a few hours. If you don't touch it, it
> runs forever.

Can you capture this in gdb?

Garrick Staples, Linux/HPCC Administrator
University of Southern California
