[torqueusers] stability observations
saydakov at yahoo-inc.com
Thu Apr 6 16:34:14 MDT 2006
1. You mean some memory leak detector? I have not tried that.
2. Too bad. Maybe I need to set up Maui. Especially considering #1.
3. Below is the latest crash, which happened a few hours after I deleted some
nodes. I think there was no activity right after I deleted them, but it crashed
as soon as new jobs started to pile up.
Core was generated by `pbs_server'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libkvm.so.2...done.
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 0x1005df9 in bad_node_warning (addr=1122282515) at node_func.c:226
226 if (pbsndlist[i]->nd_addrs == addr)
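One possible reading of the trace, offered only as a guess: line 226 dereferences pbsndlist[i], so a slot left NULL or stale after a node is deleted would fault there. Whether a check precedes that line is not visible in the output above. The sketch below shows a guarded lookup in C. The struct, its fields, and the function name are hypothetical stand-ins, not the actual TORQUE definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the server's node record.
 * The real structure in TORQUE has many more fields. */
struct node {
    unsigned long addr;  /* a single network address (assumption) */
    int deleted;         /* nonzero once the node has been removed (assumption) */
};

/* Return the index of the first live node whose address matches addr,
 * or -1 if there is none. NULL slots and deleted nodes are skipped
 * before any field is read, so a removed entry cannot be dereferenced. */
int find_node_by_addr(struct node **list, int count, unsigned long addr)
{
    int i;

    if (list == NULL)
        return -1;

    for (i = 0; i < count; i++) {
        if (list[i] == NULL || list[i]->deleted)
            continue;  /* skip holes left behind by node deletion */
        if (list[i]->addr == addr)
            return i;
    }
    return -1;
}
```

With a list holding one live node, one NULL slot and one deleted node, the lookup returns the live node's index and -1 for everything else, instead of faulting on the hole.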
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Garrick Staples
Sent: Thursday, April 06, 2006 3:03 PM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] stability observations
On Thu, Apr 06, 2006 at 12:57:47PM -0700, Alexander Saydakov alleged:
> After a few months of running 2.0.0p7 and 2.0.0p8 on FreeBSD 4.10 I
> observed the following:
> 1. pbs_sched has a memory leak. Its footprint keeps growing every day,
> so after a fresh start it reaches 300M in a few days
Can you capture this in valgrind? (or whatever freebsd has)
> 2. pbs_sched has some bug in the algorithm. Quite often it picks up
> some random jobs from lower priority queues despite a lot of jobs in
> higher priority queues.
I don't know how much support you are going to get for this. No one is
> 3. pbs_server is unstable when some configuration changes are made.
> Strangely, it can crash a few minutes after a change. Not all
> changes are bad. Adding nodes and queues, or adjusting their parameters is
> fine. After deleting nodes (with patch! With no patch it died
> for instance, it died within a few hours. If you don't touch it, it runs
Can you capture this in gdb?
Garrick Staples, Linux/HPCC Administrator
University of Southern California