[torqueusers] Server not talking to MOMs at all
Garrick Staples
garrick at usc.edu
Fri Sep 9 13:49:47 MDT 2005
On Thu, Sep 08, 2005 at 10:52:00AM +0200, Lennart Karlsson alleged:
> The state of Torque today, in our environments, is that it behaves
> badly or crashes (e.g. the current favourite: eats all available internal
> and swap memory) in one way or another very frequently (I estimate, without
> measuring, that it is a factor of more than 100 times) compared to the
That should really not be happening. My server host has 38 days of uptime
and pbs_server is using 15MB of RAM. That's with 1700 nodes and 1200+
jobs every day.
Set 'PBSDEBUG=5' in your environment and run pbs_server under gdb or
valgrind. Please send any valgrind errors and/or a gdb backtrace from a crash.
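For anyone who hasn't done this before, here is a rough sketch of what that could look like. Only PBSDEBUG=5 comes from the advice above. The valgrind and gdb options, the log file path, and the two helper function names are my own assumptions for illustration, so adjust them to your install. I'm also assuming pbs_server is in your PATH and that having PBSDEBUG set keeps it in the foreground so the debugger stays attached. The functions only print the command lines, so nothing gets started until you run one yourself as root on the server host.

```shell
# Sketch only: everything except PBSDEBUG=5 is an assumption, not from the mail above.
PBSDEBUG=5
export PBSDEBUG

# Hypothetical helper: command line for a valgrind run.
# --leak-check=full reports each leaked block with a stack trace;
# --log-file writes the report to a file (the path is a placeholder).
valgrind_cmd() {
    printf '%s\n' "valgrind --leak-check=full --log-file=/tmp/pbs_server.valgrind.log pbs_server"
}

# Hypothetical helper: command line for a gdb run.
# "-ex run" starts the server immediately; "-ex bt" prints a backtrace
# once the program stops (for example on a crash).
gdb_cmd() {
    printf '%s\n' "gdb -ex run -ex bt --args pbs_server"
}

# Print the two command lines for inspection.
valgrind_cmd
gdb_cmd
```

Stop the running pbs_server first, then run one of the printed commands. For the memory growth described above, the valgrind log is probably the more useful of the two.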
> I believe that more "bells and whistles" on Torque might make it less
> available. We need to change Torque, but we have to make the right
> choices and we have to be careful not to put "everything" into it.
My primary goals are stability, flexibility, and reasonable
administration. The "everything" features are for Moab :)
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California