[torqueusers] Server not talking to MOMs at all

Garrick Staples garrick at usc.edu
Fri Sep 9 13:49:47 MDT 2005


On Thu, Sep 08, 2005 at 10:52:00AM +0200, Lennart Karlsson alleged:
> The state of Torque today, in our environments, is that it behaves
> badly or crashes (e.g. the current favourite: it eats all available physical
> and swap memory) in one way or another very frequently (I estimate, without
> measuring, by a factor of more than 100) compared to the

That really should not be happening.  My server host has 38 days of uptime
and pbs_server is using 15MB of RAM.  That's with 1700 nodes and 1200+
jobs every day.

Set 'PBSDEBUG=5' in your env and run pbs_server under gdb or valgrind.
Please send any valgrind errors and/or a gdb backtrace of a crash.
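
A rough sketch of what that could look like from a shell. It assumes
pbs_server is on your PATH and that the normal daemon has been stopped first.
The PBS_SERVER_BIN variable, the function names, and the log file path are
made up for illustration, so adjust them for your site.

```shell
# Debug level requested above; pbs_server reads it from the environment.
export PBSDEBUG=5

# Hypothetical override in case pbs_server is not on PATH.
PBS_SERVER_BIN="${PBS_SERVER_BIN:-pbs_server}"

# Run the server under valgrind and write the report to the given file.
run_under_valgrind() {
    valgrind --leak-check=full --log-file="$1" "$PBS_SERVER_BIN"
}

# Start gdb on the server; type "run" at the (gdb) prompt, and after a
# crash type "bt" to print the backtrace.
run_under_gdb() {
    gdb --args "$PBS_SERVER_BIN"
}
```

For example, "run_under_valgrind /tmp/pbs_server.valgrind.log" for the memory
problem, or "run_under_gdb" to catch a crash. Nothing runs until one of the
functions is called.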


> I believe that more "bells and whistles" on Torque might make it less
> available. We need to change Torque, but we have to make the right
> choices and we have to be careful not to put "everything" into it.

My primary goals are stability, flexibility, and reasonable
administration.  The "everything" features are for Moab :)

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California