[torqueusers] sudden pbs_server & pbs_mom segfaults

Dimitris Zilaskos dzila at tassadar.physics.auth.gr
Thu May 28 03:55:37 MDT 2009

Ken Nielson wrote:
> Dimitris,
> I think it looks like your streams tree has been corrupted. To fix the problem we need to find out why. If fixing the serverdb file in server_priv does not correct the problem then the next step might be to get even more information by setting the log level to 7 on the server and the mom to see if it tells us more. The tdelete function reports information at log level 6.
> Thanks
> Ken Nielson
> Cluster Resources

What I just did was:

a) qmgr -c 'print server' > qmgr.txt
b) stop pbs_server
c) remove serverdb
d) pbs_server -t create
e) qmgr < qmgr.txt
f) stop pbs_server, and relaunch it under gdb.
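
For anyone who wants to repeat this, the steps above can be sketched roughly as the script below. It is only a sketch under some assumptions that are not stated in the steps: PBS_HOME defaulting to /var/spool/torque, "qterm -t quick" as the way to stop pbs_server, and moving serverdb aside to a .bak file rather than deleting it. The step and rebuild_serverdb helpers and the DRY_RUN switch are hypothetical names made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the serverdb rebuild steps (a) to (f). Paths and the stop method
# are assumptions; adjust them for the local installation.
PBS_HOME="${PBS_HOME:-/var/spool/torque}"   # assumed TORQUE home directory
DUMP="${DUMP:-qmgr.txt}"                    # file holding the saved server config
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the commands

step() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$1" || exit 1
    fi
}

rebuild_serverdb() {
    step "qmgr -c 'print server' > $DUMP"     # (a) save the current configuration
    step "qterm -t quick"                     # (b) stop pbs_server (assumed method)
    # (c) move serverdb aside instead of removing it, so it can be restored
    step "mv $PBS_HOME/server_priv/serverdb $PBS_HOME/server_priv/serverdb.bak"
    step "pbs_server -t create"               # (d) start with a fresh serverdb
    step "qmgr < $DUMP"                       # (e) replay the saved configuration
    step "qterm -t quick"                     # (f) stop again before the gdb run
}

rebuild_serverdb
```

Running it with DRY_RUN=0 executes the commands instead of printing them. The relaunch under gdb is left as a manual step, since how the daemon is started there depends on the installed version.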

Is that sufficient? Do you think I should grep the logs for something? I am 
pretty sure that on some nodes I was setting PBSLOGLEVEL to 7 before launching 
it under gdb.

Ah, for the record: as I was working on this, the pbs_mom on all non-idle 
nodes of one of the clusters crashed!


Dimitris Zilaskos
GridAUTH Operations Centre @ Aristotle University of Thessaloniki , Greece
Tel: +302310998988 Fax: +302310994309
