[torqueusers] sudden pbs_server & pbs_mom segfaults
dzila at tassadar.physics.auth.gr
Thu May 28 03:55:37 MDT 2009
Ken Nielson wrote:
> I think it looks like your streams tree has been corrupted. To fix the problem we need to find out why. If fixing the serverdb file in server_priv does not correct the problem then the next step might be to get even more information by setting the log level to 7 on the server and the mom to see if it tells us more. The tdelete function reports information at log level 6.
> Ken Nielson
> Cluster Resources
What I just did was:
a) qmgr -c 'print server' > qmgr.txt
b) pbs_server -t create
c) stop pbs_server, and relaunch it under gdb.
Is that sufficient? Do you think I should grep the logs for something?
I am pretty sure that on some nodes I set PBSLOGLEVEL to 7 before
launching it under gdb.
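For the log search, a minimal sketch might look like the following. It
assumes the messages of interest contain the literal string "tdelete"
(the function Ken mentioned) and that the logs sit under the default
TORQUE home of /var/spool/torque; both are assumptions, and scan_logs is
only a hypothetical helper name, not a TORQUE tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE): print every log line that
# mentions a pattern, prefixed with its file name and line number.
# Usage: scan_logs LOG_DIR [PATTERN]
scan_logs() {
    log_dir=$1
    pattern=${2:-tdelete}   # default: the function named in Ken's reply
    # -r: recurse into the directory, -n: show line numbers,
    # -F: treat the pattern as a fixed string rather than a regex
    grep -rnF -- "$pattern" "$log_dir"
}

# Example calls; the paths assume the default TORQUE home and should be
# adjusted for other installs:
#   scan_logs /var/spool/torque/server_logs
#   scan_logs /var/spool/torque/mom_logs
```

Since tdelete only reports at log level 6, this will only find anything
in logs written after the level was raised.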
Ah, for the record: while I was working on this, the pbs_mom on all
non-idle nodes of one of the clusters crashed!
GridAUTH Operations Centre @ Aristotle University of Thessaloniki, Greece
Tel: +302310998988 Fax: +302310994309