[torqueusers] sudden pbs_server & pbs_mom segfaults

Dimitris Zilaskos dzila at tassadar.physics.auth.gr
Thu May 28 03:55:37 MDT 2009


Ken Nielson wrote:
> Dimitris,
> 
> I think it looks like your streams tree has been corrupted. To fix the problem we need to find out why. If fixing the serverdb file in server_priv does not correct the problem then the next step might be to get even more information by setting the log level to 7 on the server and the mom to see if it tells us more. The tdelete function reports information at log level 6.
> 
> Thanks
> 
> Ken Nielson
> Cluster Resources
>

What I just did was:

a) qmgr -c 'print server' > qmgr.txt
b) stop pbs_server
c) remove serverdb
d) pbs_server -t create
e) qmgr < qmgr.txt
f) stop pbs_server, and relaunch it under gdb.
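
For anyone repeating this, the steps above can be sketched roughly as a script. This is only a sketch, not what was actually typed: using qterm to stop the server, the /var/spool/torque default for TORQUE_HOME, and moving serverdb aside instead of deleting it are all assumptions. It is a dry run by default and only prints the commands.

```shell
#!/bin/sh
# Sketch of the serverdb rebuild. Dry run by default: commands are
# printed, not executed, unless DRY_RUN=0 is set in the environment.
# ASSUMPTIONS: TORQUE_HOME default path, qterm as the stop command,
# and keeping serverdb as serverdb.bak rather than removing it.
TORQUE_HOME="${TORQUE_HOME:-/var/spool/torque}"
DRY_RUN="${DRY_RUN:-1}"
BACKUP="${BACKUP:-qmgr.txt}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        sh -c "$*"
    fi
}

rebuild_serverdb() {
    run "qmgr -c 'print server' > $BACKUP"      # a) save the configuration
    run "qterm -t quick"                        # b) stop pbs_server
    run "mv $TORQUE_HOME/server_priv/serverdb $TORQUE_HOME/server_priv/serverdb.bak"  # c)
    run "pbs_server -t create"                  # d) create a fresh serverdb
    run "qmgr < $BACKUP"                        # e) replay the configuration
    run "qterm -t quick"                        # f) stop it again ...
    run "gdb pbs_server"                        #    ... and relaunch under gdb
}

rebuild_serverdb
```

Running it with DRY_RUN=0 executes the commands for real, so the printed list should be checked first.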

Is that sufficient? Should I grep the logs for anything in particular? I am
pretty sure that on some nodes I set PBSLOGLEVEL to 7 before launching
it under gdb.
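
Since tdelete was mentioned as reporting at log level 6, one thing to try is searching the server and mom logs for that name. A minimal sketch, assuming the logs live in server_logs and mom_logs under /var/spool/torque and that the messages contain the string "tdelete" (I have not confirmed the exact message text):

```shell
#!/bin/sh
# Search TORQUE server and mom logs for lines mentioning tdelete.
# ASSUMPTIONS: default TORQUE_HOME path, and that the log messages
# contain the literal function name "tdelete".
# The server log level can be raised first with:
#   qmgr -c 'set server log_level = 7'
TORQUE_HOME="${TORQUE_HOME:-/var/spool/torque}"

find_tdelete() {
    # $1: TORQUE home directory to search under
    for d in "$1/server_logs" "$1/mom_logs"; do
        if [ -d "$d" ]; then
            # grep exits non-zero when nothing matches; that is not an error here
            grep -rn "tdelete" "$d" || true
        fi
    done
}

find_tdelete "$TORQUE_HOME"
```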

Ah, for the record: while I was working on this, the pbs_mom on all non-idle
nodes of one of the clusters crashed!

Cheers,

-- 
=============================================================================
Dimitris Zilaskos
GridAUTH Operations Centre @ Aristotle University of Thessaloniki , Greece
Tel: +302310998988 Fax: +302310994309
http://www.grid.auth.gr
=============================================================================

