[torqueusers] sudden pbs_server & pbs_mom segfaults

Gus Correa gus at ldeo.columbia.edu
Thu May 28 08:09:12 MDT 2009


Dimitris Zilaskos wrote:
> Ken Nielson wrote:
>> Dimitris,
>>
>> I think it looks like your streams tree has been corrupted. To fix the problem we need to find out why. If fixing the serverdb file in server_priv does not correct the problem then the next step might be to get even more information by setting the log level to 7 on the server and the mom to see if it tells us more. The tdelete function reports information at log level 6.
>>
>> Thanks
>>
> 
> Looks like, after recreating serverdb, the job counter has been
> reset, and already running jobs are invisible to qstat. Can I do
> something so they show up? I have a backup of the old serverdb.
> 
Hi Dimitris

That is expected.
The old database is gone, and with it the job counter.
The same goes for any pending and running jobs: the new database
has no record of them, so qstat cannot show them.
You may need to kill the leftover processes on the nodes by hand.
I have recreated the database a few times here, after the whole
Torque+Maui setup was working, just to restart the job counter.
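Since recreating the database discards the old one, it is worth keeping a copy first. A minimal Python sketch of that step follows. It assumes TORQUE_HOME is /var/spool/torque (a common default; adjust for your site), and the function names are hypothetical. It only prints the `pbs_server -t create` command instead of running it, and it should be used with pbs_server stopped.

```python
import shutil
import time
from pathlib import Path

# Assumption: a common default TORQUE_HOME; adjust for your installation.
DEFAULT_TORQUE_HOME = Path("/var/spool/torque")


def backup_serverdb(torque_home=DEFAULT_TORQUE_HOME):
    """Copy server_priv/serverdb to a timestamped file next to it.

    Returns the path of the backup copy.
    """
    serverdb = Path(torque_home) / "server_priv" / "serverdb"
    if not serverdb.is_file():
        raise FileNotFoundError(f"no serverdb at {serverdb}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = serverdb.with_name(f"serverdb.{stamp}.bak")
    shutil.copy2(serverdb, backup)  # also preserves mtime and permissions
    return backup


def recreate_command():
    """Command that starts pbs_server with a fresh, empty database.

    Run it only with pbs_server stopped; the existing jobs and the
    job counter are discarded.
    """
    return ["pbs_server", "-t", "create"]


if __name__ == "__main__":
    print("backed up to", backup_serverdb())
    print("then run:", " ".join(recreate_command()))
```

Keeping the backup next to the original makes it easy to put the old serverdb back if the fresh database turns out not to be what you want.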

I am not sure whether the old mom, server, and scheduler logs
are preserved, but they may be.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


