[torqueusers] sudden pbs_server & pbs_mom segfaults
dzila at tassadar.physics.auth.gr
Thu May 28 08:17:26 MDT 2009
>> It looks like, after recreating serverdb, the job counter has been
>> reset and the already running jobs are invisible to qstat. Can I do
>> something so they show up? I have a backup of the old serverdb.
> Hi Dimitris
> That is expected.
> The old database is gone, and with it the job counter.
Well I do not mind much about the job counter, though I see that
/var/spool/pbs/tmpdir has some leftovers that could cause name
collisions in the future. I am in the process of cleaning them up.
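Before deleting anything, a dry-run listing of old entries could look like the
sketch below. It is only an illustration: the function name stale_entries and
the age threshold are hypothetical, and it assumes that modification time is a
reasonable hint for "leftover". A directory that belongs to a still-running
job can also have an old modification time, so the result is a list of
candidates to check by hand, not a list of things that are safe to remove.

```python
import os
import time


def stale_entries(tmpdir, max_age_days, now=None):
    """Return sorted names of entries in tmpdir not modified within max_age_days.

    Hypothetical helper: it only reports candidates and never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # threshold converted to seconds
    stale = []
    for entry in os.scandir(tmpdir):
        # Look at the entry itself rather than at a symlink target.
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.name)
    return sorted(stale)
```

Calling stale_entries("/var/spool/pbs/tmpdir", max_age_days=30) would return
the names untouched for more than 30 days; the 30 is an arbitrary value to
adjust to the longest job runtime on the cluster.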
> Likewise for any pending and running jobs.
> You may need to kill the leftover processes on the nodes by hand.
I cannot do that because my users will complain; they already complain
about being unable to see the status of their jobs. I have marked the
nodes with jobs as off-line and I will let the users know when the running
jobs finish. When they do finish, though, I am not sure whether their output
is going to be moved from /var/spool/pbs/tmpdir to the user's home directory,
now that serverdb has been recreated... Can anyone guess?
> I recreated the database a few times here, after the whole Torque+Maui
> was working, just to start the job counter fresh.
> Not sure if the old mom, server, and scheduler and logs
> are preserved, though they may be.
GridAUTH Operations Centre @ Aristotle University of Thessaloniki , Greece
Tel: +302310998988 Fax: +302310994309