[torquedev] pbsnodes -a shows long gone jobs
Bogdan.Costescu at iwr.uni-heidelberg.de
Wed Jun 17 07:49:53 MDT 2009
On Wed, 17 Jun 2009, Glen Beane wrote:
> You will notice the state is free, but this lists 5 jobs in the status
> string. These jobs are long gone from the system, in the case of job
> number 37005 the job has been completed for OVER 2 MONTHS!
I have often seen such jobs kept in the list without actually
running, in cases where the job died for some reason, e.g. when nodes
crash; on a ~7-year-old cluster reliability is quite poor, so such
events happen often. I haven't seen any bad effects from this, except
maybe some messages from nodes to the master asking for the job to be
killed (which the server ignores, as the jobs are no longer in its
database). This is with Torque 2.1.10.
The nodes are rebuilt upon reboot, so whatever state Torque keeps on
the local disk is lost; I therefore can't say whether a reboot does
anything to these phantom jobs...
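
For anyone who wants to list such phantom jobs, here is a rough sketch
in Python. It is not something Torque provides. It assumes that each
node block in the "pbsnodes -a" output has a line of the form
"status = key=value,key=value,..." and that the "jobs" key holds
space-separated job ids; that layout may differ between Torque versions,
so check it against your own output first. The function names and the
"master" server name are made up for illustration.

```python
def parse_node_jobs(pbsnodes_text):
    """Map node name -> set of job ids found in the node's status string.

    Assumed layout: node name at column 0, attributes indented below it
    as "name = value", and the status value as comma-separated key=value
    pairs where "jobs" is a space-separated list of job ids.
    """
    nodes = {}
    current = None
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            continue  # blank line between node blocks
        if not line[0].isspace():
            # Unindented line: start of a new node block
            current = line.strip()
            nodes[current] = set()
        elif current is not None:
            key, sep, value = line.strip().partition(" = ")
            if sep and key == "status":
                for field in value.split(","):
                    name, eq, val = field.partition("=")
                    if eq and name == "jobs":
                        nodes[current].update(val.split())
    return nodes


def phantom_jobs(pbsnodes_text, known_job_ids):
    """Return {node: [job ids the node reports but the server does not know]}."""
    known = set(known_job_ids)
    result = {}
    for node, jobs in parse_node_jobs(pbsnodes_text).items():
        stale = jobs - known
        if stale:
            result[node] = sorted(stale)
    return result
```

The text would come from "pbsnodes -a" and the known job ids from the
server's own job list (for instance, collected from qstat); anything a
node still reports that the server no longer knows about is a candidate
phantom job.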
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de