[torquedev] TORQUE 2.2.0 Defaults
craigm at dcs.gla.ac.uk
Thu Aug 16 17:52:19 MDT 2007
Garrick et al,
>How does pbs_mom know the process is gone? It can't check the pids because
>they might be reused by new processes after the boot.
Compare ("current system time" - "job walltime") against "start time of PID".
All(?) Unix machines seem to provide the start time of a process, so
if the process turns out to have started later than the job did (i.e. its
age is unexpectedly low), then the node must have rebooted and the
original job is dead.
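
To make the comparison concrete, here is a rough sketch in Python. It is not how pbs_mom does (or would do) this; the function name, the parameter names and the `slack` tolerance are all made up for illustration, and it assumes every time is given in seconds since the epoch.

```python
def pid_still_belongs_to_job(now, job_elapsed, pid_start, slack=5.0):
    """Guess whether a PID is still the process the job launched.

    now         -- current system time (seconds since the epoch)
    job_elapsed -- walltime the job has used so far, in seconds
    pid_start   -- start time the OS reports for the PID (epoch seconds)
    slack       -- hypothetical tolerance for clock granularity, in seconds
    """
    # The job, and so its original process, should have started about here.
    expected_start = now - job_elapsed
    # A process that started noticeably later than that cannot be the
    # original one: the PID has been reused, e.g. after a reboot.
    return pid_start <= expected_start + slack


# The job has run for an hour and the PID is an hour old: plausible match.
print(pid_still_belongs_to_job(now=10_000.0, job_elapsed=3_600.0, pid_start=6_400.0))
# The job has run for an hour but the PID is only 100 seconds old: reused.
print(pid_still_belongs_to_job(now=10_000.0, job_elapsed=3_600.0, pid_start=9_900.0))
```

The check only makes sense for the job's top-level process; children forked later will legitimately have a later start time.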
My only questions are:
(a) is this figure reset when exec() is called?
(b) what happens if the system clock changes unexpectedly over the lifetime
of the job, and pbs_mom is then restarted with recovery enabled? Not sure if
a Daylight Saving Time change would be a problem in this scenario.