[torquedev] TORQUE 2.2.0 Defaults
garrick at usc.edu
Thu Aug 16 17:58:41 MDT 2007
On Fri, Aug 17, 2007 at 12:52:19AM +0100, Craig Macdonald alleged:
> Garrick et al,
> >How does pbs_mom know the process is gone? It can't check the pids because
> >they might be reused by new processes after the boot.
> "Current system time" - "job walltime" vs "start-time of PID"
> All(?) Unix machines seem to provide the start time of a process, so
> if the process behind a job's PID is unexpectedly young, then the node must
> have rebooted and the original job is dead.
> My only questions are:
> (a) is this figure reset when exec() is called?
> (b) what happens if the system clock changes unexpectedly over the lifetime
> of the job, followed by the pbs_mom being restarted with recovery enabled?
> Not sure if a daylight saving time change would be a problem in this scenario.
That's possible, but it would have to be implemented individually for each mom
arch (look at the different directories in src/resmom).
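
For illustration only, here is a rough Python sketch of the comparison Craig
describes, not anything that exists in pbs_mom. The function names, the
60-second slack, and the reading of "job walltime" as the walltime the job has
used so far are all assumptions. The start-time lookup is Linux-only (it reads
/proc), which is exactly the per-arch part.

```python
import os


def process_start_epoch(pid):
    """Start time of `pid` in seconds since the epoch (Linux /proc only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so count the remaining fields from after the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    start_ticks = int(fields[19])  # field 22: start time in clock ticks since boot
    with open("/proc/stat") as f:
        boot_epoch = next(
            int(line.split()[1]) for line in f if line.startswith("btime")
        )
    return boot_epoch + start_ticks / os.sysconf("SC_CLK_TCK")


def pid_looks_reused(now, job_walltime, proc_start, slack=60.0):
    """True if the process is too young to be the job's original process.

    now          -- current system time, seconds since the epoch
    job_walltime -- walltime the job has used so far, in seconds (assumed meaning)
    proc_start   -- start time of the PID recorded for the job
    slack        -- hypothetical tolerance for startup delay and small clock steps
    """
    expected_start = now - job_walltime
    return proc_start > expected_start + slack
```

On recovery, a mom could call something like
pid_looks_reused(time.time(), used_walltime, process_start_epoch(pid)) and
treat the job as dead when it returns True. The slack only absorbs small clock
adjustments; a large unexpected clock change, as in question (b), would still
confuse the check.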