[torqueusers] question about prologue / epilogue
jacksond at clusterresources.com
Fri Apr 22 18:06:45 MDT 2005
On your system, have you noticed that, with the new job execution model,
the session ID of a multi-node job is sometimes lost? We have a report
of this and want to confirm whether other sites are seeing it.
On Fri, 2005-04-22 at 09:19 -0700, Garrick Staples wrote:
> On Thu, Apr 21, 2005 at 04:13:33PM -0400, Glen Beane alleged:
> > Occasionally a node issue can result in a job bouncing between the Q
> > and R state (torque tries to start the job, fails, waits, and tries
> > again). This goes on and on until we intervene.
> Next time this happens, bump up the loglevel on the MS pbs_mom process with
> SIGUSR1 to level 7 or 8, and send us a bit of the log that shows the loop.
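A minimal sketch of that log-level bump, assuming each SIGUSR1 raises the pbs_mom log level by one (and SIGUSR2 lowers it by one), and that you have already found the PID of the mother-superior pbs_mom yourself. The helper name `raise_mom_loglevel` is hypothetical, not part of TORQUE; the `send` parameter only exists so the signal calls can be checked without a real daemon.

```python
import os
import signal


def raise_mom_loglevel(pid, current_level, target_level, send=os.kill):
    """Hypothetical helper: step pbs_mom's log level up one SIGUSR1 at a time.

    Assumes one SIGUSR1 == +1 log level. Returns the number of signals sent.
    """
    if target_level < current_level:
        # Lowering the level would need SIGUSR2 instead (assumed -1 per signal).
        raise ValueError("target below current level; use SIGUSR2 to lower it")
    steps = target_level - current_level
    for _ in range(steps):
        send(pid, signal.SIGUSR1)
    return steps
```

For example, `raise_mom_loglevel(mom_pid, 0, 7)` would send seven SIGUSR1 signals to the process whose PID you pass as `mom_pid`.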
> > Will the prologue and epilogue get run every time? I suspect the
> > epilogue will only run when the job goes into the E state.
> There are many steps to setting up a job, each of which is a possible failure
> point. If it fails at an early point, then prologue isn't run. Epilogue is
> always run if prologue was run.
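The rule above can be modeled as a small sketch to answer the original question for a job bouncing between Q and R. The step names are hypothetical stand-ins, since the real pbs_mom setup sequence is longer. The model follows only what is stated here: a failure before the prologue means neither script runs, and once the prologue has run the epilogue always follows. It does not model a failure of the prologue script itself.

```python
from collections import Counter

# Hypothetical setup steps; the real pbs_mom sequence is longer and differs.
PRE_PROLOGUE_STEPS = ("check_nodes", "create_job_dir")
POST_PROLOGUE_STEPS = ("start_tasks",)


def scripts_for_attempt(failed_step=None):
    """Scripts run for one start attempt, per the rule quoted above.

    failed_step names the step that failed, or None if the job started
    (in which case the epilogue runs later, when the job exits).
    """
    known = PRE_PROLOGUE_STEPS + POST_PROLOGUE_STEPS
    if failed_step is not None and failed_step not in known:
        raise ValueError(f"unknown step: {failed_step}")
    if failed_step in PRE_PROLOGUE_STEPS:
        return []  # early failure: prologue never ran, so no epilogue either
    # The prologue ran, so the epilogue is always run as well.
    return ["prologue", "epilogue"]


def count_scripts(attempts):
    """Tally prologue/epilogue runs over every start attempt of one job."""
    tally = Counter()
    for failed_step in attempts:
        tally.update(scripts_for_attempt(failed_step))
    return tally
```

Under this model, a job that fails three times at `start_tasks` and then starts cleanly would see the prologue and the epilogue run four times each, while attempts that fail at `check_nodes` add nothing.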
> torqueusers mailing list
> torqueusers at supercluster.org