[torqueusers] question about prologue / epilogue

David Jackson jacksond at clusterresources.com
Fri Apr 22 18:06:45 MDT 2005


  On your system, have you noticed with the new job execution model
that sometimes the sessionid for a multi-node job is lost?  We have a
report of this and want to confirm that other sites are seeing it.


On Fri, 2005-04-22 at 09:19 -0700, Garrick Staples wrote:
> On Thu, Apr 21, 2005 at 04:13:33PM -0400, Glen Beane alleged:
> > Occasionally a node issue can result in a job bouncing between the Q 
> > and R state  (torque tries to start the job, fails, waits, and tries 
> > again).  This goes on and on until we intervene.
> Next time this happens, bump up the loglevel on the MS pbs_mom process with
> SIGUSR1 to level 7 or 8, and send us a bit of the log that shows the loop.
> > Will the prologue and epilogue get run every time?  I suspect the 
> > epilogue will only run when the job goes into the E state.
> There are many steps to setting up a job, each of which is a possible failure
> point.  If it fails at an early point, then prologue isn't run.  Epilogue is
> always run if prologue was run.
