[torqueusers] question about prologue / epilogue

David Jackson jacksond at clusterresources.com
Fri Apr 22 18:06:45 MDT 2005


Garrick,

  On your system, have you noticed with the new job execution model
that sometimes the sessionid for a multi-node job is lost?  We have a
report of this and want to confirm that other sites are seeing it.

Dave

On Fri, 2005-04-22 at 09:19 -0700, Garrick Staples wrote:
> On Thu, Apr 21, 2005 at 04:13:33PM -0400, Glen Beane alleged:
> > Occasionally a node issue can result in a job bouncing between the Q 
> > and R state  (torque tries to start the job, fails, waits, and tries 
> > again).  This goes on and on until we intervene.
> 
> Next time this happens, bump up the loglevel on the MS pbs_mom process with
> SIGUSR1 to level 7 or 8, and send us a bit of the log that shows the loop.
> 
>  
> > Will the prologue and epilogue get run every time?  I suspect the 
> > epilogue will only run when the job goes into the E state.
> 
> There are many steps to setting up a job, each of which is a possible failure
> point.  If it fails at an early point, then prologue isn't run.  Epilogue is
> always run if prologue was run.
> 
> 
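
The loglevel suggestion quoted above (bump the MS pbs_mom to level 7 or 8 with SIGUSR1) could be scripted roughly as follows. This is only a sketch: it assumes each SIGUSR1 raises the pbs_mom loglevel by one step, and the helper names, the starting level, and the pause between signals are illustrative rather than anything from the thread.

```python
import os
import signal
import time


def signals_needed(current_level, target_level):
    """How many one-step bumps take current_level up to target_level (never negative)."""
    return max(0, target_level - current_level)


def raise_loglevel(pid, steps, send=os.kill, delay=0.1):
    """Send SIGUSR1 to `pid` `steps` times and return the number sent.

    Assumption: the receiving pbs_mom raises its loglevel by one per SIGUSR1.
    The short pause is there because identical pending Unix signals are not
    queued, so back-to-back sends could be merged into a single delivery.
    """
    for _ in range(steps):
        send(pid, signal.SIGUSR1)
        time.sleep(delay)
    return steps


# Hypothetical usage on the mother superior node, with mom_pid looked up
# beforehand (for example with pgrep) and a starting loglevel of 0 assumed:
#   raise_loglevel(mom_pid, signals_needed(0, 7))
```

The `send` parameter exists only so the loop can be exercised without signalling a real process; on a live node the default `os.kill` is what actually delivers the signal, and the caller needs permission to signal pbs_mom.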


