[torqueusers] epilogue script runs twice

Kevin Van Workum vanw at sabalcore.com
Wed Dec 23 09:21:35 MST 2009


I've ended up using a per-job semaphore (symlink) within the epilogue to
determine if the epilogue script has already run for a particular job. This
way I can prevent doing certain things a second time if the epilogue gets
called more than once.
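
For anyone wanting to try the same approach, here is a minimal sketch of the idea, not my exact script. It relies on the job id being passed as the first epilogue argument; LOCK_DIR, the ".epilogue" suffix, and the function name are hypothetical names chosen for illustration. Creating a symlink fails if the name already exists, so it works as an atomic "has this run before?" test.

```shell
#!/bin/sh
# Sketch of a run-once guard for a TORQUE epilogue.
# Assumptions: LOCK_DIR and the ".epilogue" suffix are illustrative only;
# the job id arrives as the first epilogue argument ($1).

LOCK_DIR="${LOCK_DIR:-/var/tmp/epilogue-locks}"

# Succeeds only on the first call for a given job id. Fails if the link
# already exists or if the lock directory cannot be created.
first_epilogue_run() {
    jobid="$1"
    # ln -s refuses to overwrite an existing name, so the symlink acts
    # as a per-job semaphore.
    mkdir -p "$LOCK_DIR" && ln -s "$jobid" "$LOCK_DIR/$jobid.epilogue" 2>/dev/null
}

# Only act when a job id was actually supplied.
if [ -n "${1:-}" ]; then
    if first_epilogue_run "$1"; then
        : # one-time work (e.g. the accounting update) goes here
    else
        exit 0   # guard did not succeed: skip the one-time work
    fi
fi
```

The links accumulate over time, so something like a periodic cleanup of old entries in the lock directory is needed as well.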

However, it would be a nice feature if torque could tell the epilogue
script that it has already been run for a particular job, and for what
reason, i.e. sigterm or sigkill.

Kevin

On Tue, Dec 22, 2009 at 12:36 PM, Al Taufer
<ataufer at adaptivecomputing.com> wrote:

> You might want to try using qmgr to increase the value of the "kill_delay"
> parameter. Its default value is 2 seconds. kill_delay specifies the time
> the server will wait before sending the sigkill request to the mom.
> Increase its value enough that jobs being qdel'ed have time to exit. This
> should eliminate the duplicate epilogue runs unless you encounter a job
> that does not respond to the sigterm.
>
> Al Taufer
> Adaptive Computing
>
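
A rough sketch of what that change might look like, assuming kill_delay is set as a queue attribute through qmgr. The queue name "batch", the value 30, and the wrapper function name are examples only, not values from this thread.

```shell
#!/bin/sh
# Sketch: raise kill_delay on one queue so a qdel'ed job has longer to
# exit after the sigterm before the sigkill is requested.
# Assumptions: kill_delay is set per queue; "batch" and 30 are examples.

set_kill_delay() {
    queue="$1"
    seconds="$2"
    # Reject anything that is not a non-negative whole number.
    case "$seconds" in
        ''|*[!0-9]*)
            echo "kill_delay must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    qmgr -c "set queue $queue kill_delay = $seconds"
}

# Example (run as a TORQUE manager):
#   set_kill_delay batch 30
#   qmgr -c "list queue batch"   # inspect the queue attributes afterwards
```
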
> ----- "Kevin Van Workum" <vanw at sabalcore.com> wrote:
>
> > On Sun, Dec 20, 2009 at 12:40 PM, Kevin Van Workum <
> > vanw at sabalcore.com > wrote:
> >
> >
> >
> > Sometimes, my epilogue script runs twice. This happens if a user
> > qdel's the job, but the job takes a while to exit, so a sigkill is
> > sent. The epilogue runs again when the sigkill is sent. However, after
> > some testing, this doesn't happen consistently. About 1 in 10 times.
> > Is this the expected behavior? How can I force torque to run the
> > epilogue script only once? Or maybe I can check from within my
> > epilogue that it has already run for this job? This is causing issues
> > with our internal accounting system.
> >
> >
> > It doesn't seem my message got posted, so I'm trying again.
> >
> > -Kevin
> >
> > --
> > Kevin Van Workum, PhD
> > Sabalcore Computing Inc.
> > Run your code on 500 processors.
> > Sign up for a free trial account.
> > www.sabalcore.com
> > 877-492-8027 ext. 11
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
>


