[torqueusers] epilogue script runs twice

Al Taufer ataufer at adaptivecomputing.com
Tue Dec 22 10:36:55 MST 2009


You might want to try using qmgr to increase the value of the "kill_delay" parameter.  Its default value is 2 seconds.  kill_delay specifies how long the server waits after the SIGTERM before sending the SIGKILL request to the mom.  Set it high enough that the jobs being qdel'ed have time to exit.  This should eliminate the duplicate epilogue runs unless you encounter a job that does not respond to the SIGTERM.
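
As a rough illustration, the server-wide setting could be changed and checked as shown below. The value of 30 seconds is only an assumed example, not a recommendation from this thread; choose something that fits how long your slowest well-behaved jobs need to shut down.

```sh
# Raise the delay between SIGTERM and SIGKILL for qdel'ed jobs.
# 30 is an assumed example value; the default is 2 seconds.
qmgr -c "set server kill_delay = 30"

# Print the server configuration and confirm the new value.
qmgr -c "print server" | grep kill_delay
```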

Al Taufer
Adaptive Computing
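
For the remaining case, a job that ignores the SIGTERM, the idea raised in the original question (checking from within the epilogue whether it has already run) could look roughly like the sketch below. It is only a sketch under stated assumptions: the function name epilogue_first_run and the lock directory path are hypothetical, and it assumes the job ID arrives as the first argument to the epilogue and that both epilogue runs happen on the same node.

```sh
#!/bin/sh
# Sketch of a run-once guard for a TORQUE epilogue script.
# Assumptions (not from the original thread): the function name and the
# default lock directory are hypothetical; adjust them for your site.

# Returns 0 the first time it is called for a given job ID, 1 on any
# later call, and 2 if the job ID is empty or the lock root is unusable.
epilogue_first_run() {
    jobid="$1"
    lock_root="${EPILOGUE_LOCK_ROOT:-/var/spool/torque/epilogue.locks}"

    [ -n "$jobid" ] || return 2
    mkdir -p "$lock_root" 2>/dev/null || return 2

    # mkdir without -p fails when the directory already exists, and the
    # check-and-create happens in one step, so two racing epilogue runs
    # cannot both succeed here.
    if mkdir "$lock_root/$jobid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

Near the top of the epilogue, a line such as `epilogue_first_run "$1" || exit 0` would then skip the accounting step on a second run. The per-job lock directories are never removed by this sketch, so they would need periodic cleanup.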

----- "Kevin Van Workum" <vanw at sabalcore.com> wrote:

> On Sun, Dec 20, 2009 at 12:40 PM, Kevin Van Workum <vanw at sabalcore.com> wrote:
> 
> Sometimes, my epilogue script runs twice. This happens if a user
> qdel's the job, but the job takes a while to exit, so a sigkill is
> sent. The epilogue runs again when the sigkill is sent. However, after
> some testing, this doesn't happen consistently. About 1 in 10 times.
> Is this the expected behavior? How can I force torque to run the
> epilogue script only once? Or maybe I can check from within my
> epilogue that it has already run for this job? This is causing issues
> with our internal accounting system.
> 
> 
> It doesn't seem my message got posted, so I'm trying again.
> 
> -Kevin
> 
> --
> Kevin Van Workum, PhD
> Sabalcore Computing Inc.
> Run your code on 500 processors.
> Sign up for a free trial account.
> www.sabalcore.com
> 877-492-8027 ext. 11