[torqueusers] SIGTERM and pbsdsh

Jan Ploski Jan.Ploski at offis.de
Fri Nov 30 01:28:54 MST 2007


torqueusers-bounces at supercluster.org wrote on 11/29/2007 10:43:42 PM:

> On Tue, Nov 27, 2007 at 09:52:36AM -0600, Tim Freeman alleged:
> > I am starting the same executable on N nodes using pbsdsh -n. During a
> > qdel, SIGTERM signals do not look like they are propagating to each
> > process, only a SIGKILL from the initial looks of it (there's a SIGTERM
> > handler in the executable that is not getting invoked).
> >
> > The application I'm running greatly benefits from getting to run a
> > cleanup routine if cancelled.  Is there an option to pbsdsh or some
> > technique to use where I can make this happen?
>
> There's 2 common things here.  The first is "kill_delay", the queue
> attribute that specifies the time between the initial TERM and the later
> KILL. The default is too short.

This reminds me of an old problem of mine: kill_delay does not work for me
at all (I am not using pbsdsh, just TORQUE 2.1.6 + Maui).
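For reference, this is roughly how kill_delay is set with qmgr. It is only
a sketch: the queue name "batch" and the value of 60 seconds are
placeholders, not settings taken from this thread.

```shell
# Placeholder queue name ("batch") and delay (60 s); adjust for your site.
# Set the delay between TERM and KILL on one queue:
qmgr -c "set queue batch kill_delay = 60"

# Or set it server-wide:
qmgr -c "set server kill_delay = 60"

# Print the queue's attributes to confirm the value was stored:
qmgr -c "list queue batch"
```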

> The second is that your top-level shell is catching the TERM signal and
> exiting.  You need to ignore the TERM in your batch script.

trap '' TERM INT

in the job script should be enough, right? Nevertheless, the job is killed
within seconds no matter what kill_delay I set, whether in the server or
the queue configuration. The same thing happens with both qdel and qsig.
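To rule out the shell side, the trap can be checked locally without TORQUE.
The sketch below uses a made-up helper name, survives_term: it starts a
fresh sh that ignores TERM and INT, sends TERM to itself, and prints a line
only if it is still alive afterwards.

```shell
# survives_term is a hypothetical helper for a local check, not part of TORQUE.
# The inner shell ignores TERM and INT, signals itself with TERM, and then
# prints "survived" if the signal did not terminate it.
survives_term() {
    sh -c 'trap "" TERM INT; kill -TERM $$; echo survived'
}

survives_term
```

If this prints "survived", the trap itself works and the early kill must
come from somewhere else. Note that an ignored signal stays ignored in
processes started from the shell unless they install their own handler, so
the payload still needs its own TERM handler to run any cleanup.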

Best regards,
Jan Ploski

