[torqueusers] SIGTERM and pbsdsh
Jan.Ploski at offis.de
Fri Nov 30 01:28:54 MST 2007
torqueusers-bounces at supercluster.org wrote on 11/29/2007 10:43:42 PM:
> On Tue, Nov 27, 2007 at 09:52:36AM -0600, Tim Freeman alleged:
> > I am starting the same executable on N nodes using pbsdsh -n. During a
> > qdel, SIGTERM signals do not look like they are propagating to each
> > process, only a SIGKILL from the initial looks of it (there's a SIGTERM
> > handler in the executable that is not getting invoked).
> > The application I'm running greatly benefits from getting to run a
> > routine if cancelled. Is there an option to pbsdsh or some technique
> > where I can make this happen?
> There are two common things here. The first is "kill_delay", the queue
> attribute that specifies the time between the initial TERM and the later
> KILL. The default is too short.
This reminds me of my old problem: kill_delay absolutely does not work for
me (not using pbsdsh, just TORQUE 2.1.6 + Maui).
> The second is that your top-level shell is catching the TERM signal and
> exiting. You need to ignore the TERM in your batch script.
trap '' TERM INT
in the job script should be enough, right? Nevertheless, the job is killed
within seconds no matter what kill_delay I set, whether in the server or
the queue configuration. The same thing happens with both qdel and qsig...
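
To illustrate what the trap line is meant to do, here is a minimal sketch
that can be run outside the batch system. It only shows that a shell with
TERM ignored survives a TERM sent to it; it does not reproduce or explain
the kill_delay behaviour described above. The queue name "batch" and the
delay of 60 seconds in the comment are assumptions chosen for illustration.

```shell
#!/bin/sh
# kill_delay is configured separately through qmgr, for example
# (hypothetical queue name and value):
#   qmgr -c "set queue batch kill_delay = 60"

# Ignore TERM and INT in the top-level job shell, as in the message above.
# Note: processes started from this shell inherit the ignored disposition
# unless they install their own handler, as the application with its own
# SIGTERM handler does.
trap '' TERM INT

# Stand-in for the TERM that arrives when the job is cancelled:
# the shell sends the signal to itself.
kill -s TERM "$$"

# This point is reached only if the TERM was ignored.
survived=yes
echo "shell still running after TERM: $survived"
```

Without the trap line the script would be terminated at the kill command
and the final echo would never run.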