[torqueusers] sending SIGINT to all nodes in a multi-node job
garrick at usc.edu
Wed Jul 18 21:21:19 MDT 2007
On Wed, Jul 18, 2007 at 07:50:37PM -0700, Peter Wyckoff alleged:
> We want all the nodes in a multi-node computation to let the parent
> process for a job get a SIGINT (at least) kill_delay seconds before the
> SIGKILL. Right now it seems that only the MS (aka head) node gets this.
> I don't know how we can then send a signal to all the sister nodes. With
> Moab, maybe pbsdsh mjobctl -s 15 <jobid> from the MS. But, without Moab,
> there's no way to send a signal to the parent processes on the other nodes,
> is there?
> This thread in 05 said this was expected behavior. I just don't know how,
> given this, I can cleanly shut down the sister moms.
> Thanks, pete
Everything in that email was hypothetical, except for where I said,
"That is the expected behaviour currently. Only MS signals processes."
For your case, just have the batch script trap SIGINT and then do
whatever you want. A 'pbsdsh kill $pid' wouldn't work, because the pid
is different on each node; there is no single pid number to pass.
But mpiexec could certainly forward the SIGINT to its TM tasks.
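
As a rough illustration of the trap-and-forward idea above, here is a
minimal Python sketch of a wrapper that a job script could use on the MS
node. The function name run_with_sigint_forwarding is hypothetical and not
part of TORQUE. It only relays SIGINT to one child process; whether that
child (for example mpiexec) passes the signal on to its TM tasks on the
sister nodes depends on the launcher.

```python
import signal
import subprocess


def run_with_sigint_forwarding(cmd):
    """Run cmd as a child process and relay SIGINT to it.

    Hypothetical helper: it assumes the batch system delivers SIGINT to
    this process (the job's parent process on the MS node) some time
    before SIGKILL, which leaves a window for cleanup.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Relay the signal only while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    # Install the forwarding handler and remember the old one.
    previous = signal.signal(signal.SIGINT, forward)
    try:
        # wait() resumes after the handler has run, so we still collect
        # the child's real exit status once it has cleaned up.
        return child.wait()
    finally:
        # Restore whatever SIGINT handling was in place before.
        signal.signal(signal.SIGINT, previous)
```

A job script could call run_with_sigint_forwarding(["mpiexec", "./my_app"])
(the command is a placeholder) and exit with the returned status.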
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.