[torqueusers] sending SIGINT to all nodes in a multi-node job

Garrick Staples garrick at usc.edu
Wed Jul 18 21:21:19 MDT 2007


On Wed, Jul 18, 2007 at 07:50:37PM -0700, Peter Wyckoff alleged:
> 
> We want the parent process of a job on every node of a multi-node
> computation to get a SIGINT (at least) kill_delay seconds before the
> SIGKILL. Right now it seems that only the MS (aka head) node gets this.
> 
> I don't know how we can then send a signal to all the sister nodes. With
> Moab, maybe pbsdsh mjobctl -s 15 <jobid> from the MS. But without Moab,
> there's no way to send a signal to the parent processes on the other
> nodes, is there?
> 
> This thread from 2005 said this was expected behavior. Given that, I
> just don't know how I can cleanly shut down the sister moms.
> 
> Thanks, pete
> 
> 
> http://www.clusterresources.com/pipermail/torqueusers/2005-September/002156.html

Everything in that email was hypothetical, except for where I said,
"That is the expected behaviour currently.  Only MS signals processes."

For your case, just have the batch script trap SIGINT and then do
whatever you want.  A 'pbsdsh kill $pid' wouldn't work because there is
no single pid to pass: the process has a different pid on each node.

But mpiexec could certainly forward the SIGINT to its TM tasks.
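
The trap suggested above could look roughly like the sketch below. It is
only an illustration, not something from the thread: APP_NAME is a
hypothetical process name, and the fan-out via 'pbsdsh -u' plus pkill is
an assumed workaround that matches processes by name, since there is no
common pid across nodes. FANOUT is a made-up variable so the fan-out
command can be swapped out when trying the script off-cluster.

```shell
#!/bin/sh
# Sketch only: APP_NAME, FANOUT and the pkill fan-out are assumptions.
APP_NAME=${APP_NAME:-my_app}    # hypothetical name of the per-node process
FANOUT=${FANOUT:-pbsdsh -u}     # assumed: run the command once per node

got_sigint=0

on_sigint() {
    got_sigint=1
    echo "SIGINT received; signalling $APP_NAME on every node"
    # Match by name and user: the pid differs from node to node.
    # FANOUT is left unquoted on purpose so it splits into command + flags.
    $FANOUT pkill -INT -u "$(id -un)" -x "$APP_NAME" || true
}

# The MS node is the only one that gets the SIGINT, so catch it here.
trap on_sigint INT
```

In a real job script the workload would be started in the background and
followed by 'wait', so that the shell runs the handler as soon as the
signal arrives instead of after the foreground command exits.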

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html