[torqueusers] sending SIGINT to all nodes in a multi-node job

Peter Wyckoff wyckoff at yahoo-inc.com
Wed Jul 18 20:50:37 MDT 2007


We want the parent process of a job on every node in a multi-node
computation to receive a SIGINT (at least) kill_delay seconds before the
SIGKILL. Right now it seems that only the MS (aka head) node gets this.
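
To illustrate what the parent process on each node would do with that
warning, here is a minimal Python sketch (not TORQUE code). It assumes the
process really does receive SIGINT or SIGTERM before the SIGKILL; the
names GracefulShutdown, run, work_step and cleanup are hypothetical and
only there for the example.

```python
import os
import signal
import time


class GracefulShutdown:
    """Remembers that a shutdown signal arrived so the main loop can stop."""

    def __init__(self, signals=(signal.SIGINT, signal.SIGTERM)):
        self.requested = False
        self.received = None
        # Install the same handler for every signal we want to treat
        # as an early warning before SIGKILL.
        for sig in signals:
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Keep the handler tiny: only record the event.
        self.requested = True
        self.received = signum


def run(shutdown, work_step, cleanup, poll_interval=0.01):
    """Call work_step repeatedly until a signal is seen, then call cleanup."""
    while not shutdown.requested:
        work_step()
        time.sleep(poll_interval)
    # This must finish within the grace period, because SIGKILL
    # cannot be caught.
    cleanup()
```

The handler only sets a flag and the real cleanup happens in the main
loop, so the cleanup code is not running inside a signal handler. None of
this helps on the sister nodes unless the signal is actually delivered
there, which is the problem described here.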

How can we send a signal to all the sister nodes? With Moab, maybe
pbsdsh mjobctl -s 15 <jobid> from the MS. But without Moab, there's no
way to signal the parent processes on the other nodes, is there?

The 2005 thread linked below said this is expected behavior. Given that,
I don't know how I can cleanly shut down the sister moms.

Thanks, pete


http://www.clusterresources.com/pipermail/torqueusers/2005-September/002156.html

