[torqueusers] sending SIGINT to all nodes in a multi-node job
wyckoff at yahoo-inc.com
Wed Jul 18 20:50:37 MDT 2007
In a multi-node computation, we want the parent process for a job on every
node to get a SIGINT (at least) kill_delay seconds before the SIGKILL.
Right now it seems that only the MS (aka head) node gets this.
I don't know how we can then send a signal to all the sister nodes. With
Moab, maybe pbsdsh mjobctl -s 15 <jobid> from the MS. But without Moab,
there seems to be no way to signal the parent processes on the other nodes.
A thread from '05 said this was expected behavior. Given that, I just don't
know how I can cleanly shut down the sister moms.