[torqueusers] killing over limit jobs is unfriendly to mpiexec

Åke Sandgren ake.sandgren at hpc2n.umu.se
Fri Nov 24 00:19:39 MST 2006


On Thu, 2006-11-23 at 15:48 -0500, Pete Wyckoff wrote:
> Mpiexec catches this second SIGTERM and just exits, abandoning any
> tasks.  The thought was that when users hit ctrl-c, it tries to
> clean tasks up nicely, but if the batch system has hosed itself, a
> second tap of ctrl-c will force mpiexec to exit.  If I were to
> ignore future SIGTERMs, users would have to hit ctrl-z, then "kill
> -9" the process to get it to go away.

BTW, ctrl-c sends SIGINT, not SIGTERM...
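
For illustration, here is a minimal sketch of the two-stage behaviour
described in the quoted text: the first signal asks for a clean
shutdown, a second one forces an exit. It is written in Python rather
than mpiexec's C, and the class and attribute names are invented for
the example; this is not mpiexec's actual code. It installs the same
handler for both SIGINT (what ctrl-c delivers) and SIGTERM (what a
batch system would typically send).

```python
import signal


class TwoStageShutdown:
    """First SIGINT/SIGTERM requests cleanup; a second one forces exit.

    Hypothetical illustration only, not taken from mpiexec.
    """

    def __init__(self):
        self.signals_seen = 0
        self.cleanup_requested = False
        self.force_exit = False

    def handle(self, signum, frame):
        # Count every delivery, regardless of which of the two signals it is.
        self.signals_seen += 1
        if self.signals_seen == 1:
            # First signal: try to clean up tasks nicely.
            self.cleanup_requested = True
        else:
            # Any later signal: give up on cleanup and exit.
            self.force_exit = True

    def install(self):
        # ctrl-c delivers SIGINT; SIGTERM normally comes from kill or
        # from the batch system.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self.handle)
```

A main loop would call install() once and then poll cleanup_requested
and force_exit. The handler itself only sets flags, which keeps the
work done in signal context small.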

The only way out of this that I can see so far is if TM-based processes
that need to do things the way mpiexec does could be registered in mom
in such a way that, as long as any such processes are left, it refrains
from running scan_for_terminated when it sees termin_child.
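
A toy model of that idea, in Python, just to make the control flow
concrete. Only the names scan_for_terminated and termin_child come from
mom; the registry, the class, and the other method names are
assumptions made up for this sketch, and the real mom does far more in
each of these steps.

```python
class MomSketch:
    """Toy model: defer scan_for_terminated while registered helpers remain."""

    def __init__(self):
        self.registered = set()    # hypothetical registry of TM helper pids
        self.termin_child = False  # set when some child has exited
        self.scans_run = 0         # how many times the scan actually ran

    def register(self, pid):
        # A TM-based process such as mpiexec announces itself.
        self.registered.add(pid)

    def child_exited(self, pid):
        # Stand-in for the SIGCHLD path: note the exit and drop the pid
        # from the registry if it was a registered helper.
        self.registered.discard(pid)
        self.termin_child = True

    def main_loop_step(self):
        # Proposed change: skip the scan while registered helpers are left.
        if self.termin_child and not self.registered:
            self.scan_for_terminated()

    def scan_for_terminated(self):
        self.termin_child = False
        self.scans_run += 1
```

With one helper registered, an exit of some other child leaves
termin_child set but does not trigger the scan; the scan runs on the
first loop step after the helper itself has exited.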

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se


